Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typeinfos and personality functions should be attached to the invoke #1880

Closed
llvmbot opened this issue Jun 13, 2007 · 23 comments
Closed

typeinfos and personality functions should be attached to the invoke #1880

llvmbot opened this issue Jun 13, 2007 · 23 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 13, 2007

Bugzilla Link 1508
Resolution FIXED
Resolved on Nov 19, 2013 16:16
Version unspecified
OS Linux
Blocks llvm/llvm-bugzilla-archive#1914
Reporter LLVM Bugzilla Contributor
CC @asl

Extended Description

Exception handling typeinfos and personality functions are currently
supplied by an intrinsic (eh.selector or eh.filter) that is placed in
the invoke landing pad. This is a flawed scheme: the optimizers can move
the intrinsic out of the landing pad. How then to know to which landing
pad the typeinfos and personality should be applied? This is exactly what
happens to the intrinsics in landing pad unwind871 in
test/CodeGen/Generic/2007-06-06-CriticalEdgeLandingPad.ll
[The optimizers insert a critical edge basic block; this becomes
the landing pad, with the intrinsics in the successor. You can tell
that the type infos got lost because the exception action for the
invoke in cond_true870 is 1, i.e. cleanups only, rather than 5].
This PR is for discussion of how best to fix this issue.

@lattner
Copy link
Collaborator

lattner commented Jun 13, 2007

I agree, but I don't know the right solution for this :)

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jun 15, 2007

Some thoughts on exception handling intrinsics.

(1) eh.typeid.for
This has two problems that I see:
(a) the typeid depends on the personality function: if we have multiple
personalities in one function, each one is going to have its own list of
typeinfos (Ada typeinfos for the Ada personality, C++ typeinfos for the
C++ personality etc). The unwinder gives us the typeinfo index into the
typeinfo list for the personality function.
(b) the typeid depends on the function you are in. Maybe not a big deal,
but it's kind of weird that you can call eh.typeid.for with the same argument
in different functions and get different results.

Solutions:
(x) The radical solution: get rid of eh.typeid.for by modifying
eh.selector to return the typeinfo itself rather than the index of the
typeinfo. This has the advantage of reducing the number of concepts
that eh users need to know about, and turns the notion of "typeid"
into a codegen detail. It also would remove the dependence on the
personality function. At codegen time typeids would need to be
reintroduced. Code like this (pseudo code!):
%gv = eh.selector(...)
if (%gv == typeinfo1)
do_something;
else if (%gv == typeinfo2)
do_something else;
would need to be expanded into:
%id = eh.old_style_selector(...);
switch (%id) {
case 1:
%gv = typeinfo1;
case 2:
%gv = typeinfo2;
...
}
if (%gv == typeinfo1)
do_something;
else if (%gv == typeinfo2)
do_something else;

Hopefully this expanded code would be efficiently simplified by running a few
optimizer passes before codegen really begins. I don't know if this kind of
approach fits the way codegen is organized.

(y) Add the personality function as an additional eh.typeid.for argument.
This would solve (a) but not (b).

(2) eh.selector
This intrinsic has a sensible meaning: check if the exception belongs to
one of a list of typeinfos, and return which one it matches. However the
current implementation means it only works if placed in a landing pad, and
even if originally placed in one it can be moved around by the optimizers.
This makes the intrinsic fragile - I expect it to cause more trouble once
more than just llvm-gcc starts generating it.
So let me make a radical suggestion which would result in eh.selector working
no matter where it is placed, even if it is in another function! [This has
some runtime cost, however some simple analysis can eliminate the cost if
eh.selector is in a landing pad or not far from one - more on this later].
Here is the idea: at codegen time expand
%id = eh.selector(exception, personality, typeinfo1, ...)
into
invoke @​_Unwind_Resume(exception) unwind %classify <= place personality and
typeinfo list in exception table for this invoke
unreachable
classify:
%id = selector_register_value
i.e. rethrow the exception simply for the purpose of classifying it.
Since we synthesize the invoke at codegen time there is no problem
attaching the personality and typeinfos to it.
Now the obvious objection is that this is costly. So here's an optimization:
if the eh.selector is in the landing pad for an invoke, add the personality
and typeinfos to that invoke, and just inspect the selector_register_value,
i.e. do what we do right now. More generally, if we can work out which
invoke(s) the exception parameter came from, then we can avoid the rethrow
and add the typeinfos to those invokes.
Thus eh.selector would work anywhere, and the current implementation can be
thought of as an optimization. The hacky workaround
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070611/050446.html
that I suggested for this PR can then be seen as a poor man's version of the
suggested optimization method: trying to back assign eh.selector to a set of
invokes.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jun 17, 2007

Additional problems with the current eh.selector lowering:
(1) if two eh.selector intrinsics are used in the same landing
pad, the result may well be wrong. What happens now is that the
typeinfos are concatenated (in reverse order). Suppose the second
eh.selector has, for example, a catch-all typeinfo argument, while
the first has a specific typeinfo, say T. If an instance of T is
thrown then the first selector will not return the typeid
of T. That is because the catch-all typeinfo of the other
selector will be encountered first by the unwind code, so
that is what will always be returned. This would be fixed
by my suggestion for eh.selector lowering in the previous
comment.
(2) The first argument to eh.selector is a pointer to an exception,
but this argument is ignored. The selector is always taken to be for
the current landing pad i.e. the current exception, even if some
previous exception is actually passed as the argument.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jun 26, 2007

(1) if two eh.selector intrinsics are used in the same landing
pad, the result may well be wrong.

Likewise for two filters or mixes of filters and selectors. This
is going to happen, because catches can be nested inside filters
in C++, and the only way we have to represent this for the moment
is one or more eh.filter followed by one or more eh.selector intrinsics.
[All of the catche and filter info needs to be attached to each
invoke inside the inner scope]. Amusingly enough, llvm-convert
doesn't handle even the nested catch case right now - only the
inner catches are taken into account.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 10, 2007

failing test

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 10, 2007

The test above fails with llvm-as < test | opt -std-compile-opts | llc -enable-eh
Seems to be the same basic problem, the loop optimizers have split a critical edge into a landing pad.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 10, 2007

Split is done in LoopSimplify::SplitBlockPredecessors which does not know about landing pads, and needs to.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 13, 2007

Split is done in LoopSimplify::SplitBlockPredecessors which does not know
about landing pads, and needs to.

It sounds like this problem needs to get a bug report of its own,
since (if I understand right) it is a bug in loop splitting, and
not really due to the design of exception handling (this bug).

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 13, 2007

I don't think so; the loop splitting is an issue only because of the underlying problem:
"Exception handling typeinfos and personality functions are currently
supplied by an intrinsic (eh.selector or eh.filter) that is placed in
the invoke landing pad. This is a flawed scheme: the optimizers can move
the intrinsic out of the landing pad."
It is true that the short-term fix I have in mind is to duplicate the landing pad somehow (the IR->machine level conversion looks more promising than the loop splitting at this point) rather than fixing the underlying design problem.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 14, 2007

If I understand right, whats happening here is that before the Bar
call there is a store to %tmp31. This store has been moved after
the call to Bar, into both the unwind and normal code paths. This
results in

    invoke void @&#8203;Bar( i64 %tmp22, %struct.Range* %effectiveRange )
                    to label %invcont23 unwind label %unwind.loopexit

...
unwind.loopexit: ; preds = %cond_true
store i64 %tmp31.tmp.0, i64* %tmp31, align 4
br label %unwind

unwind: ; preds = %unwind.loopexit, %entry
%eh_ptr = call i8* @​llvm.eh.exception( )

in which the calls to the exception handling intrinsics have been
pushed by the store to a later basic block. Tricky :) I will
think about it.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 14, 2007

That's not exactly how this arose. Originally there was a single landing pad, unwind, which could be reached from either Foo or Bar (the latter inside a loop). The loop optimizer decided to split the
(abnormal) Bar->unwind edge to put some code on it; the new block unwind.loopexit becomes the new landing pad for Bar, since it's reachable only by throwing. However unwind.loopexit does not have an eh.selector in it. Is it sufficient for the loop code that does this (see comment 7) to add one, do you think?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 15, 2007

How about simply removing the assertion (SelectionDAGISel.cpp:4495)?
This routine, copyCatchInfo, is taking the selector call in unwind,
and applying the typeinfos in it to unwind.loopexit. This is exactly
what we want in this case! I'm pretty sure I only added the assertion
because it seemed like a funky case that might cause problems elsewhere.
For example, the selector register will be marked live-in on both
unwind and unwind.loopexit. Will the register value from unwind.loopexit
survive to be used in unwind, or will it get borked somehow because it is
also marked live-in on unwind? Since I had no testcase, and I couldn't
see how this could happen in practice (hah!) it seemed best to just abort.
I don't have time tonight to check that everything is fine if the assertion
is removed, maybe tomorrow.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 15, 2007

I tried that and (after turning off various other checking that it broke) the EH tables did not look right, although I don't grok them well enough to be sure. The trouble is that the original 'unwind' is still a valid landing pad, reached from Foo.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 15, 2007

The tables look fine to me, so I've committed this. It's not clear to
me if the exception and exception selector registers are being handled
properly because I don't understand ppc assembler.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 15, 2007

Looks good. I thought there had to be a 1-1 correspondence between selectors and landing pads, I think, but there doesn't. Thanks for fixing this.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 16, 2007

If you are happy, can you please check in the testcase.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 15, 2008

Some additional thoughts on this: it may be possible to eliminate the
list of typeinfos from eh.selector altogether. Currently eh.selector
is used in three different way (that should probably be separated):

(1) as a way of getting hold of the value in the selector register
(set by the runtime before it branches to a landing pad)
(2) as a way of specifying the personality function
(3) as a way of giving a list of typeids the exception should be
compared against. This includes the "catch-all" typeid (language
dependent) without which we don't know how to codegen invoke properly.
There may also be filter expressions.

As far as (1) is concerned there is no real problem if the selector
call occurs far away from the landing pad, because we could do the
following implementation: at codegen time, create a new variable
to hold the selector value, copy the register into it at the start
of each landing pad, and have eh.selector just read the variable
and return its values. Now that codegen has dominator info, it
shouldn't be hard to introduce a new variable like this at codegen
time. The same can be done for eh.exception.

As for (2), since currently we only support one personality per
function, it doesn't matter where in the function it is; the same
goes for specifying the "catch-all" typeid: we could have a special
intrinsic for saying what the personality and catch-all are, and
it would just need to be placed somewhere in the function. I don't
want to discuss this further here.

So now for (3). I'm not going to discuss filters here - I plan
to create a special PR for discussion of how filters could be
eliminated altogether from LLVM exception handling.

What is the list of typeids in the selector call for?
The eh handler code generated by llvm-gcc looks like this:

compare the exception with typeinfo T1
if it matches, branch to B1
otherwise, compare the exception with typeinfo T2
if it matches, branch to B2
...
otherwise, compare the exception with typeinfo TN
if it matches, branch to BN

The list of eh.selector typeinfos is exactly B1, ..., BN.
The point I want to make is that we could auto-generate
this list by analysing the code.

Why does eh.selector want to assemble this list at all?
After all, it is possible to query the runtime to find
out whether the exception matches a particular typeinfo,
so this query could simply be performed N times for each
of B1, ..., BN. (I explain below how this querying can
be done). The reason is that two queries is much more
expensive than one, so it is cheaper to perforrm batch queries,
if possible only performing one. In fact the current
scheme only allows for one (batch) query, the query being
specified by the list of typeinfos in eh.selector.

Thus we could do the following scheme: remove the list
of typeinfos from eh. In fact remove eh.selector altogether
and instead have an exception comparison intrinsic:
llvm.eh.compare(exception_pointer, typeinfo). [This also
means we can remove eh.typeid]. At codegen time, starting
from each landing pad, follow the cfg looking for eh.compare
calls (don't follow exception edges, since they give a new
exception). If it can be determined that they occur in a
certain order, generate the corresponding list of typeinfos
and perform one query with that [this should always be the
case when the code was generated by llvm-gcc], otherwise
there will need to be multiple queries. Now I come to think
about it, maybe the llvm.eh.compare call should not take
the exception_pointer as an argument, and just use the
current exception. Otherwise you may have some hassles
if you lose track of the exception pointer (though it's
not fatal since you can always just generate more queries).

So how do you query the runtime to find out whether the
exception belongs to a given list of typeinfos? Easy:
you generate an invoke of a reraise statement, attaching
the list of typeinfos to it, and arrange for it to unwind
to the next instruction. The selector register holds the
result of the query. As an optimization, when downstream
from an identifiable landing pad, the typeinfo list can be
attached to the original invoke [the common case, which
corresponds to what we try to do now].

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 16, 2008

I plan to create a special PR for discussion of how filters could be
eliminated altogether from LLVM exception handling.

llvm/llvm-bugzilla-archive#2157 .

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 16, 2008

Note that the llvm.eh.compare intrinsic would be readonly (and even
readnone in the typeinfo parameter) making it easy for standard
optimizations (correlated expressions...) to eliminate pointless
comparisons, giving automatic exception handling simplification.

Also note that the required analysis at the codegen level is
analogous to the problem of synthesizing switch statements out
of a series of comparisons and conditional branches.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 31, 2012

Duncan, This is done, no? :-)

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 31, 2012

Bill fixed this.

@ggreif
Copy link
Contributor

ggreif commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#1682

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#1914

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants