LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 1508 - typeinfos and personality functions should be attached to the invoke
Summary: typeinfos and personality functions should be attached to the invoke
Status: RESOLVED FIXED
Alias: None
Product: new-bugs
Classification: Unclassified
Component: new bugs (show other bugs)
Version: unspecified
Hardware: Other Linux
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks: 1914
  Show dependency tree
 
Reported: 2007-06-13 11:47 PDT by Duncan Sands
Modified: 2013-11-19 16:16 PST (History)
5 users (show)

See Also:
Fixed By Commit(s):


Attachments
failing test (4.89 KB, text/plain)
2007-11-09 19:17 PST, Dale Johannesen
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Duncan Sands 2007-06-13 11:47:54 PDT
Exception handling typeinfos and personality functions are currently
supplied by an intrinsic (eh.selector or eh.filter) that is placed in
the invoke landing pad.  This is a flawed scheme: the optimizers can move
the intrinsic out of the landing pad.  How then to know to which landing
pad the typeinfos and personality should be applied?  This is exactly what
happens to the intrinsics in landing pad unwind871 in
test/CodeGen/Generic/2007-06-06-CriticalEdgeLandingPad.ll
[The optimizers insert a critical edge basic block; this becomes
the landing pad, with the intrinsics in the successor.  You can tell
that the type infos got lost because the exception action for the
invoke in cond_true870 is 1, i.e. cleanups only, rather than 5].
This PR is for discussion of how best to fix this issue.
Comment 1 Chris Lattner 2007-06-13 11:52:26 PDT
I agree, but I don't know the right solution for this :)
Comment 2 Duncan Sands 2007-06-14 23:22:43 PDT
Some thoughts on exception handling intrinsics.

(1) eh.typeid.for
  This has two problems that I see:
  (a) the typeid depends on the personality function: if we have multiple
  personalities in one function, each one is going to have its own list of
  typeinfos (Ada typeinfos for the Ada personality, C++ typeinfos for the
  C++ personality etc).  The unwinder gives us the typeinfo index into the
  typeinfo list for the personality function.
  (b) the typeid depends on the function you are in.  Maybe not a big deal,
  but it's kind of weird that you can call eh.typeid.for with the same argument
  in different functions and get different results.

  Solutions:
  (x) The radical solution: get rid of eh.typeid.for by modifying
  eh.selector to return the typeinfo itself rather than the index of the
  typeinfo.  This has the advantage of reducing the number of concepts
  that eh users need to know about, and turns the notion of "typeid"
  into a codegen detail.  It also would remove the dependence on the
  personality function.  At codegen time typeids would need to be
  reintroduced.  Code like this (pseudo code!):
   %gv = eh.selector(...)
   if (%gv == typeinfo1)
     do_something;
   else if (%gv == typeinfo2)
     do_something else;
  would need to be expanded into:
   %id = eh.old_style_selector(...);
   switch (%id) {
     case 1:
       %gv = typeinfo1;
     case 2:
       %gv = typeinfo2;
  ...
   }
   if (%gv == typeinfo1)
     do_something;
   else if (%gv == typeinfo2)
     do_something else;

  Hopefully this expanded code would be efficiently simplified by running a few
  optimizer passes before codegen really begins.  I don't know if this kind of
  approach fits the way codegen is organized.

  (y) Add the personality function as an additional eh.typeid.for argument.
  This would solve (a) but not (b).

(2) eh.selector
  This intrinsic has a sensible meaning: check if the exception belongs to
  one of a list of typeinfos, and return which one it matches.  However the
  current implementation means it only works if placed in a landing pad, and
  even if originally placed in one it can be moved around by the optimizers.
  This makes the intrinsic fragile - I expect it to cause more trouble once
  more than just llvm-gcc starts generating it.
  So let me make a radical suggestion which would result in eh.selector working
  no matter where it is placed, even if it is in another function!  [This has
  some runtime cost, however some simple analysis can eliminate the cost if
  eh.selector is in a landing pad or not far from one - more on this later].
  Here is the idea: at codegen time expand
   %id = eh.selector(exception, personality, typeinfo1, ...)
  into
    invoke @_Unwind_Resume(exception) unwind %classify <= place personality and 
typeinfo list in exception table for this invoke
    unreachable
    classify:
    %id = selector_register_value
  i.e. rethrow the exception simply for the purpose of classifying it.
  Since we synthesize the invoke at codegen time there is no problem
  attaching the personality and typeinfos to it.
  Now the obvious objection is that this is costly.  So here's an optimization:
  if the eh.selector is in the landing pad for an invoke, add the personality
  and typeinfos to that invoke, and just inspect the selector_register_value,
  i.e. do what we do right now.  More generally, if we can work out which
  invoke(s) the exception parameter came from, then we can avoid the rethrow
  and add the typeinfos to those invokes.
  Thus eh.selector would work anywhere, and the current implementation can be
  thought of as an optimization.  The hacky workaround
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070611/050446.html
  that I suggested for this PR can then be seen as a poor man's version of the
  suggested optimization method: trying to back assign eh.selector to a set of
  invokes.
Comment 3 Duncan Sands 2007-06-17 15:05:11 PDT
Additional problems with the current eh.selector lowering:
(1) if two eh.selector intrinsics are used in the same landing
pad, the result may well be wrong.  What happens now is that the
typeinfos are concatenated (in reverse order).  Suppose the second
eh.selector has, for example, a catch-all typeinfo argument, while
the first has a specific typeinfo, say T.  If an instance of T is
thrown then the first selector will *not* return the typeid
of T.  That is because the catch-all typeinfo of the other
selector will be encountered first by the unwind code, so
that is what will always be returned.  This would be fixed
by my suggestion for eh.selector lowering in the previous
comment.
(2) The first argument to eh.selector is a pointer to an exception,
but this argument is ignored.  The selector is always taken to be for
the current landing pad i.e. the current exception, even if some
previous exception is actually passed as the argument.
Comment 4 Duncan Sands 2007-06-26 09:51:21 PDT
> (1) if two eh.selector intrinsics are used in the same landing
> pad, the result may well be wrong.

Likewise for two filters or mixes of filters and selectors.  This
is going to happen, because catches can be nested inside filters
in C++, and the only way we have to represent this for the moment
is one or more eh.filter followed by one or more eh.selector intrinsics.
[All of the catche and filter info needs to be attached to each
invoke inside the inner scope].  Amusingly enough, llvm-convert
doesn't handle even the nested catch case right now - only the
inner catches are taken into account.
Comment 5 Dale Johannesen 2007-11-09 19:17:54 PST
Created attachment 1210 [details]
failing test
Comment 6 Dale Johannesen 2007-11-09 19:19:27 PST
The test above fails with llvm-as < test | opt -std-compile-opts | llc -enable-eh
Seems to be the same basic problem, the loop optimizers have split a critical edge into a landing pad.

Comment 7 Dale Johannesen 2007-11-09 19:46:22 PST
Split is done in LoopSimplify::SplitBlockPredecessors which does not know about landing pads, and needs to.

Comment 8 Duncan Sands 2007-11-13 02:24:19 PST
> Split is done in LoopSimplify::SplitBlockPredecessors which does not know
> about landing pads, and needs to.

It sounds like this problem needs to get a bug report of its own,
since (if I understand right) it is a bug in loop splitting, and
not really due to the design of exception handling (this bug).
Comment 9 Dale Johannesen 2007-11-13 11:35:02 PST
I don't think so; the loop splitting is an issue only because of the underlying problem:
"Exception handling typeinfos and personality functions are currently
supplied by an intrinsic (eh.selector or eh.filter) that is placed in
the invoke landing pad.  This is a flawed scheme: the optimizers can move
the intrinsic out of the landing pad."
It is true that the short-term fix I have in mind is to duplicate the landing pad somehow (the IR->machine level conversion looks more promising than the loop splitting at this point) rather than fixing the underlying design problem.
Comment 10 Duncan Sands 2007-11-14 15:39:08 PST
If I understand right, whats happening here is that before the Bar
call there is a store to %tmp31.  This store has been moved after
the call to Bar, into both the unwind and normal code paths.  This
results in

        invoke void @Bar( i64 %tmp22, %struct.Range* %effectiveRange )
                        to label %invcont23 unwind label %unwind.loopexit
...
unwind.loopexit:                ; preds = %cond_true
        store i64 %tmp31.tmp.0, i64* %tmp31, align 4
        br label %unwind

unwind:         ; preds = %unwind.loopexit, %entry
        %eh_ptr = call i8* @llvm.eh.exception( )

in which the calls to the exception handling intrinsics have been
pushed by the store to a later basic block.  Tricky :)  I will
think about it.
Comment 11 Dale Johannesen 2007-11-14 15:53:47 PST
That's not exactly how this arose.  Originally there was a single landing pad, unwind, which could be reached from either Foo or Bar (the latter inside a loop).  The loop optimizer decided to split the 
(abnormal) Bar->unwind edge to put some code on it; the new block unwind.loopexit becomes the new landing pad for Bar, since it's reachable only by throwing.  However unwind.loopexit does not have an eh.selector in it.  Is it sufficient for the loop code that does this (see comment 7) to add one, do you think?

Comment 12 Duncan Sands 2007-11-14 16:14:42 PST
How about simply removing the assertion (SelectionDAGISel.cpp:4495)?
This routine, copyCatchInfo, is taking the selector call in unwind,
and applying the typeinfos in it to unwind.loopexit.  This is exactly
what we want in this case!  I'm pretty sure I only added the assertion
because it seemed like a funky case that might cause problems elsewhere.
For example, the selector register will be marked live-in on both
unwind and unwind.loopexit.  Will the register value from unwind.loopexit
survive to be used in unwind, or will it get borked somehow because it is
also marked live-in on unwind?  Since I had no testcase, and I couldn't
see how this could happen in practice (hah!) it seemed best to just abort.
I don't have time tonight to check that everything is fine if the assertion
is removed, maybe tomorrow.
Comment 13 Dale Johannesen 2007-11-14 16:27:56 PST
I tried that and (after turning off various other checking that it broke) the EH tables did not look right, although I don't grok them well enough to be sure.  The trouble is that the original 'unwind' is still a valid landing pad, reached from Foo.
Comment 14 Duncan Sands 2007-11-15 03:59:20 PST
The tables look fine to me, so I've committed this.  It's not clear to
me if the exception and exception selector registers are being handled
properly because I don't understand ppc assembler.
Comment 15 Dale Johannesen 2007-11-15 12:31:53 PST
Looks good.  I thought there had to be a 1-1 correspondence between selectors and landing pads, I think, but there doesn't.  Thanks for fixing this.
Comment 16 Duncan Sands 2007-11-16 02:57:18 PST
If you are happy, can you please check in the testcase.
Comment 17 Duncan Sands 2008-03-15 12:24:34 PDT
Some additional thoughts on this: it may be possible to eliminate the
list of typeinfos from eh.selector altogether.  Currently eh.selector
is used in three different way (that should probably be separated):

(1) as a way of getting hold of the value in the selector register
(set by the runtime before it branches to a landing pad)
(2) as a way of specifying the personality function
(3) as a way of giving a list of typeids the exception should be
compared against.  This includes the "catch-all" typeid (language
dependent) without which we don't know how to codegen invoke properly.
There may also be filter expressions.

As far as (1) is concerned there is no real problem if the selector
call occurs far away from the landing pad, because we could do the
following implementation: at codegen time, create a new variable
to hold the selector value, copy the register into it at the start
of each landing pad, and have eh.selector just read the variable
and return its values.  Now that codegen has dominator info, it
shouldn't be hard to introduce a new variable like this at codegen
time.  The same can be done for eh.exception.

As for (2), since currently we only support one personality per
function, it doesn't matter where in the function it is; the same
goes for specifying the "catch-all" typeid: we could have a special
intrinsic for saying what the personality and catch-all are, and
it would just need to be placed somewhere in the function.  I don't
want to discuss this further here.

So now for (3).  I'm not going to discuss filters here - I plan
to create a special PR for discussion of how filters could be
eliminated altogether from LLVM exception handling.

What is the list of typeids in the selector call for?
The eh handler code generated by llvm-gcc looks like this:

compare the exception with typeinfo T1
if it matches, branch to B1
otherwise, compare the exception with typeinfo T2
if it matches, branch to B2
...
otherwise, compare the exception with typeinfo TN
if it matches, branch to BN

The list of eh.selector typeinfos is exactly B1, ..., BN.
The point I want to make is that we could auto-generate
this list by analysing the code.

Why does eh.selector want to assemble this list at all?
After all, it is possible to query the runtime to find
out whether the exception matches a particular typeinfo,
so this query could simply be performed N times for each
of B1, ..., BN.  (I explain below how this querying can
be done).  The reason is that two queries is much more
expensive than one, so it is cheaper to perforrm batch queries,
if possible only performing one.  In fact the current
scheme only allows for one (batch) query, the query being
specified by the list of typeinfos in eh.selector.

Thus we could do the following scheme: remove the list
of typeinfos from eh.  In fact remove eh.selector altogether
and instead have an exception comparison intrinsic:
llvm.eh.compare(exception_pointer, typeinfo).  [This also
means we can remove eh.typeid].  At codegen time, starting
from each landing pad, follow the cfg looking for eh.compare
calls (don't follow exception edges, since they give a new
exception).  If it can be determined that they occur in a
certain order, generate the corresponding list of typeinfos
and perform one query with that [this should always be the
case when the code was generated by llvm-gcc], otherwise
there will need to be multiple queries.  Now I come to think
about it, maybe the llvm.eh.compare call should not take
the exception_pointer as an argument, and just use the
current exception.  Otherwise you may have some hassles
if you lose track of the exception pointer (though it's
not fatal since you can always just generate more queries).

So how do you query the runtime to find out whether the
exception belongs to a given list of typeinfos?  Easy:
you generate an invoke of a reraise statement, attaching
the list of typeinfos to it, and arrange for it to unwind
to the next instruction.  The selector register holds the
result of the query.  As an optimization, when downstream
from an identifiable landing pad, the typeinfo list can be
attached to the original invoke [the common case, which
corresponds to what we try to do now].
Comment 18 Duncan Sands 2008-03-16 07:58:42 PDT
> I plan to create a special PR for discussion of how filters could be
> eliminated altogether from LLVM exception handling.

PR2157.
Comment 19 Duncan Sands 2008-03-16 08:01:03 PDT
Note that the llvm.eh.compare intrinsic would be readonly (and even
readnone in the typeinfo parameter) making it easy for standard
optimizations (correlated expressions...) to eliminate pointless
comparisons, giving automatic exception handling simplification.

Also note that the required analysis at the codegen level is
analogous to the problem of synthesizing switch statements out
of a series of comparisons and conditional branches.
Comment 20 Bill Wendling 2012-01-30 19:35:53 PST
Duncan, This is done, no? :-)
Comment 21 Duncan Sands 2012-01-31 03:04:07 PST
Bill fixed this.