Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for asynchronous "non-call" exceptions, eliminate invoke/call dichotomy #1641

Open
llvmbot opened this issue Mar 24, 2007 · 43 comments
Labels
bugzilla Issues migrated from bugzilla enhancement Improving things as opposed to bug fixing, e.g. new or missing feature llvm:core

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 24, 2007

Bugzilla Link 1269
Version 1.0
OS All
Reporter LLVM Bugzilla Contributor
CC @andykaylor,@asl,@chandlerc,@majnemer,@hfinkel,@jwillemsen,@markus-oberhumer,@nlewycky,@pogo59,@pcc,@rnk,@kaomoneus,@yuanfang-chen

Extended Description

I would like to see Chris' notes on Exception Handling implemented (see the link
above), preferably for 2.0. My reasons for this are two fold:

  1. It fits in the same category of disruptive assembly/bytecode changes as the
    other major changes in LLVM 2.0; so, including it in 2.0 means less future
    impact for 2.0 users.
  2. As I start to develop the HLVM exception model, I'd prefer to do it on an EH
    model in LLVM that is simpler for the kinds of languages HLVM targets.

Yes, I'm signing up to do this; hopefully, before 2.0 is released.

@lattner
Copy link
Collaborator

lattner commented Mar 24, 2007

Very cool.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 25, 2007

Some increments to map this work out (feedback would be welcome):

  1. Add an "unwindTo" member to BasicBlock.
  2. Add "unwind to" keywords to block label in assembly parser
  3. Add "unwind to" keywords to block label in assembly writer
  4. Add support for "unwind to" to bytecode reader
  5. Add support for "unwind to" to bytecode writer.
  6. Fix prune-eh to deal with "unwindTo" in BasicBlock.
  7. Remove generation of invoke instructions in llvm-gcc
  8. Remove invoke instruction from assembly parser
  9. Remove invoke instruction from assembly writer
  10. Remove invoke instruction from bytecode reader
  11. Remove invoke instruction from bytecode writer
  12. Replace all uses of invoke instruction in LLVM passes with Call instruction
  13. Remove all uses of CallSite class
  14. Remove CallSite class.
  15. Remove invoke instruction from LLVM IR

Anything I forgot, or anything that would lead to an inconsistent state?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 26, 2007

Unfortunately, I won't get time to work on this for 2.0

@lattner
Copy link
Collaborator

lattner commented May 23, 2007

Here are the notes in question: http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt

@asl
Copy link
Collaborator

asl commented May 24, 2007

Btw, this should simplify debug information emission a lot, since DWARF debug
information operates on "ranges".

@llvmbot
Copy link
Collaborator Author

llvmbot commented Sep 7, 2007

Looks like I'm not going to get to this any time soon.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 13, 2007

Patch for steps 1. to 5.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 13, 2007

I would like to sign on this. I'm not entirely familiar with everything so currently I believe that 1. to 15. are the right steps to accomplish it.

Here's a first patch that addresses 1. to 5. I wait for your comments. If everything is fine, I'll assign myself to the bug (do I have the rights to do that?)

Before addressing 6., we must think on how we represent control flow when a basic block unwinds to another basic block. The issue is that a BasicBlock is currently not a User object, so it can not be added to the use objects of another basic block.

I see three windows here on how to implement this:

  1. Following Chris notes, we could add a bit to each instruction to say if it can raise an exception. In that case, the instructions that raise an exception of a basic block are all "uses" of the basic block to which they may unwind to (but this may change all these instructions to terminators instructions in LLVM design)

  2. We add a new field to the BasicBlock class, which is an array of BasicBlocks, listing all predecessors (I think a BasicBlock only needs to know the basic blocks that are predecessors, not the specific instructions that may jump to the basic block. I may be wrong here)

  3. Make the BasicBlock class inherits the User class (but does it make sense?)

I've been dying on having this feature implemented. I think once we solve this implementation of uses issue, steps 6. to 15. shouldn't be hard to implement.

And by the way, I must say, it's a real pleasure to hack LLVM. You've done a really great job for having so, so, so readable code (I'm talking about the code that I changed for 1. to 5. OK, it's just parsing, but well, I've found so unreadable code on other projects....)

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 14, 2007

Patch for steps 1. to 5., 7. and 8. to 11.
This patch corrects the previous patch (a bug in AsmWriter) and implements 8. to 11. It also contains the patch for step 7. I think once we solve the issue on my first comment, we can plan on removing the InvokeInstruction visible to users (since Invoke instructions are spread all over LLVM, it may take a long time before removing them all).

Once I get your feedback on the patches, I'll apply steps 1. to 5., and then mail the mailing list, informing of the changes. I hope it does not imply many changes in the codegen.

For step 7, since in gcc call sites only can throw exceptions, the changes were not difficult. However, my machine can not compile llvm-gcc, can someone look if my patch compiles fine?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 15, 2007

One issue not addressed in steps 1 .. 15 is the problem of
instruction re-ordering. For example, consider

...
%tmp = some_useful_value
invoke some_routine unwind %some_unwind_block
...
some_unwind_block:
...use %tmp...

It is essential that codegen does not allow the assignment
to %tmp to be re-ordered after the invoke. This is carefully
taken care of in SelectionDAGISel. The difficulty is that since
%tmp is a register, and isn't used as an argument to the invoke
call, a priori there's no reason why it shouldn't be re-ordered
after the call. This is why special logic is required, which
basically says that an invoke implies a compiler barrier. How
to handle this in the new setup? Must all potentially throwing
instructions imply compiler barriers?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 15, 2007

  1. Following Chris notes, we could add a bit to each instruction to say if
    it can raise an exception. In that case, the instructions that raise an
    exception of a basic block are all "uses" of the basic block to which
    they may unwind to (but this may change all these instructions to
    terminators instructions in LLVM design)

I think they have to become terminators: if you allow control flow
to branch in the middle of basic blocks then you have discarded the
whole notion of "basic block" which seems like a very dangerous thing
to do.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 15, 2007

For step 7, since in gcc call sites only can throw exceptions, the changes
were not difficult.

This is completely wrong: the whole reason for this PR is that in gcc non
call sites can throw exceptions! See tree_could_throw_p in tree-eh.c.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 16, 2007

For step 7, since in gcc call sites only can throw exceptions, the changes
were not difficult.

This is completely wrong: the whole reason for this PR is that in gcc non
call sites can throw exceptions! See tree_could_throw_p in tree-eh.c.

I am tempted to believe you (in c++, non call sites can throw exceptions) since I'm not a lot familiar with c++ exceptions, but following Chris notes, you're wrong.

I'm more familiar with Java and c# exceptions, and they definitely need this kind of behavior. This is why the PR is here.

On an other note, if you are right, then that does not matter to me :) I am only changing llvm-gcc so that it compiles and generates code that conforms to the exception model of LLVM. So in that sense, the patch for 7. should be OK.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 16, 2007

I think they have to become terminators: if you allow control flow
to branch in the middle of basic blocks

Err, I do not allow control flow to branch in the middle of basic blocks. I allow (well actually, this design) non-terminator instructions to branch to a basic block.

Actually, still following Chris notes, they should not be terminator inst, since this would not avoid having things like (replace invoke by calls):

invoke bar() to cont unwind %BarThrew

BarThrew:
invoke ~X(&F) to cont2 unwind %unexp
cont2:
invoke ~X(&E) to cont3 unwind %unexp
cont3:
invoke ~X(&D) to cont4 unwind %unexp
cont4:
invoke ~X(&C) to cont5 unwind %unexp
cont5:
invoke ~X(&B) to cont6 unwind %unexp
cont6:
invoke ~X(&A) to cont7 unwind %unexp
cont7:
unwind
unexp:
call std::unexpected()
unwind

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 16, 2007

to handle this in the new setup? Must all potentially throwing
instructions imply compiler barriers?

For that I am tempted to apply the 1. to 5. patches (preparing for changes), email the list for talking about this kind of issue (involving codegen folks) and magically all will work!

Since this will surely not be the case, we need to have a discussion of wether or not we want this feature. I definitely want it, and I definitely want it to be in 2.2. However, if it's not a desirable feature for now (because it involves too many parts of LLVM), we could delay, but I'm afraid it will be harder and harder to switch the exception model in such a case.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 4, 2007

Following discussions with Chris and Duncan, the file exceptionsBlock.patch sets throwing instructions (calls or non calls) as terminator instructions.

The main modifications are:

  1. Removes the TerminatorInst class
  2. Sets the BasicBlock class as a User. A BasicBlock now uses two blocks: and unwind destination and a normal destination (which can be set to null).
  3. Modifies the .ll and .bc parsers and writers (but should be compatible with old .bc and .ll files)
  4. Modifies SelectionDAGISel.cpp to generate exception DAGs for throwing instructions instead of only Invoke instructions.
  5. You can set non call exceptions with the -non-call-exceptions command line (defined in Instruction.cpp)

I tried to provide a compatibility layer with Invoke instructions, but this involves too many workaround for an instruction that will disappear. .ll or .bc files which have invoke instructions will not work with this patch. We need a tool to upgrade these files.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 4, 2007

@nlewycky
Copy link
Contributor

Please don't remove TerminatorInst. We still have non-Invoke terminators and it's useful to distinguish the class of instructions that represent control-flow.

Removing invoke is fine, but you still have to upgrade it, for bitcode file format compatibility. If you encounter an invoke, construct a branch to a new BB with a call and a branch in it.

I'd suggest leaving invoke in for now, and just adding the unwindTo support to basic blocks and getting that reviewed and checked into the tree.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Feb 28, 2008

Please don't remove TerminatorInst. We still have non-Invoke terminators and
it's useful to distinguish the class of instructions that represent
control-flow.

Actually, only the class hierarchy with TerminatorInst is removed. You still get a isTerminator() function which returns true if the instruction is a control flow instruction or a throwing instruction.

Removing invoke is fine, but you still have to upgrade it, for bitcode file
format compatibility. If you encounter an invoke, construct a branch to a new
BB with a call and a branch in it.

Yes, from what I remember that was easy to imagine, but painfull to do ;-)

I'd suggest leaving invoke in for now, and just adding the unwindTo support
to basic blocks and getting that reviewed and checked into the tree.

The patch is rather old, and I don't think it is still valid on TOT. Consider this patch as a starter for someone who desperately wants the feature (like me). But really applying it needs someone with better LLVM insights.

@nlewycky
Copy link
Contributor

nlewycky commented Mar 2, 2008

add unwindDest to BasicBlock, making it a User
This patch adds the unwind block to the BasicBlock structure, turning BasicBlock into a User instead of a Value. The asm parser and writer as well as the bitcode reader and writer are updated to handle this field properly.

Nothing else is changed. Most importantly, no semantics of any instruction are changed (such as the "unwind" instruction actually following the unwindDest on a BB), nor does the unwind-block show up as a successor or predecessor in the graph traversal. I'm saving that for the nest patch.

Please let me know whether I can commit this change.

@lattner
Copy link
Collaborator

lattner commented Mar 2, 2008

Overall, the patch looks great. One minor quibble: instead of making FUNC_CODE_BLOCK_UNWINDDEST be a block in funcinfo, why not make this be a record emitted into the function stream? At the start of each basic block you could emit an "INST_BB_UNWINDDEST <bb#>" if the destination is not null, and the reader could easily pick this up. It seems that it would simplify some of the input/out and be more efficient in terms of file size.

Also, "foo: on unwind %bar" is somewhat awkward, how about "foo: unwind_dest %bar" or "foo: unwind_to %bar" ?

Thanks for working on this!

@nlewycky
Copy link
Contributor

nlewycky commented Mar 2, 2008

update Support/CFG.h to handle unwindTo as predecessor / successor
This updates CFG.h in the obvious way to include the unwind dest in the CFG.

@lattner
Copy link
Collaborator

lattner commented Mar 2, 2008

This patch looks fine, but I don't understand why this can't use dyn_cast:

  • if (isa(*It)) // not dyn_cast due to const-correctness
  •  return cast<TerminatorInst>(*It)->getParent();
    

Also, "atEnd()" should be updated to handle the BB case as well. If the block has a null unwind dest, then idx = # succs is end, otherwise it isn't.

-Chris

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 2, 2008

Thanx for doing this Nick.

What are your plans after handling syntactic changes? Things can get very tricky when "throwing instructions must be terminator", and I haven't looked at what this would involve in Codegen. Are you planning on implementing the new EH model? (that would be great by the way!)

@nlewycky
Copy link
Contributor

nlewycky commented Mar 3, 2008

This patch looks fine, but I don't understand why this can't use dyn_cast:

  • if (isa(*It)) // not dyn_cast due to const-correctness
  •  return cast<TerminatorInst>(*It)->getParent();
    

This code works because cast<> preserves const-ness. If I were to rewrite it as:

if (TerminatorInst *TI = dyn_cast(*It))

then I'd be forcing it to be non-const which is wrong (compile error) in one case, or if I did use "const TerminatorInst", it'd be wrong (compile error) trying to cast a const BasicBlock* to a non-const BB*.

I found the template code hard to follow, but I think this is what's going wrong. It certainly fails experimentally.

Also, "atEnd()" should be updated to handle the BB case as well. If the block
has a null unwind dest, then idx = # succs is end, otherwise it isn't.

I don't follow. Neither PredIterator nor SuccIterator have an "atEnd()" method. succ_iterator has a constructor for the end iterator which has already been updated to add one to idx when the BB has an unwind dest.

The It.atEnd() doesn't need to be modified since it's a simple use iterator. It already includes BBs that use this one, much like it includes references to PHI nodes.

@nlewycky
Copy link
Contributor

nlewycky commented Mar 3, 2008

No problem, Nicolas!

I'd like the new EH model too, but that's a long way off. Right now I'm focussing on removing "invoke" from the front and middle. There's a lot of work just to fix all the optimizations to handle unwind-dest properly anyways.

Allowing arbitrary instructions to trap is a whole other ball of wax that I won't even think about until this part is done.

@lattner
Copy link
Collaborator

lattner commented Mar 6, 2008

Re comment 25, "ok and ok".

@nlewycky
Copy link
Contributor

nlewycky commented Mar 6, 2008

teach prune-eh to prune unwind_to
This is a rather inelegant implementation of the idea that a BB with no calls (or only calls marked nounwind) doesn't need to have an unwind_to on it.

@nlewycky
Copy link
Contributor

nlewycky commented Mar 9, 2008

@nlewycky
Copy link
Contributor

nlewycky commented Mar 9, 2008

update the cloner (fixes bugpoint) and its biggest user, the loop optz'ns
No test cases this time, sorry. They're hard to write for this sort of change.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 10, 2008

Welcome Nick to the "I'd like to address this issue, but it's so messy I had to move to something else" list of people :)

Seriously, do you continue working on this?

Here's an idea I just had for handling llvm optimizations. The general problem is that optimizers assume that all predecessors have been executed when entering a block. However, this is wrong for blocks that belong to a try{ }catch clause. One can not say that an instruction has been executed inside the try {} catch (except for the first instructions that will never throw).

What if we trick the successor of a try{}catch clause and let him think its predecessor is the predecessor of the first block of the try{}catch. For that, we'd need a new type of instruction for the end of the try{} catch clause. For example:

start:
instructions
br %tryCatch

tryCatch: unwind to %catcher, normal = %bb2
instructions
endTryCatch(%bb2)

bb2: ; predecessor: start
instructions
ret

catcher:
...

Is this viable? Apart from teaching that a br can not be optimized if blocks have different unwind destinations, no changes should be needed in the optimizers. I'm not sure about dags, but endTryCatch should be treated like any other terminator instructions.

@nlewycky
Copy link
Contributor

No, I stopped work on this altogether. I am no longer believe that having exception-throwing functions not be Terminators is a good approach.

The approach you described is basically where I started, with modifying the way the dominator tree is constructed, except that I didn't add a new instruction to do it.

@kaomoneus
Copy link
Contributor

Hi Nuno. We considered the idea of ConstantRanges usage. But use cases are differs. And we need implement additional methods for ConstantRanges that will bulky. First of all I mean proper operations with sets like a difference (that may produce two ranges) and union. So currently this idea is casted away.

@kaomoneus
Copy link
Contributor

Oh... sorry. Remove previous comment. It was written for another bug.

@rnk
Copy link
Collaborator

rnk commented Nov 11, 2014

I think we can safely say that we're not going to do this.

@rnk
Copy link
Collaborator

rnk commented Jul 1, 2015

People keep asking for this, so we should probably reopen this and write down some of the better ideas we've had for doing it here.


The original idea that Chris outlined boils down to implicitly allowing every possibly trapping instruction to have an implicit exceptional control flow edge. Correct optimizations would have to remember to check for this edge before hoisting, sinking, or doing any of the things they normally do. The fact that this is all implicit is what seems to bug most folks that I've discussed this with.

I happen to think we could make this work. We could add a method like getUnwindLabel to Instruction that checks if the instruction may trap and pulls the unwind label off the BasicBlock if present. The fact that it's on the BB and not the Instruction is merely a memory usage optimization. Then we'd have to audit all the optimizations, of which there are many. Passes like simplifycfg would be required to avoid joining BBs with different unwind edge labels.


As an alternative to the implicit design, we could try something that makes the edges very explicit.

The first approach is to build a pile of intrinsics for every possibly trapping operation that we care to support catching (load, store, fpmath, div, leave the rest unsupported). We can then invoke these intrinsics and allow the backend to do the usual EH_LABEL lowering around the instruction. Alternatively, we could make first class instructions for them (iload/istore, as Peter proposed). This is just making the duplication between the call and invoke instructions worse, though, as now optimizers have to match more instructions that do the same things.


Another way we could do this is by making the 'invoke' instruction a wrapper that uses or contains an arbitrary other instruction. The invoke wrapper takes any instruction and essentially promotes it to a terminator with a normal edge and an exceptional edge. The in-memory representation would be that the possibly-trapping instruction precedes the invoke instruction in the basic block ilist, and the invoke instruction has a use of the trapping instruction. This would require auditing for passes that would insert code between the two instructions, but we'd handle it just like we did for landingpads: add a getLastInsertionPt() to go along with getFirstInsertionPt(). We could also relax the invariant to allow non-trapping instructions (bitcasts, adds, dbg intrinsics) between the trapping instruction and the invoke.

The assembly syntax for this would be:
invoke
to label %normal_edge unwind label %ehedge

The standard call syntax would be:
invoke call void @​f()
to label %normal_edge unwind label %ehedge
%x = invoke call i32 @​g()
to label %normal_edge unwind label %ehedge

For stores, the other big void type instruction:
invoke store i32 0, i32* null
to label %normal_edge unwind label %ehedge

We can still support the old invoke syntax by checking if the token following 'invoke' names an instruction or a type. The old syntax:
invoke void @​f()
to label %normal_edge unwind label %ehedge
%x = invoke i32 @​g()
to label %normal_edge unwind label %ehedge


I have no intention of immediately working on this stuff, but I figured I should dump some ideas in the bug tracker.

@hfinkel
Copy link
Collaborator

hfinkel commented Jul 1, 2015

People keep asking for this, so we should probably reopen this and write
down some of the better ideas we've had for doing it here.


The original idea that Chris outlined boils down to implicitly allowing
every possibly trapping instruction to have an implicit exceptional control
flow edge. Correct optimizations would have to remember to check for this
edge before hoisting, sinking, or doing any of the things they normally do.
The fact that this is all implicit is what seems to bug most folks that I've
discussed this with.
...

Of these options, I'm in favor only of the last. The fact that we currently have non-single-exit basic blocks (because we have calls that can throw, and yet don't terminate their basic blocks) already makes developing correct IR transformations much more complicated than it should be, and adding more such potential unwinding points would only make this worse.

I'd much rather have basic blocks be true single-entry-single-exit regions, all potential unwinding paths included.

@rnk
Copy link
Collaborator

rnk commented Jul 1, 2015

Of these options, I'm in favor only of the last. The fact that we currently
have non-single-exit basic blocks (because we have calls that can throw, and
yet don't terminate their basic blocks) already makes developing correct IR
transformations much more complicated than it should be, and adding more
such potential unwinding points would only make this worse.

I'm not proposing to change this situation. If there are no EH actions to take after a call, it seems like a win to promote invokes to calls to remove non-existent edges from the CFG. All function calls can setjmp out or exit the current thread without using EH. Any operation that needs to get retired before function exit can't be moved across an opaque call, regardless of how we handle EH.

I'd much rather have basic blocks be true single-entry-single-exit regions,
all potential unwinding paths included.

I think that hits the nail on the head. That's the property that existing analyses and transforms rely on.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Sep 17, 2015

As embedded systems developers I just wanted to chip in a use-case of -fnon-call-exceptions:

A bus error coming from the memory subsystem is generally a sign that your system is hosed pretty badly, hence not much attention is generally given to trying to recover from them.

The situation is very different however for memory-mapped I/O. While many peripheral registers are memory-like (or almost so), in other cases a register access is effectively a method call to that peripheral. Unsurprisingly, bus errors are also generated much more liberally, in some cases for very benign error conditions. Catching these with a SIGBUS handler (or some cpu-specific equivalent in case of baremetal code) and turning them into C++ exceptions is one of the saner ways of dealing with such peripherals.

Note that -fnon-call-exceptions just happens to be the only way to make this work in gcc afaik but I'm not actually a fan of its broad scope. For example I noticed that, as a result, inside the exception handler it actually has to generate extra code to deal with the possibility that reading a field from the exception structure might itself throw another exception! This is silliness imho...

My suggestion would be a cv-qualifier-like attribute that marks such objects, and a compiler switch to have it implicitly applied to all objects marked volatile (since "method-like" reads/writes will be marked as such already anyway).

Of course this only covers a subset of uses of -fnon-call-exceptions, but I imagine it is far less invasive.

@rnk
Copy link
Collaborator

rnk commented Dec 3, 2019

*** Bug llvm/llvm-bugzilla-archive#44174 has been marked as a duplicate of this bug. ***

@rnk
Copy link
Collaborator

rnk commented Dec 24, 2019

*** Bug llvm/llvm-bugzilla-archive#44372 has been marked as a duplicate of this bug. ***

@rnk
Copy link
Collaborator

rnk commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#44174

@rnk
Copy link
Collaborator

rnk commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#44372

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
@Endilll Endilll added enhancement Improving things as opposed to bug fixing, e.g. new or missing feature and removed new-feature labels Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla enhancement Improving things as opposed to bug fixing, e.g. new or missing feature llvm:core
Projects
None yet
Development

No branches or pull requests

8 participants