Bugzilla – Bug 1269
Add support for "non-call" exceptions, eliminate invoke
Last modified: 2008-03-08 23:15:14
You need to log in before you can comment on or make changes to this bug.
I would like to see Chris' notes on Exception Handling implemented (see the link above), preferably for 2.0. My reasons for this are two fold: 1. It fits in the same category of disruptive assembly/bytecode changes as the other major changes in LLVM 2.0; so, including it in 2.0 means less future impact for 2.0 users. 2. As I start to develop the HLVM exception model, I'd prefer to do it on an EH model in LLVM that is simpler for the kinds of languages HLVM targets. Yes, I'm signing up to do this; hopefully, before 2.0 is released.
Very cool.
Some increments to map this work out (feedback would be welcome): 1. Add an "unwindTo" member to BasicBlock. 2. Add "unwind to" keywords to block label in assembly parser 3. Add "unwind to" keywords to block label in assembly writer 4. Add support for "unwind to" to bytecode reader 5. Add support for "unwind to" to bytecode writer. 6. Fix prune-eh to deal with "unwindTo" in BasicBlock. 7. Remove generation of invoke instructions in llvm-gcc 8. Remove invoke instruction from assembly parser 9. Remove invoke instruction from assembly writer 10. Remove invoke instruction from bytecode reader 11. Remove invoke instruction from bytecode writer 12. Replace all uses of invoke instruction in LLVM passes with Call instruction 13. Remove all uses of CallSite class 14. Remove CallSite class. 15. Remove invoke instruction from LLVM IR Anything I forgot, or anything that would lead to an inconsistent state?
Unfortunately, I won't get time to work on this for 2.0
Here are the notes in question: http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt
Btw, this should simplify debug information emission a lot, since DWARF debug information operates on "ranges".
Looks like I'm not going to get to this any time soon.
Created an attachment (id=1157) [details] Patch for steps 1. to 5.
I would like to sign on this. I'm not entirely familiar with everything so currently I believe that 1. to 15. are the right steps to accomplish it. Here's a first patch that addresses 1. to 5. I wait for your comments. If everything is fine, I'll assign myself to the bug (do I have the rights to do that?) Before addressing 6., we must think on how we represent control flow when a basic block unwinds to another basic block. The issue is that a BasicBlock is currently not a User object, so it can not be added to the use objects of another basic block. I see three windows here on how to implement this: 1) Following Chris notes, we could add a bit to each instruction to say if it can raise an exception. In that case, the instructions that raise an exception of a basic block are all "uses" of the basic block to which they may unwind to (but this may change all these instructions to terminators instructions in LLVM design) 2) We add a new field to the BasicBlock class, which is an array of BasicBlocks, listing all predecessors (I think a BasicBlock only needs to know the basic blocks that are predecessors, not the specific instructions that may jump to the basic block. I may be wrong here) 3) Make the BasicBlock class inherits the User class (but does it make sense?) I've been dying on having this feature implemented. I think once we solve this implementation of uses issue, steps 6. to 15. shouldn't be hard to implement. And by the way, I must say, it's a real pleasure to hack LLVM. You've done a really great job for having so, so, so readable code (I'm talking about the code that I changed for 1. to 5. OK, it's just parsing, but well, I've found so unreadable code on other projects....)
Created an attachment (id=1159) [details] Patch for steps 1. to 5., 7. and 8. to 11. This patch corrects the previous patch (a bug in AsmWriter) and implements 8. to 11. It also contains the patch for step 7. I think once we solve the issue on my first comment, we can plan on removing the InvokeInstruction visible to users (since Invoke instructions are spread all over LLVM, it may take a long time before removing them all). Once I get your feedback on the patches, I'll apply steps 1. to 5., and then mail the mailing list, informing of the changes. I hope it does not imply many changes in the codegen. For step 7, since in gcc call sites only can throw exceptions, the changes were not difficult. However, my machine can not compile llvm-gcc, can someone look if my patch compiles fine?
One issue not addressed in steps 1 .. 15 is the problem of instruction re-ordering. For example, consider ... %tmp = some_useful_value invoke some_routine unwind %some_unwind_block ... some_unwind_block: ...use %tmp... It is essential that codegen does not allow the assignment to %tmp to be re-ordered after the invoke. This is carefully taken care of in SelectionDAGISel. The difficulty is that since %tmp is a register, and isn't used as an argument to the invoke call, a priori there's no reason why it shouldn't be re-ordered after the call. This is why special logic is required, which basically says that an invoke implies a compiler barrier. How to handle this in the new setup? Must all potentially throwing instructions imply compiler barriers?
> 1) Following Chris notes, we could add a bit to each instruction to say if > it can raise an exception. In that case, the instructions that raise an > exception of a basic block are all "uses" of the basic block to which > they may unwind to (but this may change all these instructions to > terminators instructions in LLVM design) I think they have to become terminators: if you allow control flow to branch in the middle of basic blocks then you have discarded the whole notion of "basic block" which seems like a very dangerous thing to do.
> For step 7, since in gcc call sites only can throw exceptions, the changes > were not difficult. This is completely wrong: the whole reason for this PR is that in gcc non call sites can throw exceptions! See tree_could_throw_p in tree-eh.c.
(In reply to comment #12) > > For step 7, since in gcc call sites only can throw exceptions, the changes > > were not difficult. > > This is completely wrong: the whole reason for this PR is that in gcc non > call sites can throw exceptions! See tree_could_throw_p in tree-eh.c. > I am tempted to believe you (in c++, non call sites can throw exceptions) since I'm not a lot familiar with c++ exceptions, but following Chris notes, you're wrong. I'm more familiar with Java and c# exceptions, and they definitely need this kind of behavior. This is why the PR is here. On an other note, if you are right, then that does not matter to me :) I am only changing llvm-gcc so that it compiles and generates code that conforms to the exception model of LLVM. So in that sense, the patch for 7. should be OK.
(In reply to comment #11) > I think they have to become terminators: if you allow control flow > to branch in the middle of basic blocks Err, I do not allow control flow to branch in the middle of basic blocks. I allow (well actually, this design) non-terminator instructions to branch to a basic block. Actually, still following Chris notes, they should not be terminator inst, since this would not avoid having things like (replace invoke by calls): invoke bar() to cont unwind %BarThrew BarThrew: invoke ~X(&F) to cont2 unwind %unexp cont2: invoke ~X(&E) to cont3 unwind %unexp cont3: invoke ~X(&D) to cont4 unwind %unexp cont4: invoke ~X(&C) to cont5 unwind %unexp cont5: invoke ~X(&B) to cont6 unwind %unexp cont6: invoke ~X(&A) to cont7 unwind %unexp cont7: unwind unexp: call std::unexpected() unwind
(In reply to comment #10) > to handle this in the new setup? Must all potentially throwing > instructions imply compiler barriers? > For that I am tempted to apply the 1. to 5. patches (preparing for changes), email the list for talking about this kind of issue (involving codegen folks) and magically all will work! Since this will surely not be the case, we need to have a discussion of wether or not we want this feature. I definitely want it, and I definitely want it to be in 2.2. However, if it's not a desirable feature for now (because it involves too many parts of LLVM), we could delay, but I'm afraid it will be harder and harder to switch the exception model in such a case.
Following discussions with Chris and Duncan, the file exceptionsBlock.patch sets throwing instructions (calls or non calls) as terminator instructions. The main modifications are: 1) Removes the TerminatorInst class 2) Sets the BasicBlock class as a User. A BasicBlock now uses two blocks: and unwind destination and a normal destination (which can be set to null). 3) Modifies the .ll and .bc parsers and writers (but should be compatible with old .bc and .ll files) 4) Modifies SelectionDAGISel.cpp to generate exception DAGs for throwing instructions instead of only Invoke instructions. 5) You can set non call exceptions with the -non-call-exceptions command line (defined in Instruction.cpp) I tried to provide a compatibility layer with Invoke instructions, but this involves too many workaround for an instruction that will disappear. .ll or .bc files which have invoke instructions will not work with this patch. We need a tool to upgrade these files.
Created an attachment (id=1278) [details] Throwing instructions as terminators
Please don't remove TerminatorInst. We still have non-Invoke terminators and it's useful to distinguish the class of instructions that represent control-flow. Removing invoke is fine, but you still have to upgrade it, for bitcode file format compatibility. If you encounter an invoke, construct a branch to a new BB with a call and a branch in it. I'd suggest leaving invoke *in* for now, and just adding the unwindTo support to basic blocks and getting that reviewed and checked into the tree.
(In reply to comment #18) > Please don't remove TerminatorInst. We still have non-Invoke terminators and > it's useful to distinguish the class of instructions that represent > control-flow. > Actually, only the class hierarchy with TerminatorInst is removed. You still get a isTerminator() function which returns true if the instruction is a control flow instruction or a throwing instruction. > Removing invoke is fine, but you still have to upgrade it, for bitcode file > format compatibility. If you encounter an invoke, construct a branch to a new > BB with a call and a branch in it. Yes, from what I remember that was easy to imagine, but painfull to do ;-) > > I'd suggest leaving invoke *in* for now, and just adding the unwindTo support > to basic blocks and getting that reviewed and checked into the tree. > The patch is rather old, and I don't think it is still valid on TOT. Consider this patch as a starter for someone who desperately wants the feature (like me). But really applying it needs someone with better LLVM insights.
Created an attachment (id=1482) [details] add unwindDest to BasicBlock, making it a User This patch adds the unwind block to the BasicBlock structure, turning BasicBlock into a User instead of a Value. The asm parser and writer as well as the bitcode reader and writer are updated to handle this field properly. Nothing else is changed. Most importantly, no semantics of any instruction are changed (such as the "unwind" instruction actually following the unwindDest on a BB), nor does the unwind-block show up as a successor or predecessor in the graph traversal. I'm saving that for the nest patch. Please let me know whether I can commit this change.
Overall, the patch looks great. One minor quibble: instead of making FUNC_CODE_BLOCK_UNWINDDEST be a block in funcinfo, why not make this be a record emitted into the function stream? At the start of each basic block you could emit an "INST_BB_UNWINDDEST <bb#>" if the destination is not null, and the reader could easily pick this up. It seems that it would simplify some of the input/out and be more efficient in terms of file size. Also, "foo: on unwind %bar" is somewhat awkward, how about "foo: unwind_dest %bar" or "foo: unwind_to %bar" ? Thanks for working on this!
Created an attachment (id=1483) [details] update Support/CFG.h to handle unwindTo as predecessor / successor This updates CFG.h in the obvious way to include the unwind dest in the CFG.
This patch looks fine, but I don't understand why this can't use dyn_cast: + if (isa<TerminatorInst>(*It)) // not dyn_cast due to const-correctness + return cast<TerminatorInst>(*It)->getParent(); Also, "atEnd()" should be updated to handle the BB case as well. If the block has a null unwind dest, then idx = # succs is end, otherwise it isn't. -Chris
Thanx for doing this Nick. What are your plans after handling syntactic changes? Things can get very tricky when "throwing instructions must be terminator", and I haven't looked at what this would involve in Codegen. Are you planning on implementing the new EH model? (that would be great by the way!)
> This patch looks fine, but I don't understand why this can't use dyn_cast: > > + if (isa<TerminatorInst>(*It)) // not dyn_cast due to const-correctness > + return cast<TerminatorInst>(*It)->getParent(); This code works because cast<> preserves const-ness. If I were to rewrite it as: if (TerminatorInst *TI = dyn_cast<TerminatorInst>(*It)) then I'd be forcing it to be non-const which is wrong (compile error) in one case, or if I did use "const TerminatorInst", it'd be wrong (compile error) trying to cast a const BasicBlock* to a non-const BB*. I found the template code hard to follow, but I think this is what's going wrong. It certainly fails experimentally. > Also, "atEnd()" should be updated to handle the BB case as well. If the block > has a null unwind dest, then idx = # succs is end, otherwise it isn't. I don't follow. Neither PredIterator nor SuccIterator have an "atEnd()" method. succ_iterator has a constructor for the end iterator which has already been updated to add one to idx when the BB has an unwind dest. The It.atEnd() doesn't need to be modified since it's a simple use iterator. It already includes BBs that use this one, much like it includes references to PHI nodes.
No problem, Nicolas! I'd like the new EH model too, but that's a long way off. Right now I'm focussing on removing "invoke" from the front and middle. There's a lot of work just to fix all the optimizations to handle unwind-dest properly anyways. Allowing arbitrary instructions to trap is a whole other ball of wax that I won't even think about until this part is done.
Re comment 25, "ok and ok".
Created an attachment (id=1497) [details] teach prune-eh to prune unwind_to This is a rather inelegant implementation of the idea that a BB with no calls (or only calls marked nounwind) doesn't need to have an unwind_to on it.
Created an attachment (id=1510) [details] make the inliner and simplifycfg respect unwind_to
Created an attachment (id=1511) [details] update the cloner (fixes bugpoint) and its biggest user, the loop optz'ns No test cases this time, sorry. They're hard to write for this sort of change.