Bugzilla – Bug 810
Make better use of conditional unreachabl's in LLVM
Last modified: 2009-11-10 11:44:40
You need to log in before you can comment on or make changes to this bug.
Hi, I'd like to open a discussion on adding assert/assume intrinsics to LLVM. Here I will describe the syntax, semantics, and applications of both. Those statements should have the following syntax: assert/assume ( Boolean_variable, String ) "Boolean_variable" represents a condition. The only purpose of "String" is to classify assertions and assumptions into classes, so that, for example, certain class of assertions can be enabled or disabled during the code generation. For maximal generality, I'd suggest leaving "String" unspecified for now. The users can come up with identifiers like "Range_check" or similar. The semantics of statements is: ASSERT - if the condition evaluates to TRUE, the statement is a NOP, otherwise the assertion terminates the program. Code generators should be able to enable/disable assertions, but are also allowed to completely ignore them. ASSUME - is always a NOP, but the code generators might generate code that issues a warning if the condition evaluates to FALSE. Assumptions present the conditions that "should", but are not guaranteed to hold. Applications: ASSERT - debugging, code reliability, can represent both user provided assertions (invariants, pre/post-conditions) and automatically inserted checks, like range checking. ASSUME - primarily as a tool for communication between passes. Some possible applications include: * language independent attribute mechanism [ like assume(restricted(ptr_x), "Restricted Pointer") ] * passing information through different stages of the compilation framework, for example, a loop analysis pass might store an inferred loop invariant as assume(x + y < 200, "Loop Invariant"). * static checking. For details I'd suggest papers from Dijkstra, Greg Nelson, Rustan Leino. It should be noted that ASSUMEs have application specific meaning, and should not be touched by passes that do not use them (something like dbg intrinsics). Looking forward to your comments, Domagoj Babic
Basic question about "ASSERT": why bother? LLVM already has the ability to express "if this condition isn't true, abort", you need no intrinsic for that. Assume is useful as an intrinsic, and could conceptually be useful for conveying information from the front-end to the optimizers. However, using it this way would require a bunch of new predicates (e.g. 'restricted' which aren't defined here), and it would seriously bloat the code. Why do you need assume for loop invariants? Those are trivial to compute. -Chris
The addvantage of having explicit ASSERT statement is that it can be used by automatic code annotation tools can infer assertions. Also, it's handy to have the capability of switching certain classes of assertions on/off during the code generation. You just can't get the same functionality with plain test-exit blocks. Here's an example: let's assume that you're compiling some kernel (linux/or whatever) for debugging. And, you have a tool that inserts lock checks (or access to blocking calls while the kernel lock is locked, or similar). Now, if you insert test-exit blocks, you need to recompile pretty much everything if you want to check something else. If you could disable inserted assertions at code generation time, you would need just to run the back-end again. As about loop invariants, I disagree. Can you suggest an algorithm to find an invariant of the following loop: while (y1 != y2) { while (y1 > y2) { y1=y1-y2; y4=y4+y3; } while (y2 > y1) { y2=y2-y1; y3=y3+y4; } } I can't, and I'll happily pay a beer to anyone who can. Additional predicates can be introduced when and if needed. Currently there's no way to pass certain additional info from the source to the bytecode level. For example, it would be usefull to know that certain pointer is expected to be in user or kernel space. We can start with plain boolean variables. Furthermore, LLVM is lacking a language-independent attribute system. Assumes would solve that problem too. And, I wonder whether that statement about code blowup is true. How many predicates would one need in the worst case? 10-20 for Linux kernel?
I still don't understand your comments about assert. Any transformation tool that can insert an assert attribute can insert code that calls abort. I don't understand your comment about the loop invariant. There are no loop invariants in that loop, at least not in the classical sense. What do you mean? > And, I wonder whether that statement about code blowup is true. How many > predicates would one need in the worst case? 10-20 for Linux kernel? Code blowup isn't a matter of how many unique predicates you need, it's a question of how many assert/assumes you're inserting. In the case of user/kernel, you need at least one intrinsic for *every pointer variable*. > Furthermore, LLVM is lacking a language-independent attribute system. Assumes > would solve that problem too. This is a feature, not a bug. :) -Chris
> I still don't understand your comments about assert. Any transformation tool that can insert an assert > attribute can insert code that calls abort. Yes, and everyone can use GCC as a front-end. :-) The fact that something can be done doesn't mean that it should be done that way. > I don't understand your comment about the loop invariant. There are no loop invariants in that loop, at > least not in the classical sense. What do you mean? There is an invariant, y1=GCD(x1,x2) && y3+y4=LCM(x1,x2). The point is that you can use assumes to pass information from one pass (which might be very non-trivial, like inferring loop invariants) to another through byte-code annotation. > > And, I wonder whether that statement about code blowup is true. How many > > predicates would one need in the worst case? 10-20 for Linux kernel? > > Code blowup isn't a matter of how many unique predicates you need, it's a question of how many > assert/assumes you're inserting. In the case of user/kernel, you need at least one intrinsic for *every > pointer variable*. The code can be annotated with assumes/asserts but doesn't have to be. In many circumstances adding one assert/assume per pointer is negligible cost. There is a good reason why _user_ and _kernel_ attributes are pushed into the kernel source. A good number of pointers is already attributed, and I don't see anyone complaining. > > Furthermore, LLVM is lacking a language-independent attribute system. Assumes > > would solve that problem too. > > This is a feature, not a bug. :) Right, not a bug, but a lacking feature. Domagoj
> Yes, and everyone can use GCC as a front-end. :-) I have no idea what this refers to. > The fact that something can be done doesn't mean that it should be done > that way. Agreed, what's your point? > There is an invariant, y1=GCD(x1,x2) && y3+y4=LCM(x1,x2). The point is > that you can use assumes to pass information from one pass (which might > be very non-trivial, like inferring loop invariants) to another > through byte-code annotation. Please please please start using standard terminology. LLVM already has a mechanism for doing this, the Pass infrastructure. We aready handle alias analysis and other high-level analyses with it, it works well, and it would handle this better than your annotation system. > The code can be annotated with assumes/asserts but doesn't have to be. > In many circumstances adding one assert/assume per pointer is negligible > cost. There is a good reason why _user_ and _kernel_ attributes are > pushed into the kernel source. A good number of pointers is already > attributed, and I don't see anyone complaining. You are conflating the ideas of annotations in the source code (which I don't care about) with the size of the LLVM IR. Adding an intrinsic for *every pointer ssa variable* for the entire linux kernel would add millions of intrinsics. This would dramatically slow down the compiler and have other major effects. Yes, this is possibly tolerable to some people, but this does not feel like the right way to do this. > > This is a feature, not a bug. :) > Right, not a bug, but a lacking feature. No it isn't. Please see the extensive discussions in the llvmdev mailing list on this topic. -Chris
> > The fact that something can be done doesn't mean that it should be done > > that way. > > Agreed, what's your point? The point is that having an ASSERT intrinsic is a better way to handle assertions than inserting test-exit blocks, and ASSERTs can be easily enabled/disabled. > > There is an invariant, y1=GCD(x1,x2) && y3+y4=LCM(x1,x2). The point is > > that you can use assumes to pass information from one pass (which might > > be very non-trivial, like inferring loop invariants) to another > > through byte-code annotation. > Please please please start using standard terminology. LLVM already has a mechanism for doing this, > the Pass infrastructure. We aready handle alias analysis and other high-level analyses with it, it works > well, and it would handle this better than your annotation system. I'm using standard terminology. You're missing the point. Pass infrastructure is powerful, but it's much handier to have assert/assume in the bytecode. This solution transcedents the problems with pass ordering, dependencies between Module and Function passes, etc. > You are conflating the ideas of annotations in the source code (which I don't care about) with the size of > the LLVM IR. Adding an intrinsic for *every pointer ssa variable* for the entire linux kernel would add > millions of intrinsics. This would dramatically slow down the compiler and have other major effects. > Yes, this is possibly tolerable to some people, but this does not feel like the right way to do this. Can you propose an alternative way to get that information into LLVM form somehow? Or you strongly believe that propagating such language-dependencies into LLVM is inappropriate? BTW, you would have at most the number of predicates that is equal to the number of attributes in the code. Mostly function parameters are attributed, so that theory about millions of intrinsics just doesn't hold. > No it isn't. Please see the extensive discussions in the llvmdev mailing list on this topic. Which topic exactly? Domagoj
>The point is that having an ASSERT intrinsic is a better way to handle > assertions than inserting test-exit blocks, and ASSERTs can be easily > enabled/disabled. While I understand that this is your opinion, you haven't said anything that backs that up. Real code can be enabled/disabled, specialized away, etc. > Pass infrastructure is powerful, but it's much handier to have assert/assume > in the bytecode. This solution transcedents the problems with pass > ordering, dependencies between Module and Function passes, etc. Not true. This doesn't have pass ordering problems because it's not powerful enough to express such things. Dependencies between module passes and functions passes are an implementation detail of the pass manager, also something that your representation just isn't powerful enough to run into. Reading/writing pass results into .bc files would have exactly the same expressiveness as bolting on new intrinsics. > Can you propose an alternative way to get that information into LLVM > form somehow? Extend the IR. > Or you strongly believe that propagating such > language-dependencies into LLVM is inappropriate? You're missing that this is inherently a language extension that you're talking about here. Why not admit it and treat it like it is? > BTW, you would have at most the number of predicates that is equal to > the number of attributes in the code. Mostly function parameters are > attributed, so that theory about millions of intrinsics just doesn't > hold. Incorrect. First, programmers annotate non-ssa variables. Once in SSA form, one non-ssa variable can become many SSA variables. Further, many of these annotations can be hidden from the program in type defs and other forms, or derived pointers are inferred from base pointers. At the LLVM level all of this is explicit. Further, to illustrate that this is a language extension, LLVM transformations need to know what these things are and what they mean. A trivial example, in the spirit above, is the SSA promotion pass. It needs to know how to turn intrinsics talking about "non-ssa variables" into N intrinsics talking about ssa-variables. This is an example of how annotations are not a panacea that people think they are. > Which topic exactly? If you look, you'll find it. I would find it no quicker than you could. It has been discussed extensively multiple times in the past. -Chris
> >The point is that having an ASSERT intrinsic is a better way to handle > > assertions than inserting test-exit blocks, and ASSERTs can be easily > > enabled/disabled. > > While I understand that this is your opinion, you haven't said anything that backs that up. Real code > can be enabled/disabled, specialized away, etc. I just don't see how handling an arbitrary amount of code and figuring out what represents an assertion, what not, and what type of assertion is simpler or more convinient than having a single intrinsic/instruction/whatever... > > Pass infrastructure is powerful, but it's much handier to have assert/assume > > in the bytecode. This solution transcedents the problems with pass > > ordering, dependencies between Module and Function passes, etc. > > Not true. This doesn't have pass ordering problems because it's not powerful enough to express such > things. Dependencies between module passes and functions passes are an implementation detail of > the pass manager, also something that your representation just isn't powerful enough to run into. > > Reading/writing pass results into .bc files would have exactly the same expressiveness as bolting on > new intrinsics. Being simple is a feature, not a bug, that's why asserts/assumes are such a handy concept, but certainly not meant for describing dependencies between passes :-D > > Can you propose an alternative way to get that information into LLVM > > form somehow? > > Extend the IR. > You're missing that this is inherently a language extension that you're talking about here. Why not > admit it and treat it like it is? Look, I don't really care that much how assertions are supported. For my purposes I might just as well insert function calls to externally defined functions. However, I think a more general solution would be useful. As about assumes, I see no other way to get the info I'd like to have through the front-end (restrict, _user_, _kernel_ pointers). > > BTW, you would have at most the number of predicates that is equal to > > the number of attributes in the code. Mostly function parameters are > > attributed, so that theory about millions of intrinsics just doesn't > > hold. > > Incorrect. First, programmers annotate non-ssa variables. Once in SSA form, one non-ssa variable can > become many SSA variables. Further, many of these annotations can be hidden from the program in > type defs and other forms, or derived pointers are inferred from base pointers. At the LLVM level all of > this is explicit. All true, but I simply don't see those millions of intrinsics. > > Which topic exactly? > > If you look, you'll find it. I would find it no quicker than you could. It has been discussed extensively > multiple times in the past. Let me rephrase: >> > No it isn't. Please see the extensive discussions in the llvmdev mailing list >> on this topic. What are you talking about? What should I look-up? Discussions on intrinsics/asserts/assumes/IR? What? Domagoj
A simpler way to do this: http://nondot.org/sabre/LLVMNotes/BuiltinUnreachable.txt
(In reply to comment #9) > A simpler way to do this: > http://nondot.org/sabre/LLVMNotes/BuiltinUnreachable.txt > Would it not be more compact to use an undef label? That is: br i1 %condition, label %labelIfTrue, label undef rather than br i1 %condition, label %labelIfTrue, label %labelIfFalse labelIfFalse: unreachable