First Last Prev Next    No search results available
Details
: Make better use of conditional unreachabl's in LLVM
Bug#: 810
: libraries
: Global Analyses
Status: NEW
Resolution:
: All
: All
: trunk
: P2
: enhancement
: ---

:
: new-feature
:
:
  Show dependency tree - Show dependency graph
People
Reporter: Domagoj Babic <domagoj@engineer.com>
Assigned To: Unassigned LLVM Bugs <unassignedbugs@nondot.org>
:

Attachments


Note

You need to log in before you can comment on or make changes to this bug.

Related actions


Description:   Opened: 2006-06-19 01:14
Hi,

I'd like to open a discussion on adding assert/assume intrinsics to
LLVM. Here I will describe the syntax, semantics, and applications 
of both.

Those statements should have the following syntax:

assert/assume ( Boolean_variable, String )

"Boolean_variable" represents a condition.

The only purpose of "String" is to classify assertions and assumptions
into classes, so that, for example, certain class of assertions can be
enabled or disabled during the code generation. For maximal generality,
I'd suggest leaving "String" unspecified for now. The users can come up
with identifiers like "Range_check" or similar.


The semantics of statements is:

ASSERT - if the condition evaluates to TRUE, the statement is a NOP,
otherwise the assertion terminates the program. Code generators should
be able to enable/disable assertions, but are also allowed to completely
ignore them.

ASSUME - is always a NOP, but the code generators might generate code
that issues a warning if the condition evaluates to FALSE. Assumptions
present the conditions that "should", but are not guaranteed to hold.


Applications:

ASSERT - debugging, code reliability, can represent both user provided
assertions (invariants, pre/post-conditions) and automatically inserted
checks, like range checking.

ASSUME - primarily as a tool for communication between passes. Some
possible applications include: 

* language independent attribute mechanism
[ like assume(restricted(ptr_x), "Restricted Pointer") ] 

* passing information through different stages of the compilation 
framework, for example, a loop analysis pass might store an inferred
loop invariant as assume(x + y < 200, "Loop Invariant").

* static checking. For details I'd suggest papers from Dijkstra, Greg
Nelson, Rustan Leino.


It should be noted that ASSUMEs have application specific meaning, and
should not be touched by passes that do not use them (something like dbg
intrinsics).

Looking forward to your comments,

Domagoj Babic
------- Comment #1 From Chris Lattner 2006-06-19 13:52:52 -------
Basic question about "ASSERT": why bother?  LLVM already has the ability to
express "if this condition isn't 
true, abort", you need no intrinsic for that.

Assume is useful as an intrinsic, and could conceptually be useful for
conveying information from the 
front-end to the optimizers.  However, using it this way would require a bunch
of new predicates (e.g. 
'restricted' which aren't defined here), and it would seriously bloat the code.

Why do you need assume for loop invariants? Those are trivial to compute.

-Chris
------- Comment #2 From Domagoj Babic 2006-06-20 17:00:09 -------
The addvantage of having explicit ASSERT statement is that it can be used by
automatic code annotation tools can infer assertions. Also, it's handy to have
the capability of switching certain classes of assertions on/off during the code
generation. You just can't get the same functionality with plain test-exit blocks.

Here's an example: let's assume that you're compiling some kernel (linux/or
whatever) for debugging. And, you have a tool that inserts lock checks (or
access to blocking calls while the kernel lock is locked, or similar). Now, if
you insert test-exit blocks, you need to recompile pretty much everything if you
want to check something else. If you could disable inserted assertions at code
generation time, you would need just to run the back-end again.


As about loop invariants, I disagree. Can you suggest an algorithm to find an
invariant of the following loop:
while (y1 != y2) {
  while (y1 > y2) { y1=y1-y2; y4=y4+y3; }
  while (y2 > y1) { y2=y2-y1; y3=y3+y4; }
}

I can't, and I'll happily pay a beer to anyone who can.


Additional predicates can be introduced when and if needed. Currently there's
no way to pass certain additional info from the source to the bytecode level.
For example, it would be usefull to know that certain pointer is expected to be
in user or kernel space. We can start with plain boolean variables. 

Furthermore, LLVM is lacking a language-independent attribute system. Assumes
would solve that problem too.

And, I wonder whether that statement about code blowup is true. How many
predicates would one need in the worst case? 10-20 for Linux kernel?
------- Comment #3 From Chris Lattner 2006-06-20 21:37:15 -------
I still don't understand your comments about assert.  Any transformation tool
that can insert an assert 
attribute can insert code that calls abort.

I don't understand your comment about the loop invariant.  There are no loop
invariants in that loop, at 
least not in the classical sense.  What do you mean?

> And, I wonder whether that statement about code blowup is true. How many
> predicates would one need in the worst case? 10-20 for Linux kernel?

Code blowup isn't a matter of how many unique predicates you need, it's a
question of how many 
assert/assumes you're inserting.  In the case of user/kernel, you need at least
one intrinsic for *every 
pointer variable*.

> Furthermore, LLVM is lacking a language-independent attribute system. Assumes
> would solve that problem too.

This is a feature, not a bug. :)

-Chris
------- Comment #4 From Domagoj Babic 2006-06-20 22:00:51 -------
> I still don't understand your comments about assert.  Any transformation tool
that can insert an assert 
> attribute can insert code that calls abort.

Yes, and everyone can use GCC as a front-end. :-)

The fact that something can be done doesn't mean that it should be done
that way.

> I don't understand your comment about the loop invariant.  There are no loop
invariants in that loop, at 
> least not in the classical sense.  What do you mean?

There is an invariant, y1=GCD(x1,x2) && y3+y4=LCM(x1,x2). The point is
that you can use assumes to pass information from one pass (which might
be very non-trivial, like inferring loop invariants) to another
through byte-code annotation.

> > And, I wonder whether that statement about code blowup is true. How many
> > predicates would one need in the worst case? 10-20 for Linux kernel?
> 
> Code blowup isn't a matter of how many unique predicates you need, it's a
question of how many 
> assert/assumes you're inserting.  In the case of user/kernel, you need at
least one intrinsic for *every 
> pointer variable*.

The code can be annotated with assumes/asserts but doesn't have to be.
In many circumstances adding one assert/assume per pointer is negligible
cost. There is a good reason why _user_ and _kernel_ attributes are
pushed into the kernel source. A good number of pointers is already
attributed, and I don't see anyone complaining. 

> > Furthermore, LLVM is lacking a language-independent attribute system. Assumes
> > would solve that problem too.
>
> This is a feature, not a bug. :)

Right, not a bug, but a lacking feature. 

Domagoj

------- Comment #5 From Chris Lattner 2006-06-20 22:05:23 -------
> Yes, and everyone can use GCC as a front-end. :-)

I have no idea what this refers to.

> The fact that something can be done doesn't mean that it should be done
> that way.

Agreed, what's your point?

> There is an invariant, y1=GCD(x1,x2) && y3+y4=LCM(x1,x2). The point is
> that you can use assumes to pass information from one pass (which might
> be very non-trivial, like inferring loop invariants) to another
> through byte-code annotation.

Please please please start using standard terminology.  LLVM already has a mechanism for doing this, 
the Pass infrastructure.  We aready handle alias analysis and other high-level analyses with it, it works 
well, and it would handle this better than your annotation system.

> The code can be annotated with assumes/asserts but doesn't have to be.
> In many circumstances adding one assert/assume per pointer is negligible
> cost. There is a good reason why _user_ and _kernel_ attributes are
> pushed into the kernel source. A good number of pointers is already
> attributed, and I don't see anyone complaining. 

You are conflating the ideas of annotations in the source code (which I don't care about) with the size of 
the LLVM IR.  Adding an intrinsic for *every pointer ssa variable* for the entire linux kernel would add 
millions of intrinsics.  This would dramatically slow down the compiler and have other major effects.  
Yes, this is possibly tolerable to some people, but this does not feel like the right way to do this.

> > This is a feature, not a bug. :)
> Right, not a bug, but a lacking feature. 

No it isn't.  Please see the extensive discussions in the llvmdev mailing list on this topic.

-Chris
------- Comment #6 From Domagoj Babic 2006-06-20 22:49:17 -------
> > The fact that something can be done doesn't mean that it should be done
> > that way.
>
> Agreed, what's your point?

The point is that having an ASSERT intrinsic is a better way to handle
assertions than inserting test-exit blocks, and ASSERTs can be easily
enabled/disabled.

> > There is an invariant, y1=GCD(x1,x2) && y3+y4=LCM(x1,x2). The point is
> > that you can use assumes to pass information from one pass (which might
> > be very non-trivial, like inferring loop invariants) to another
> > through byte-code annotation.
> Please please please start using standard terminology.  LLVM already has a
mechanism for doing this, 
> the Pass infrastructure.  We aready handle alias analysis and other high-level
analyses with it, it works 
> well, and it would handle this better than your annotation system.

I'm using standard terminology. You're missing the point. Pass
infrastructure is powerful, but it's much handier to have assert/assume
in the bytecode. This solution transcedents the problems with pass
ordering, dependencies between Module and Function passes, etc.

> You are conflating the ideas of annotations in the source code (which I don't
care about) with the size of 
> the LLVM IR.  Adding an intrinsic for *every pointer ssa variable* for the
entire linux kernel would add 
> millions of intrinsics.  This would dramatically slow down the compiler and
have other major effects.  
> Yes, this is possibly tolerable to some people, but this does not feel like
the right way to do this.

Can you propose an alternative way to get that information into LLVM
form somehow? Or you strongly believe that propagating such
language-dependencies into LLVM is inappropriate?

BTW, you would have at most the number of predicates that is equal to
the number of attributes in the code. Mostly function parameters are
attributed, so that theory about millions of intrinsics just doesn't
hold.

> No it isn't.  Please see the extensive discussions in the llvmdev mailing list
on this topic.

Which topic exactly?

Domagoj

------- Comment #7 From Chris Lattner 2006-06-20 22:58:09 -------
>The point is that having an ASSERT intrinsic is a better way to handle
> assertions than inserting test-exit blocks, and ASSERTs can be easily
> enabled/disabled.

While I understand that this is your opinion, you haven't said anything that backs that up.  Real code 
can be enabled/disabled, specialized away, etc.

> Pass infrastructure is powerful, but it's much handier to have assert/assume
> in the bytecode. This solution transcedents the problems with pass
> ordering, dependencies between Module and Function passes, etc.

Not true.  This doesn't have pass ordering problems because it's not powerful enough to express such 
things.  Dependencies between module passes and functions passes are an implementation detail of 
the pass manager, also something that your representation just isn't powerful enough to run into.

Reading/writing pass results into .bc files would have exactly the same expressiveness as bolting on 
new intrinsics.

> Can you propose an alternative way to get that information into LLVM
> form somehow?

Extend the IR.

> Or you strongly believe that propagating such
> language-dependencies into LLVM is inappropriate?

You're missing that this is inherently a language extension that you're talking about here.  Why not 
admit it and treat it like it is?

> BTW, you would have at most the number of predicates that is equal to
> the number of attributes in the code. Mostly function parameters are
> attributed, so that theory about millions of intrinsics just doesn't
> hold.

Incorrect.  First, programmers annotate non-ssa variables.  Once in SSA form, one non-ssa variable can 
become many SSA variables.  Further, many of these annotations can be hidden from the program in 
type defs and other forms, or derived pointers are inferred from base pointers.  At the LLVM level all of 
this is explicit.

Further, to illustrate that this is a language extension, LLVM transformations need to know what these 
things are and what they mean.  A trivial example, in the spirit above, is the SSA promotion pass.  It 
needs to know how to turn intrinsics talking about "non-ssa variables" into N intrinsics talking about 
ssa-variables.  This is an example of how annotations are not a panacea that people think they are.

> Which topic exactly?

If you look, you'll find it.  I would find it no quicker than you could. It has been discussed extensively 
multiple times in the past.

-Chris
------- Comment #8 From Domagoj Babic 2006-06-20 23:18:56 -------
> >The point is that having an ASSERT intrinsic is a better way to handle
> > assertions than inserting test-exit blocks, and ASSERTs can be easily
> > enabled/disabled.
> 
> While I understand that this is your opinion, you haven't said anything that
backs that up.  Real code 
> can be enabled/disabled, specialized away, etc.

I just don't see how handling an arbitrary amount of code and figuring
out what represents an assertion, what not, and what type of assertion
is simpler or more convinient than having a single
intrinsic/instruction/whatever...

> > Pass infrastructure is powerful, but it's much handier to have assert/assume
> > in the bytecode. This solution transcedents the problems with pass
> > ordering, dependencies between Module and Function passes, etc.
> 
> Not true.  This doesn't have pass ordering problems because it's not powerful
enough to express such 
> things.  Dependencies between module passes and functions passes are an
implementation detail of 
> the pass manager, also something that your representation just isn't powerful
enough to run into.
> 
> Reading/writing pass results into .bc files would have exactly the same
expressiveness as bolting on 
> new intrinsics.

Being simple is a feature, not a bug, that's why asserts/assumes are
such a handy concept, but certainly not meant for describing
dependencies between passes :-D 

> > Can you propose an alternative way to get that information into LLVM
> > form somehow?
> 
> Extend the IR.
> You're missing that this is inherently a language extension that you're
talking about here.  Why not 
> admit it and treat it like it is?

Look, I don't really care that much how assertions are supported.
For my purposes I might just as well insert function calls to 
externally defined functions. However, I think a more general 
solution would be useful. As about assumes, I see no other way to get
the info I'd like to have through the front-end (restrict, _user_,
_kernel_ pointers).

> > BTW, you would have at most the number of predicates that is equal to
> > the number of attributes in the code. Mostly function parameters are
> > attributed, so that theory about millions of intrinsics just doesn't
> > hold.
> 
> Incorrect.  First, programmers annotate non-ssa variables.  Once in SSA form,
one non-ssa variable can 
> become many SSA variables.  Further, many of these annotations can be hidden
from the program in 
> type defs and other forms, or derived pointers are inferred from base
pointers.  At the LLVM level all of 
> this is explicit.

All true, but I simply don't see those millions of intrinsics. 

> > Which topic exactly?
> 
> If you look, you'll find it.  I would find it no quicker than you could. It
has been discussed extensively 
> multiple times in the past.

Let me rephrase:

>> > No it isn't.  Please see the extensive discussions in the llvmdev mailing list
>> on this topic.

What are you talking about? What should I look-up? Discussions on
intrinsics/asserts/assumes/IR? What?

Domagoj

------- Comment #9 From Chris Lattner 2009-06-16 01:30:10 -------
A simpler way to do this:
http://nondot.org/sabre/LLVMNotes/BuiltinUnreachable.txt
------- Comment #10 From Alastair Lynn 2009-11-10 11:44:40 -------
(In reply to comment #9)
> A simpler way to do this:
> http://nondot.org/sabre/LLVMNotes/BuiltinUnreachable.txt
> 

Would it not be more compact to use an undef label? That is:

  br i1 %condition, label %labelIfTrue, label undef

rather than

  br i1 %condition, label %labelIfTrue, label %labelIfFalse
labelIfFalse:
  unreachable

First Last Prev Next    No search results available