807 – perform substitution when equality is known

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 807 - perform substitution when equality is known

Summary: perform substitution when equality is known

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Nick Lewycky

URL:
Keywords:	code-quality, new-feature

Depends on:
Blocks:

Reported:	2006-06-10 14:38 PDT by Nick Lewycky
Modified:	2010-02-22 12:42 PST (History)
CC List:	2 users (show)

See Also:
Fixed By Commit(s):

Attachments
Adds new Propagate Equality pass (6.10 KB, patch) 2006-06-11 16:43 PDT, Nick Lewycky	Details
New propagate equality pass. (6.89 KB, patch) 2006-06-11 18:18 PDT, Nick Lewycky	Details
New new propagate equality pass. (7.20 KB, patch) 2006-06-12 01:06 PDT, Nick Lewycky	Details
Now passes llvm-test! (6.80 KB, patch) 2006-06-12 23:07 PDT, Nick Lewycky	Details
First attempt at record-based PropagateEquality pass. (7.49 KB, text/x-c++src) 2006-06-24 00:38 PDT, Nick Lewycky	Details
first functional record-based propequal pass (12.82 KB, text/plain) 2006-06-24 16:38 PDT, Nick Lewycky	Details
*second attempt; includes load ptr, ptr != null.** (14.79 KB, text/plain) 2006-06-25 23:21 PDT, Nick Lewycky	Details
collapse equivalent variables; use std::map for fast lookup (31.87 KB, patch) 2006-08-07 18:26 PDT, Nick Lewycky	Details
fix KPcopy usage (31.88 KB, patch) 2006-08-07 19:28 PDT, Nick Lewycky	Details
Show Obsolete (8) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Nick Lewycky 2006-06-10 14:38:13 PDT

The following code doesn't optimize down:

// Source: pag.csail.mit.edu/paste2005/slides/yang-paste2005-slides.ppt

extern void foo();

void process(int *p, int *q)
{
  if (p != q)
    return;

  if (*p != *q)
    foo(); // unreachable
}

In a branch whose condition depends on a EQ or NE, in the side where the
variables are equal (and if the types are equal (?)), replaceAllUsesOfWith(p,
q). This allows the compiler to optimize away the following test and should kill
the liveness of one of the variables. I suppose you could also do the same
optimization with a switch instruction.

Where would such an optimization belong? The instruction combiner?

Comment 1 Nick Lewycky 2006-06-10 22:30:17 PDT

I've implemented my suggested change and it doesn't work. My change updates the
BB for the "then" block after setne to seqeq transformation but that doesn't get
through to the unified return block.

Modifying the source to read:
  if (p != q)
    return
  else 
    q = p;

causes the example to optimize away as it should. Since that's correct in
general (subject to type of p and q and isVolatile) I'd like to do the
equivalent transformation on the SSA form. I'm not sure exactly how.

Comment 2 Chris Lattner 2006-06-11 01:16:40 PDT

It isn't safe to P->replaceAllUsesWith(Q) or vice versa because replaceAllUsesWith will replace ALL uses 
with no notion of region.  This will turn the initial condition into Q != Q, for example.

Implementing this requires a value range propagation pass.  LLVM has two things related to this:

lib/Transforms/Scalar/CondPropagate.cpp, is a very simple conditional value propagor, and is stable.
Scalar/CorrelatedExprs.cpp is a more aggressive propagator but has SSA update bugs and generally 
isn't stable.  It should be replaced.

Unfortunately there isn't currently an easy place to add this to LLVM.

-Chris

Comment 3 Nick Lewycky 2006-06-11 16:43:26 PDT

Created attachment 344 [details]
Adds new Propagate Equality pass

Ok, if it doesn't fit anywhere, I'll create a new place for it.

This finds all branch instructions (only, no switch yet) and finds the set of
basic blocks affected by:
 * using a depth-first iterator to find the children from the true and false
sides
 * then set-subtract the false from the true.

This is written to handle multiple false roots for when switch support is
added. Unfortunately, this is probably horribly slow. Comments appreciated.

Anyways, it handles the test in the bug report correctly, and still passes the
regression tests, so I'd call it a win. This patch needs some minor cleanup
work (include the regtest, fix the name of the opt in the various places it
appears), but I'll do that after I get a review. I'll probably need to change
more than the cleanup I know about.

Comment 4 Nick Lewycky 2006-06-11 17:17:40 PDT

Duh, that algorithm was incorrect and could replace too many uses, in an ugly
case involving a branch inside of a loop. Please wait for an update.

Comment 5 Nick Lewycky 2006-06-11 18:18:34 PDT

Created attachment 345 [details]
New propagate equality pass.

There here does what we wanted before. As far as I know, there are no
outstanding issues. Comes complete with testcase.

Comment 6 Chris Lattner 2006-06-11 18:26:50 PDT

Nifty.  Have you evaluated how many times this triggers on various programs?

I'd suggest setting up a test to see how often it applies, details here:
http://llvm.org/docs/TestingGuide.html#customtest

As a side note, this loop:
for (Instruction::op_iterator j = i->op_begin(); j != i->op_end(); ++j)
  ...
can be replaced with:

i->replaceUsesOfWith(Old, New);

Also, please s/LLVM Research Group/you/ in the file header.  :)

-Chris

Comment 7 Nick Lewycky 2006-06-12 01:06:00 PDT

Created attachment 346 [details]
New new propagate equality pass.

This patch has a different implementation; instead of checking each BB for
users of the value, check the value's use list for uses within the affected
area.

Stats:
2 propequal - Number of equal variables substituted
 in MultiSource/Benchmarks/Olden/bh
7 propequal - Number of equal variables substituted
 in MultiSource/Benchmarks/Prolangs-C/bison

Comment 8 Chris Lattner 2006-06-12 22:29:50 PDT

I realize that this probably isn't ready for review (recent bugs etc), but here is some feedback in the 
meantime.  Overall, the pass looks good and it does catch a useful property.  However, I don't think we 
can justify an entire pass over the IL for this simple optimization, at least not turned on by default.

The pass as you have it now can be checked into CVS whenever you're happy with it and it is relatively 
bug free.  Through inspection, I believe you have some issues updating PHI nodes, but I'll let you play 
with the pass to decide if so and if there is anything else you'd like to change before the first 
incarnation goes into CVS.

Before this gets turned on by default, I'd like to see the pass get a little bit more general.  In particular, 
this is a simple dominator-based optimization.  I believe it would be more general and more efficient to 
formulate this as a table-based dominator tree walk.  Basically the pass would like like this:

1. Keep a table of known (in)equalities, e.g. A == B, A != 0, C < D.
2. Walk the code in dominator tree order.  When you see a conditional branch where a successor is 
dominated by the branch block, see if there is a condition that can be inferred in that successor due to 
the branch condition.  If so, enter it in the table before recursing into the successor part of the 
domtree.  When unrecursing back to the branch block, pop the inequality off of the table.
3. When walking instructions, keep track of equalities that can be inferred, and use the table to simplify 
instructions.  Some examples are below.

Things that this approach can simplify:

   X = load P    <- from this point on, we know P is non-null, enter P != 0 into the table.
   if (P == 0) goto ...   <- obviously false


  if (A != 4) {  <- record A != 5 in the table
     B = A+4;  <- record B != 9 in the table
     if (B == 9) {  <- false

float A, B;
  if (A > B)          <- neither are nan's
    if (isnan(A))    <- false

etc.  There are a *large* number of simple transformations like this that we can build up with this 
capability (look at the ill-fated cee pass for examples).  Starting the framework and slowly expanding it 
is probably the way to go, and the best way to do this is probably in CVS.  In addition to new "tricks", 
you can also make the table more powerful, e.g. to keep track of value ranges in addition to inequalities 
(e.g. that A is [4 .. 9]), etc.

Are you interested in working on this?  If so, the best approach is for you to get the initial version set 
up with the rough framework, then iterate on it in CVS (you'll get write access).  Seem reasonable?

-Chris

Comment 9 Chris Lattner 2006-06-12 22:36:03 PDT

Other more specific comments about your current pass:

1. The LHS/RHS of a setcc instruction are guaranteed to be the same type.
2. Your code should work with calls and casts, why do you special case and ignore them?
3. Your code currently thinks about instructions and constants.  I think it would be better to think about 
instructions and "everything else".  In particular, arguments should be handled like constants.
4. There is value in changing one argument to another:

int foo(int A, int B) {
  if (A == B)
    return A-B;     // turn into A-A or B-B, which instcombine will then turn into 0.
return 42;
}

Comment 10 Nick Lewycky 2006-06-12 23:07:32 PDT

Created attachment 347 [details]
Now passes llvm-test!

I should've warned you not to review, as my local copy diverged pretty far from
the original. Casts and calls aren't treated specially. I took your tip that
the two sides of a setcc are the same type.

There's not that much magical about handling constants. All it does it prevent
the pass from ripping out "uint 0" and replacing it with "tmp.22" because it
knows they're equal. Instead, we should replace an instruction with a constant.
Removing that will also cause it to break; it will rip the "uint 0"'s right out
of your getelementptr instructions.

This also means that it works fine on arguments. In fact, the included testcase
just uses function arguments.

Unfortunately, it doesn't seem to be much of a win. Zero replacements performed
across all of llvm-test.

Comment 11 Chris Lattner 2006-06-12 23:15:41 PDT

Looks reasonable, some more comments then :)

First, you *do* want to handle arguments somewhat like constants.  For example, if you have Arg == 1, 
you obviously want to replace Arg with 1.  However, if you have inst == arg, you want to make sure you 
replace the inst with the arg, not the arg with the inst.

This check:
  if (!Forest->properlyDominates(BranchBB, BB)) continue;

can be done once for the entire expression.  In particular, in conjunction with your other check, you're 
really checking to see if the EqualDest is dominated by the branch block.  Please change the code to 
test to see if the EqualDest is *properly* dominated by the branch block up outside of the use list.  If 
not, don't process any uses.

Next, you should check phi nodes more aggressively.  Consider this:

if (A == 7) {
  ...
} else {
  ...
}
C = phi(A, x)

here you can change the A in the PHI node with 7, because the use occurs in a dominating block, not 
actually in the merge block.  This is actually required for correctness in obscure cases too.

Finally, you should check to make sure you're not replacing FP values incorrectly.  Specifically, if you 
know that (A == B), then you can replace A with B in the true block.  If you know "A != B" (FP-only) then 
A is not neccesarily B in the false block (due to nans).

-Chris

Comment 12 Nick Lewycky 2006-06-12 23:32:51 PDT

As for your more general system idea -- I actually came across this testcase
when looking for papers on how to write such a path-sensitive optimizer. Thus
far, I haven't found one I'm really happy with.

I'm very interested in working on such a pass, though I doubt I could write it
single-handedly. Getting the set of properties right (non-overlapping) is
important, or at least we'll have to have some sort of canonicalization step (a
== 4 implies a != 5). It'd probably work best if we fill entries in the record
lazily; so when the optimizer comes across a statement that could be optimized
if only property X, it works backwards over the program flow and fills in the
records to see if it holds.

I don't really want to go the way of value ranges. That's not expressive enough.
You can't express that a value is even, for example. Actually, I think that a
small expressive language would be the current LLVM instruction set.

And this is there point where I admit that I'm just not entirely sure how that
would work. It seems that I might want to canonicalize thigs down even farther.
Consider that there are (at least) three ways to make a value even: multiply by
2 (or a multiple), shift left by 1 (or more), and "and" away the last bit. All
three could be expressed with and/or operations and appropriate control flow,
then analyzed by a path-sensitive system with knowledge only of a bit at a time.
"mod" would also have to be rewritten this way.

Perhaps I should just start with something simple like EQ and NE, and save the
inventing for another time and place? I would happily implement the idea you
described, the way you described it.

Comment 13 Chris Lattner 2006-06-13 00:31:35 PDT

> I'm very interested in working on such a pass, though I doubt I could write it
> single-handedly.

I'd be happy to help and to guide.

> I don't really want to go the way of value ranges. That's not expressive enough.
> You can't express that a value is even, for example.

Optimization really isn't about catching every possibility for canonicalization and simplification, it's 
about catching a lot in a reasonable amount of compile time.  Having a single forward pass over the 
dominator tree with a table represents a reasonable use of compile time: using a full theorum prover 
(f.e.) probably isn't.  :)  In any case, I think that starting simple, with simple inequalities, is the way to 
go.  Once the table and basic structure is in place, it can be extended in many possible ways. :)

> Perhaps I should just start with something simple like EQ and NE, and save the
> inventing for another time and place? I would happily implement the idea you
> described, the way you described it.

Sounds great,

-Chris

Comment 14 Chris Lattner 2006-06-13 00:33:41 PDT

Oh yeah, btw, instcombine knows that things like X*2&1 are always 0, it can track individual bits, inferring 
when they are known zero or one.

-Chris

Comment 15 Nick Lewycky 2006-06-24 00:38:22 PDT

Created attachment 349 [details]
First attempt at record-based PropagateEquality pass.

This is my first pass at trying to implement the system described by Chris. I'm
not too pleased with it, I don't see how it can handle things like "x == a; x
== b; a == b?", or even just "x == 0; x != 1?" without a design change of some
sort. But then, that's what review is for!

So far, it can handle this example:

  extern void f();
  extern void g();
  void test1(int x, int y) {
    if (x == 4) {
      f();
      if (y)
	if (x == 4) { g(); }
    }
  }

and not much else.

Comment 16 Nick Lewycky 2006-06-24 16:38:50 PDT

Created attachment 350 [details]
first functional record-based propequal pass

This pass manages 1366 variable replacements on llvm-test.

Comment 17 Nick Lewycky 2006-06-24 16:50:42 PDT

I've encountered some problems that warrant design changes.

Firstly, it needs to become a two-pass system. The first pass tags the BBs, the
second pass performs the replacements. At the moment, we have to iterate through
every instruction in the program, not just the use lists of setcc instructions.

The reason is that at the time we see the setcc instruction, the properties
haven't yet been assigned to the BBs. That happens with the branch instruction.

Secondly, the EQProperties and NEProperties aren't canonicalized. Suppose you
create a property for "x == y", and a property for "x != z". Then, you have a
statement "if (x == z)": the first thing that happens is that x -> y, then we
check for "y != z?" which doesn't exist. (The opposite problem exists even if
you do variable substitution after.)

The EQProperty, needs to store the whole equivalence list, not just pairs.

Otherwise, it works rather well. It optimizes the original case that started
this bug.

Comment 18 Chris Lattner 2006-06-24 18:39:45 PDT

I haven't looked in depth at your code (about to do that now) but I don't think that a 2-pass system is 
needed.

> At the moment, we have to iterate through
> every instruction in the program, not just the use lists of setcc instructions.

Right.  This is actually a good thing, because it will be required for some future extensions.  For 
example, if you have this:

X = load P
C = seteq P, null

We can know that C == false.  This case really does occur, for example when compiling the llvm 
dyn_cast template.  This has nothing to do with basic blocks, we we should keep iterating over 
instructions.

-Chris

Comment 19 Chris Lattner 2006-06-24 19:03:22 PDT

Some feedback on the "first functional record-based propequal pass" pass:

You might want to name the pass "CorrelationPropagation.cpp" or something, as eventually we'll be 
propagating more than just equality (e.g. x < 5 implies x < 7, and we could catch that in the future).

Please add a comment block at the top of the file explaining what the pass does, for example you can 
include the example that started this PR.

Please remove any #includes you can.  You don't need Instruction.h (implied by Instructions.h), don't 
need Casting.h, Interval.h, Use.h, and perhaps others.

Instead of making Property an inheritance-based class, you could make property be a simple tuple (V1/
V2/property where property is an enum).  Making it a simple value class would allow you to store it by-
value in the maps, allowing you to drop the new's.

Please don't make the multimaps be global.  They should be instance vars in the class.

Why are you hashing the properties?  With the above changes, PropertiesByHash could be implemented 
as:
   std::set<Property>KnownProperties;
Just implement operator< for properties to sort properties (in an arbitrary order) and the set will unique 
them for you.

Right now your code appears to depend on the order of basic blocks in a function (you use visit
(Function), and attempts to keep track of properties per-basic-block.  I'd suggest the code to be 
modified to work like this pseudo-code:

runOnFunction() {
  std::set KnownProperties;
  return WalkDominatorTree(DT.getRoot(), KnownProperties);
}

bool WalkDominatorTree(DTNode *N, InputKnownProperties) {
  // Copy the properties known so recorded properties only are inherited by successors in the domtree.
  std::set KnownProperties = InputKnownProperties;

  foreach (instruction i in N->getBasicBlock())
    switch (i->getOpcode()) {
      case XYZ:
         visitXYZ(KnownProperties);
    }

  if (BBTerminator is branchinst &&
      condition is X == Y &&
      dominates successors) {
      std::set TrueKnownProperties = KnownProperties;
     TrueKnownProperties.insert(Property(X, Y, ==));
      WalkDominatorTree(DT[TrueSucc], TrueKnownProperties);

      std::set FalseKnownProperties = KnownProperties;
      FalseKnownProperties.insert(Property(X, Y, !=));
      WalkDominatorTree(DT[FalseSucc], FalseKnownProperties);
  } else {
    foreach (dtsucc in N)
      WalkDominatorTree(DT[FalseSucc], KnownProperties);
  }
}

This way, the recursion on the stack naturally handles "pushing and popping" new properties, the visit 
methods can add properties to KnownProperties if desired (i.e. visitLoad can put (P, null, !=) onto the 
stack, and the visit methods can simplify conditions if desired (e.g. if you run across a setcc X != Y, and 
you know about X,Y, you can turn it into true/false.

-Chris

Comment 20 Chris Lattner 2006-06-24 19:09:11 PDT

Oh yeah, PropagateEquality::newConstant is also already available in LLVM.  You want:
  ConstantExpr::get(Opcode, LHS, RHS);

Which does constant folding for you.

-Chris

Comment 21 Nick Lewycky 2006-06-25 18:03:45 PDT

What if there is a BB with two predecessors, neither of whom dominate it, but
both happen to carry the same property?

It would mean doing a breadth-first traversal of the dominator tree, and each BB
takes the union of the properties of its predecessors. Is this too complicated?

Comment 22 Chris Lattner 2006-06-25 18:28:55 PDT

That boils down to being a question of compile time vs code quality trade off.  In practice, I think these 
things are unlikely to be significant, but if needed a full dataflow analysis *could* be done.  I think sticking 
to a dom-tree walk is likely to provide the biggest bang for the cycle though.

-Chris

Comment 23 Nick Lewycky 2006-06-25 23:21:33 PDT

Created attachment 351 [details]
second attempt; includes load* ptr, ptr != null.

Thanks for the review. I was hoping to post a full patch with the new name, but
that will probably have to wait until next weekend. Otherwise, I've
incorporated almost all of your suggestions.

walkDominatorTree got refactored into oblivion. You'll find parts of it named
visitBasicBlock, proceedToSuccessor and the 5 visit functions.

There's quite a bit of redundancy between the "addProperty" and "resolve"
mechanisms. What "addProperty" does is given:

  %A = seteq %x, 4
  %B = setne %y, 0
  %C = and bool %A, %B
  br bool %C label %here, label %there

When the visitBranchInst calls addProperty, gives it Property(%C, true, ==).
addProperty then look inside of %C and see that for an 'and' instruction, that
Property(%A, true, ==) and Property(%B, true, ==). Then, those are lowered to
Property(%x, 4, ==) and Property(%y, 0, !=).

However, there are some cases this misses. "resolve" runs over every operand of
every instruction in every basic block, making it possibly slow, but can catch
cases like:

  %A = seteq %x, 4
  %B = setlt %x, 4
  br bool %A label %here1, label %there
here1:
  br bool %B label %here2, label %there

Because %B is defined outside of the block with the property, it can't be
simplified at definition time (through the visit* functions), nor is it
detected by addProperty above. This accounts for about 5% of all of the
simplifications done.

As for performance, my tests run it with -mem2reg -instcombine -propequal, and
I see 2121 total substitutions and 26 switch cases removed across llvm-test (in
llvm-gcc3, so remove tests using inline asm). Note that it double-counts
variable substitutions; %x -> %y -> %z counts as two. It runs as low as 1% and
as much as 20% of the optimization time, out of just those three passes.

Comment 24 Nick Lewycky 2006-07-09 19:04:08 PDT

I'm considering a different design. I realized that the == property is different
from other properties in that it's used for the variable substitution. I'm
trying a new design which stores sets of synonymous variables together, and
properties are relationships are between two synonym-sets instead of two Values.
This means that if a != b and a == x, then b != x without needing to store a
second property.

Further, because checking for substitutes for variables is so common (once per
instruction's operand), I'll make an std::map to store the mapping from Value to
synonym-set's "most canonicalized form" of said Value.

As I'm still working on the implementation, it remains to be seen whether this
generates enough benefit on the CPU cycle.

Comment 25 Nick Lewycky 2006-08-07 18:26:38 PDT

Created attachment 377 [details]
collapse equivalent variables; use std::map for fast lookup

As is, this patch triggers an assertion, bug 871, for five of the tests in the
llvm-test suite.

The included feature tests aren't quite exhaustive, and some of them can be
optimized by simplifycfg without propsimplify's help.

I'd appreciate any review, comments, testing and bug reports. :)

Comment 26 Nick Lewycky 2006-08-07 19:28:46 PDT

Created attachment 378 [details]
fix KPcopy usage

Obvious stupid bug in the previous patch; reusing KPcopy means that properties
from one successor would be applied to a different one. Funny thing is, this
doesn't seem to change anything in llvm-test!

Comment 27 Nick Lewycky 2006-09-24 13:02:46 PDT

This optimization pass has since been enabled.