Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perform substitution when equality is known #1179

Closed
nlewycky opened this issue Jun 10, 2006 · 28 comments
Closed

perform substitution when equality is known #1179

nlewycky opened this issue Jun 10, 2006 · 28 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla code-quality enhancement Improving things as opposed to bug fixing, e.g. new or missing feature

Comments

@nlewycky
Copy link
Contributor

Bugzilla Link 807
Resolution FIXED
Resolved on Feb 22, 2010 12:42
Version trunk
OS Linux
CC @lattner

Extended Description

The following code doesn't optimize down:

// Source: pag.csail.mit.edu/paste2005/slides/yang-paste2005-slides.ppt

extern void foo();

void process(int *p, int *q)
{
if (p != q)
return;

if (*p != *q)
foo(); // unreachable
}

In a branch whose condition depends on a EQ or NE, in the side where the
variables are equal (and if the types are equal (?)), replaceAllUsesOfWith(p,
q). This allows the compiler to optimize away the following test and should kill
the liveness of one of the variables. I suppose you could also do the same
optimization with a switch instruction.

Where would such an optimization belong? The instruction combiner?

@nlewycky
Copy link
Contributor Author

assigned to @nlewycky

@nlewycky
Copy link
Contributor Author

I've implemented my suggested change and it doesn't work. My change updates the
BB for the "then" block after setne to seqeq transformation but that doesn't get
through to the unified return block.

Modifying the source to read:
if (p != q)
return
else
q = p;

causes the example to optimize away as it should. Since that's correct in
general (subject to type of p and q and isVolatile) I'd like to do the
equivalent transformation on the SSA form. I'm not sure exactly how.

@lattner
Copy link
Collaborator

lattner commented Jun 11, 2006

It isn't safe to P->replaceAllUsesWith(Q) or vice versa because replaceAllUsesWith will replace ALL uses
with no notion of region. This will turn the initial condition into Q != Q, for example.

Implementing this requires a value range propagation pass. LLVM has two things related to this:

lib/Transforms/Scalar/CondPropagate.cpp, is a very simple conditional value propagor, and is stable.
Scalar/CorrelatedExprs.cpp is a more aggressive propagator but has SSA update bugs and generally
isn't stable. It should be replaced.

Unfortunately there isn't currently an easy place to add this to LLVM.

-Chris

@nlewycky
Copy link
Contributor Author

Adds new Propagate Equality pass
Ok, if it doesn't fit anywhere, I'll create a new place for it.

This finds all branch instructions (only, no switch yet) and finds the set of
basic blocks affected by:

  • using a depth-first iterator to find the children from the true and false
    sides
  • then set-subtract the false from the true.

This is written to handle multiple false roots for when switch support is
added. Unfortunately, this is probably horribly slow. Comments appreciated.

Anyways, it handles the test in the bug report correctly, and still passes the
regression tests, so I'd call it a win. This patch needs some minor cleanup
work (include the regtest, fix the name of the opt in the various places it
appears), but I'll do that after I get a review. I'll probably need to change
more than the cleanup I know about.

@nlewycky
Copy link
Contributor Author

Duh, that algorithm was incorrect and could replace too many uses, in an ugly
case involving a branch inside of a loop. Please wait for an update.

@nlewycky
Copy link
Contributor Author

New propagate equality pass.
There here does what we wanted before. As far as I know, there are no
outstanding issues. Comes complete with testcase.

@lattner
Copy link
Collaborator

lattner commented Jun 12, 2006

Nifty. Have you evaluated how many times this triggers on various programs?

I'd suggest setting up a test to see how often it applies, details here:
http://llvm.org/docs/TestingGuide.html#customtest

As a side note, this loop:
for (Instruction::op_iterator j = i->op_begin(); j != i->op_end(); ++j)
...
can be replaced with:

i->replaceUsesOfWith(Old, New);

Also, please s/LLVM Research Group/you/ in the file header. :)

-Chris

@nlewycky
Copy link
Contributor Author

New new propagate equality pass.
This patch has a different implementation; instead of checking each BB for
users of the value, check the value's use list for uses within the affected
area.

Stats:
2 propequal - Number of equal variables substituted
in MultiSource/Benchmarks/Olden/bh
7 propequal - Number of equal variables substituted
in MultiSource/Benchmarks/Prolangs-C/bison

@lattner
Copy link
Collaborator

lattner commented Jun 13, 2006

I realize that this probably isn't ready for review (recent bugs etc), but here is some feedback in the
meantime. Overall, the pass looks good and it does catch a useful property. However, I don't think we
can justify an entire pass over the IL for this simple optimization, at least not turned on by default.

The pass as you have it now can be checked into CVS whenever you're happy with it and it is relatively
bug free. Through inspection, I believe you have some issues updating PHI nodes, but I'll let you play
with the pass to decide if so and if there is anything else you'd like to change before the first
incarnation goes into CVS.

Before this gets turned on by default, I'd like to see the pass get a little bit more general. In particular,
this is a simple dominator-based optimization. I believe it would be more general and more efficient to
formulate this as a table-based dominator tree walk. Basically the pass would like like this:

  1. Keep a table of known (in)equalities, e.g. A == B, A != 0, C < D.
  2. Walk the code in dominator tree order. When you see a conditional branch where a successor is
    dominated by the branch block, see if there is a condition that can be inferred in that successor due to
    the branch condition. If so, enter it in the table before recursing into the successor part of the
    domtree. When unrecursing back to the branch block, pop the inequality off of the table.
  3. When walking instructions, keep track of equalities that can be inferred, and use the table to simplify
    instructions. Some examples are below.

Things that this approach can simplify:

X = load P <- from this point on, we know P is non-null, enter P != 0 into the table.
if (P == 0) goto ... <- obviously false

if (A != 4) { <- record A != 5 in the table
B = A+4; <- record B != 9 in the table
if (B == 9) { <- false

float A, B;
if (A > B) <- neither are nan's
if (isnan(A)) <- false

etc. There are a large number of simple transformations like this that we can build up with this
capability (look at the ill-fated cee pass for examples). Starting the framework and slowly expanding it
is probably the way to go, and the best way to do this is probably in CVS. In addition to new "tricks",
you can also make the table more powerful, e.g. to keep track of value ranges in addition to inequalities
(e.g. that A is [4 .. 9]), etc.

Are you interested in working on this? If so, the best approach is for you to get the initial version set
up with the rough framework, then iterate on it in CVS (you'll get write access). Seem reasonable?

-Chris

@lattner
Copy link
Collaborator

lattner commented Jun 13, 2006

Other more specific comments about your current pass:

  1. The LHS/RHS of a setcc instruction are guaranteed to be the same type.
  2. Your code should work with calls and casts, why do you special case and ignore them?
  3. Your code currently thinks about instructions and constants. I think it would be better to think about
    instructions and "everything else". In particular, arguments should be handled like constants.
  4. There is value in changing one argument to another:

int foo(int A, int B) {
if (A == B)
return A-B; // turn into A-A or B-B, which instcombine will then turn into 0.
return 42;
}

@nlewycky
Copy link
Contributor Author

Now passes llvm-test!
I should've warned you not to review, as my local copy diverged pretty far from
the original. Casts and calls aren't treated specially. I took your tip that
the two sides of a setcc are the same type.

There's not that much magical about handling constants. All it does it prevent
the pass from ripping out "uint 0" and replacing it with "tmp.22" because it
knows they're equal. Instead, we should replace an instruction with a constant.
Removing that will also cause it to break; it will rip the "uint 0"'s right out
of your getelementptr instructions.

This also means that it works fine on arguments. In fact, the included testcase
just uses function arguments.

Unfortunately, it doesn't seem to be much of a win. Zero replacements performed
across all of llvm-test.

@lattner
Copy link
Collaborator

lattner commented Jun 13, 2006

Looks reasonable, some more comments then :)

First, you do want to handle arguments somewhat like constants. For example, if you have Arg == 1,
you obviously want to replace Arg with 1. However, if you have inst == arg, you want to make sure you
replace the inst with the arg, not the arg with the inst.

This check:
if (!Forest->properlyDominates(BranchBB, BB)) continue;

can be done once for the entire expression. In particular, in conjunction with your other check, you're
really checking to see if the EqualDest is dominated by the branch block. Please change the code to
test to see if the EqualDest is properly dominated by the branch block up outside of the use list. If
not, don't process any uses.

Next, you should check phi nodes more aggressively. Consider this:

if (A == 7) {
...
} else {
...
}
C = phi(A, x)

here you can change the A in the PHI node with 7, because the use occurs in a dominating block, not
actually in the merge block. This is actually required for correctness in obscure cases too.

Finally, you should check to make sure you're not replacing FP values incorrectly. Specifically, if you
know that (A == B), then you can replace A with B in the true block. If you know "A != B" (FP-only) then
A is not neccesarily B in the false block (due to nans).

-Chris

@nlewycky
Copy link
Contributor Author

As for your more general system idea -- I actually came across this testcase
when looking for papers on how to write such a path-sensitive optimizer. Thus
far, I haven't found one I'm really happy with.

I'm very interested in working on such a pass, though I doubt I could write it
single-handedly. Getting the set of properties right (non-overlapping) is
important, or at least we'll have to have some sort of canonicalization step (a
== 4 implies a != 5). It'd probably work best if we fill entries in the record
lazily; so when the optimizer comes across a statement that could be optimized
if only property X, it works backwards over the program flow and fills in the
records to see if it holds.

I don't really want to go the way of value ranges. That's not expressive enough.
You can't express that a value is even, for example. Actually, I think that a
small expressive language would be the current LLVM instruction set.

And this is there point where I admit that I'm just not entirely sure how that
would work. It seems that I might want to canonicalize thigs down even farther.
Consider that there are (at least) three ways to make a value even: multiply by
2 (or a multiple), shift left by 1 (or more), and "and" away the last bit. All
three could be expressed with and/or operations and appropriate control flow,
then analyzed by a path-sensitive system with knowledge only of a bit at a time.
"mod" would also have to be rewritten this way.

Perhaps I should just start with something simple like EQ and NE, and save the
inventing for another time and place? I would happily implement the idea you
described, the way you described it.

@lattner
Copy link
Collaborator

lattner commented Jun 13, 2006

I'm very interested in working on such a pass, though I doubt I could write it
single-handedly.

I'd be happy to help and to guide.

I don't really want to go the way of value ranges. That's not expressive enough.
You can't express that a value is even, for example.

Optimization really isn't about catching every possibility for canonicalization and simplification, it's
about catching a lot in a reasonable amount of compile time. Having a single forward pass over the
dominator tree with a table represents a reasonable use of compile time: using a full theorum prover
(f.e.) probably isn't. :) In any case, I think that starting simple, with simple inequalities, is the way to
go. Once the table and basic structure is in place, it can be extended in many possible ways. :)

Perhaps I should just start with something simple like EQ and NE, and save the
inventing for another time and place? I would happily implement the idea you
described, the way you described it.

Sounds great,

-Chris

@lattner
Copy link
Collaborator

lattner commented Jun 13, 2006

Oh yeah, btw, instcombine knows that things like X*2&1 are always 0, it can track individual bits, inferring
when they are known zero or one.

-Chris

@nlewycky
Copy link
Contributor Author

First attempt at record-based PropagateEquality pass.
This is my first pass at trying to implement the system described by Chris. I'm
not too pleased with it, I don't see how it can handle things like "x == a; x
== b; a == b?", or even just "x == 0; x != 1?" without a design change of some
sort. But then, that's what review is for!

So far, it can handle this example:

extern void f();
extern void g();
void test1(int x, int y) {
if (x == 4) {
f();
if (y)
if (x == 4) { g(); }
}
}

and not much else.

@nlewycky
Copy link
Contributor Author

first functional record-based propequal pass
This pass manages 1366 variable replacements on llvm-test.

@nlewycky
Copy link
Contributor Author

I've encountered some problems that warrant design changes.

Firstly, it needs to become a two-pass system. The first pass tags the BBs, the
second pass performs the replacements. At the moment, we have to iterate through
every instruction in the program, not just the use lists of setcc instructions.

The reason is that at the time we see the setcc instruction, the properties
haven't yet been assigned to the BBs. That happens with the branch instruction.

Secondly, the EQProperties and NEProperties aren't canonicalized. Suppose you
create a property for "x == y", and a property for "x != z". Then, you have a
statement "if (x == z)": the first thing that happens is that x -> y, then we
check for "y != z?" which doesn't exist. (The opposite problem exists even if
you do variable substitution after.)

The EQProperty, needs to store the whole equivalence list, not just pairs.

Otherwise, it works rather well. It optimizes the original case that started
this bug.

@lattner
Copy link
Collaborator

lattner commented Jun 25, 2006

I haven't looked in depth at your code (about to do that now) but I don't think that a 2-pass system is
needed.

At the moment, we have to iterate through
every instruction in the program, not just the use lists of setcc instructions.

Right. This is actually a good thing, because it will be required for some future extensions. For
example, if you have this:

X = load P
C = seteq P, null

We can know that C == false. This case really does occur, for example when compiling the llvm
dyn_cast template. This has nothing to do with basic blocks, we we should keep iterating over
instructions.

-Chris

@lattner
Copy link
Collaborator

lattner commented Jun 25, 2006

Some feedback on the "first functional record-based propequal pass" pass:

You might want to name the pass "CorrelationPropagation.cpp" or something, as eventually we'll be
propagating more than just equality (e.g. x < 5 implies x < 7, and we could catch that in the future).

Please add a comment block at the top of the file explaining what the pass does, for example you can
include the example that started this PR.

Please remove any #includes you can. You don't need Instruction.h (implied by Instructions.h), don't
need Casting.h, Interval.h, Use.h, and perhaps others.

Instead of making Property an inheritance-based class, you could make property be a simple tuple (V1/
V2/property where property is an enum). Making it a simple value class would allow you to store it by-
value in the maps, allowing you to drop the new's.

Please don't make the multimaps be global. They should be instance vars in the class.

Why are you hashing the properties? With the above changes, PropertiesByHash could be implemented
as:
std::setKnownProperties;
Just implement operator< for properties to sort properties (in an arbitrary order) and the set will unique
them for you.

Right now your code appears to depend on the order of basic blocks in a function (you use visit
(Function), and attempts to keep track of properties per-basic-block. I'd suggest the code to be
modified to work like this pseudo-code:

runOnFunction() {
std::set KnownProperties;
return WalkDominatorTree(DT.getRoot(), KnownProperties);
}

bool WalkDominatorTree(DTNode *N, InputKnownProperties) {
// Copy the properties known so recorded properties only are inherited by successors in the domtree.
std::set KnownProperties = InputKnownProperties;

foreach (instruction i in N->getBasicBlock())
switch (i->getOpcode()) {
case XYZ:
visitXYZ(KnownProperties);
}

if (BBTerminator is branchinst &&
condition is X == Y &&
dominates successors) {
std::set TrueKnownProperties = KnownProperties;
TrueKnownProperties.insert(Property(X, Y, ==));
WalkDominatorTree(DT[TrueSucc], TrueKnownProperties);

  std::set FalseKnownProperties = KnownProperties;
  FalseKnownProperties.insert(Property(X, Y, !=));
  WalkDominatorTree(DT[FalseSucc], FalseKnownProperties);

} else {
foreach (dtsucc in N)
WalkDominatorTree(DT[FalseSucc], KnownProperties);
}
}

This way, the recursion on the stack naturally handles "pushing and popping" new properties, the visit
methods can add properties to KnownProperties if desired (i.e. visitLoad can put (P, null, !=) onto the
stack, and the visit methods can simplify conditions if desired (e.g. if you run across a setcc X != Y, and
you know about X,Y, you can turn it into true/false.

-Chris

@lattner
Copy link
Collaborator

lattner commented Jun 25, 2006

Oh yeah, PropagateEquality::newConstant is also already available in LLVM. You want:
ConstantExpr::get(Opcode, LHS, RHS);

Which does constant folding for you.

-Chris

@nlewycky
Copy link
Contributor Author

What if there is a BB with two predecessors, neither of whom dominate it, but
both happen to carry the same property?

It would mean doing a breadth-first traversal of the dominator tree, and each BB
takes the union of the properties of its predecessors. Is this too complicated?

@lattner
Copy link
Collaborator

lattner commented Jun 26, 2006

That boils down to being a question of compile time vs code quality trade off. In practice, I think these
things are unlikely to be significant, but if needed a full dataflow analysis could be done. I think sticking
to a dom-tree walk is likely to provide the biggest bang for the cycle though.

-Chris

@nlewycky
Copy link
Contributor Author

second attempt; includes load* ptr, ptr != null.
Thanks for the review. I was hoping to post a full patch with the new name, but
that will probably have to wait until next weekend. Otherwise, I've
incorporated almost all of your suggestions.

walkDominatorTree got refactored into oblivion. You'll find parts of it named
visitBasicBlock, proceedToSuccessor and the 5 visit functions.

There's quite a bit of redundancy between the "addProperty" and "resolve"
mechanisms. What "addProperty" does is given:

%A = seteq %x, 4
%B = setne %y, 0
%C = and bool %A, %B
br bool %C label %here, label %there

When the visitBranchInst calls addProperty, gives it Property(%C, true, ==).
addProperty then look inside of %C and see that for an 'and' instruction, that
Property(%A, true, ==) and Property(%B, true, ==). Then, those are lowered to
Property(%x, 4, ==) and Property(%y, 0, !=).

However, there are some cases this misses. "resolve" runs over every operand of
every instruction in every basic block, making it possibly slow, but can catch
cases like:

%A = seteq %x, 4
%B = setlt %x, 4
br bool %A label %here1, label %there
here1:
br bool %B label %here2, label %there

Because %B is defined outside of the block with the property, it can't be
simplified at definition time (through the visit* functions), nor is it
detected by addProperty above. This accounts for about 5% of all of the
simplifications done.

As for performance, my tests run it with -mem2reg -instcombine -propequal, and
I see 2121 total substitutions and 26 switch cases removed across llvm-test (in
llvm-gcc3, so remove tests using inline asm). Note that it double-counts
variable substitutions; %x -> %y -> %z counts as two. It runs as low as 1% and
as much as 20% of the optimization time, out of just those three passes.

@nlewycky
Copy link
Contributor Author

I'm considering a different design. I realized that the == property is different
from other properties in that it's used for the variable substitution. I'm
trying a new design which stores sets of synonymous variables together, and
properties are relationships are between two synonym-sets instead of two Values.
This means that if a != b and a == x, then b != x without needing to store a
second property.

Further, because checking for substitutes for variables is so common (once per
instruction's operand), I'll make an std::map to store the mapping from Value to
synonym-set's "most canonicalized form" of said Value.

As I'm still working on the implementation, it remains to be seen whether this
generates enough benefit on the CPU cycle.

@nlewycky
Copy link
Contributor Author

nlewycky commented Aug 8, 2006

collapse equivalent variables; use std::map for fast lookup
As is, this patch triggers an assertion, bug 871, for five of the tests in the
llvm-test suite.

The included feature tests aren't quite exhaustive, and some of them can be
optimized by simplifycfg without propsimplify's help.

I'd appreciate any review, comments, testing and bug reports. :)

@nlewycky
Copy link
Contributor Author

nlewycky commented Aug 8, 2006

fix KPcopy usage
Obvious stupid bug in the previous patch; reusing KPcopy means that properties
from one successor would be applied to a different one. Funny thing is, this
doesn't seem to change anything in llvm-test!

@nlewycky
Copy link
Contributor Author

This optimization pass has since been enabled.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
@Endilll Endilll added enhancement Improving things as opposed to bug fixing, e.g. new or missing feature and removed new-feature labels Aug 15, 2023
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla code-quality enhancement Improving things as opposed to bug fixing, e.g. new or missing feature
Projects
None yet
Development

No branches or pull requests

3 participants