Leveraging Def/Use Analysis to Approximate Other Dataflow Analyses

David Marin (dmarin@cs.berkeley.edu)
CS 265, 18 May 2003

Background

Titanium is an explicitly parallel dialect of Java, currently in development at UC Berkeley and designed to support high-performance computing. In this report, I detail my efforts to add global copy propagation to the Titanium compiler. For this project, Titanium's parallelism is almost completely tangential. I chose the Titanium compiler because, being at the same university, the development team was readily available to talk to. Also, it had enough optimizations that I would have examples to work from, but several basic optimizations still needed to be implemented. This was the first time I had ever implemented an optimization in a compiler, and the project was as much for my own edification as for the benefit of the Titanium project. Because global copy propagation is such an old and well-understood optimization, I will refrain from discussing related work and current research, and instead focus on what I learned and how it might apply to future software engineering endeavors.

Working Within Titanium's Existing Framework

The Titanium compiler (which I will sometimes call just "Titanium") does not have a special intermediate language used for optimization. Instead, the optimization routines in the Titanium compiler run on code that looks substantially similar to the abstract syntax tree (AST) representing the original program. In addition, an ancillary data structure representing the control-flow graph (CFG) for each function is created, with one node in the control-flow graph for every node in the AST.

The Titanium compiler also has the ability to run a def/use analysis on any function. For any definition (e.g. an assignment) in a function, you can find all the uses (places where that definition might reach), and for every use in a function (e.g. a variable name), you can find every place where that variable's value may have come from. Titanium also has very rudimentary alias analysis; basically, it assumes that all pointers of the same type may be aliased. Titanium's def/use analysis works by attaching lists of uses and definitions directly to the corresponding AST nodes.

The first problem for copy propagation is that Titanium's current analysis is a MAY analysis, and we need a MUST analysis. Recall that in copy propagation, our goal is to take a chunk of code like this (assume all variables in my examples are integers):

    a = b;
    ...
    x = a + 1;

and turn it into this:

    a = b;
    ...
    x = b + 1;

in the hope that the statement a = b might be eliminated entirely (and possibly the variable a as well). However, we can only do this if we know that a still equals b at the point in the program where we wish to do copy propagation. In practical terms, this means we need to know that neither a nor b has changed. That is, we need to verify that:

a) The definition a = b MUST reach the point in the program where we wish to replace a with b.
b) The value of b at a = b MUST still be available (unchanged) at that point.

Fortunately, (a) can be determined using MAY analysis. As long as we can verify that a = b is the only definition that MAY reach x = a + 1, we know that it MUST do so (technically, we want to know that it MUST reach the actual use of a in x = a + 1, not just the statement). Otherwise, some other definition of a would have reached x = a + 1 as well (this only works because a variable's declaration counts as a definition, so some definition of a reaches every point where a is in scope). In fact, this is exactly what the local copy propagation algorithm in the Titanium compiler already did.
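For concreteness, here is a minimal sketch of check (a) in Java. The AstNode type and the getDefs() method are hypothetical stand-ins for the lists of reaching definitions that Titanium attaches to AST nodes; this is not the compiler's actual interface.

    // Minimal sketch of check (a), using a hypothetical API (AstNode, getDefs()),
    // not Titanium's real interfaces.
    static boolean copyMustReach(AstNode copyStmt, AstNode useOfA) {
        // Definitions of a that MAY reach this use, per the existing analysis.
        Set<AstNode> reachingDefs = useOfA.getDefs();
        // If the copy a = b is the only definition that MAY reach the use,
        // then it MUST reach it.
        return reachingDefs.size() == 1 && reachingDefs.contains(copyStmt);
    }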
The second, thornier problem is what to do about (b). We can't use the same approach, for two reasons.

First, the b in a = b isn't really a definition of b; it's just a use. If we could verify that the b in a = b was only defined in one place, and that this definition was the only one reaching the statement where we want to substitute (x = a + 1, which will become x = b + 1), we would be safe, but this would exclude many (real) situations where multiple definitions of b reach a = b, for example:

    b = ...;
    if (b < 0)
        b = 0;
    a = b;

It may seem at first glance that we could simply take the set of all definitions of b that reach a = b and compare it to the set of all definitions of b that reach the point in the program where we wish to do copy propagation, but unfortunately this is not sufficient. For example:

    1 c = 1;
    2 do {
    3     d = c;
    4     if (...)
    5         c = c + 1;
    6 } while (...);
    7 ... d ...

The definitions on lines 1 and 5 are the only definitions of c that can reach line 3, and likewise for line 7, but it is obviously unsafe to replace d with c on line 7. (On the last iteration of the loop, line 5 may have incremented c after line 3 assigned it to d, so c and d need not be equal at line 7.)

The second problem is more a technical shortcoming of Titanium's def/use analysis. Returning to our example above:

    a = b;
    ...
    x = a + 1;

We wish to know whether some unique definition of b reaches x = a + 1. The problem is that def information is attached directly to the AST nodes that use that definition, and b does not appear in this line at all (the node for a will only have information about definitions of a). This information IS available at some point DURING the def/use analysis, but not when we actually need to use it.

The straightforward approach would be to write a new analysis that gives us the information we want. It would be like Titanium's current def/use analysis, except:

- It would be a MUST analysis (so merges would be intersections of sets of definitions rather than unions, and aliases would basically kill all relevant definitions); see the sketch after this list.
- You would be able to get information about any variable in scope at any point in the CFG.
- Relevant data structures would be added (so you'd be able to call getMustDefs() on any node).
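To make the first point concrete, here is a sketch (in Java, using java.util sets; not Titanium code) of how the merge at a control-flow join differs between the existing MAY analysis and the proposed MUST analysis:

    // MAY-reach merge: a definition reaches the join if it reaches along ANY
    // incoming path, so we take the union of the incoming sets.
    static <D> Set<D> mergeMay(Set<D> fromPath1, Set<D> fromPath2) {
        Set<D> merged = new HashSet<>(fromPath1);
        merged.addAll(fromPath2);
        return merged;
    }

    // MUST-reach merge: a definition reaches the join only if it reaches along
    // EVERY incoming path, so we take the intersection instead.
    static <D> Set<D> mergeMust(Set<D> fromPath1, Set<D> fromPath2) {
        Set<D> merged = new HashSet<>(fromPath1);
        merged.retainAll(fromPath2);
        return merged;
    }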
However, Dan Bonachea (bonachea@cs.berkeley.edu), my contact in the Titanium group, cautioned me strongly against trying to implement another analysis (the original one took many man-months of effort, and I only had a few weeks). I suspect that, had I tried to copy the current analysis and modify it, I would have run into the following problems:

- Simply adding the new kinds of use/def information to AST nodes would involve touching many parts of the code.
- The actual code for the analysis would probably be more complex than my mental model of it; for instance, aliases may be handled in many different parts of the code, and there may be more than one type of alias information.
- Integrating the new analysis into the existing code base would be time-consuming and dangerous. I would have to identify all points in the code that deal with the old analysis and decide what, if anything, to do there with respect to the new analysis. For example, the function invalidateCFG() triggers a cleanup (deletion) of all the use lists in the "current" function (where "current" is defined by some global variable somewhere).
- Every time an optimization modifies an AST that has already been analyzed, there is a decision about how (if at all) to update the existing dataflow information in the AST, so that enough of it remains valid that we can continue with other optimizations (otherwise we'd have to re-run the analysis after every change).
- The new analysis code might be difficult to debug, particularly because I would not be intimately familiar with the code it was derived from.

Dan suggested a different approach, which I ultimately used. Instead of modifying the def/use analysis, I could treat the analysis as a black box, and instead make trivial modifications to the AST to make the def/use analysis give me the information I needed. So, for example, in this code:

    a = b;
    ...
    x = a + 1;

I would insert two (fake) statements:

    1 b = b;
    2 a = b;
    3 ...
    4 newtemp = b;
    5 x = a + 1;

(where newtemp is a fresh variable) to give me a definition and a use of b right where I needed them. Now, to verify that it is legal to replace the a on line 5 with b, we:

- verify that the definition of a on line 2 is the only one that reaches line 5 (as before), and
- verify that the (fake) definition of b on line 1 is the only one that reaches the (fake) use of b on line 4.

So, my algorithm for global copy propagation basically consists of the following. For every function:

- Run def/use analysis on the unmodified function.
- For every simple copy of a variable (x = y), identify the candidate locations for copy propagation (i.e. the uses of x that this definition reaches), and store them.
- For each candidate propagation, insert a fake definition and use of the right-hand side of the simple copy (i.e. y), and keep track of them in a data structure.
- Re-run def/use analysis.
- For each candidate propagation, verify that it is indeed possible to perform the propagation, and if it is, do so.
- Remove all the fake uses and definitions.

There is one complication: the fake definitions can sometimes interfere with legitimate copy propagation. Consider the following code:

    1 f = e;
    2 g = e;
    3 if (...)
    4     g = ...
    5 ...f...
    6 ...g...

The algorithm will insert fake definitions and uses in order to determine whether it is legal to replace the f and g on lines 5 and 6 with e:

    1  e = e;
    2  f = e;
    3  e = e;
    4  g = e;
    5  if (...)
    6      g = ...
    7  newtemp1 = e;
    8  ...f...
    9  newtemp2 = e;
    10 ...g...

To check whether it is legal to replace the f on line 8 with e, we check that the fake definition on line 1 is the only definition of e that reaches the fake use on line 7. But... it isn't, because of the fake definition on line 3! If it were legal to perform copy propagation on g = e, this would not be such a problem (because the compiler keeps iterating through optimizations until they stop changing the tree), but in this case we would never propagate f = e, even though it is legal to do so.

To solve this, I implemented a defReaches() routine that effectively "looks through" any fake definitions: if it encounters a fake definition other than the one it expected, it verifies recursively that the definition we care about reaches its right-hand side (ignoring any fake definitions it has already dealt with). Needing to implement defReaches() actually turned out to simplify my code somewhat, because now, instead of introducing temporaries for fake uses (temp = x), I could just use a fake definition (x = x).
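A minimal sketch of the idea behind defReaches() follows. The names AstNode, getDefs(), isFake(), and rhsUse() are hypothetical stand-ins for the data structures described above, not the routine's actual code:

    // Sketch only: does `def` reach `use`, looking through fake definitions?
    static boolean defReaches(AstNode def, AstNode use, Set<AstNode> seenFakes) {
        for (AstNode reaching : use.getDefs()) {     // definitions that MAY reach this use
            if (reaching == def || seenFakes.contains(reaching)) {
                continue;                            // the def we want, or a fake already handled
            }
            if (!reaching.isFake()) {
                return false;                        // some other real definition may reach the use
            }
            // A fake definition x = x just copies x, so look through it:
            // our definition must also reach its right-hand side.
            seenFakes.add(reaching);
            if (!defReaches(def, reaching.rhsUse(), seenFakes)) {
                return false;
            }
        }
        return true;
    }

The recursion terminates because each fake definition is looked through at most once (it is recorded in seenFakes).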
Correctness and Performance

Titanium has an extensive test suite, and in addition, I had another small test suite specifically designed to test copy propagation (including some tests I wrote myself). My version of the compiler passes all of my tests and all of the tests in Titanium's optimizer test suite. It also passes essentially all of the other tests, with the exception of:

- compiler/tc-errors/bad_expr1.sh (the test itself seems to be missing)
- bounds/c.ti (which seems to be an off-by-one error in some other part of the code)

Also, the test suite itself is set up to skip several tests (I am not exactly sure why, and did not want to mess with it), and I was unable to run the parallel tests because I did not have the appropriate backends compiled.

I did not have time to measure the benefit of my optimization on actual Titanium code, though I probably will in the near future. One thing I worry about is that my optimization might currently be too aggressive; if you do the propagation in our example:

    a = b;
    ...
    x = b + 1;    (formerly x = a + 1)

then the variable b is live longer than it used to be. If we can't actually remove a = b (because there are uses of a that we can't propagate to), then it is possible that the "optimization" could actually degrade performance (for instance, by increasing register pressure).

Titanium compiles down to C, which is then compiled by gcc (which can also perform many common serial optimizations, including copy propagation), so it would be interesting to see how much benefit we lose if we don't perform these optimizations in Titanium and instead leave them to gcc. Probably the main benefit of including these optimizations directly in the Titanium compiler is their interaction with other, Titanium-specific optimizations. (For instance, copy propagation can be used to "clean up" after inlining, where one of the arguments to the inlined function was a local variable of the caller.)

Future Work

Probably the first thing that needs to be done is to clean up and refactor the existing code. As it is, my code is designed to work, but parts of it aren't terribly well-structured (it's based on partially modified bits of the local copy propagation code), and it performs a fair amount of useless computation (for example, it runs def/use analysis one more time than it needs to). Also, Jonathan Jamison (jjamison@cs.berkeley.edu) is doing common subexpression elimination using the same approach, and probably some of the code I wrote; that code, of course, should only end up in one place in the final code base. My code also leaks more memory than it needs to (Titanium currently has a delete-it-if-you-can memory management policy, and I don't always delete objects even when I could). I'm also slightly worried that I may manipulate the AST in ways that are not totally safe. Basically, the code needs a good review.

Dan suggested that it would probably be worthwhile (and not too difficult) to modify the code so that we only propagate from a copy when we know that the copy itself can be eliminated.

Another related code transformation that Dan suggested is "scope pushing". In the following code:

    1 int s;
    2 {
    3     int t;
    4     ...
    5     s = t;
    6 }
    7 ...s...

we can't replace the s on line 7 with t, because t is no longer in scope (this is a consequence of Titanium's AST-based approach; changes to the AST have to leave a legal program). The solution would be to move t's declaration outside of its enclosing block, as shown in the sketch below. In some cases this might increase register pressure (because now t would be live all the way back to its declaration, which, in Titanium, sets it to 0), so it might take some experimentation to tell if (and when) this optimization is useful.
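For concreteness, here is what the example would look like after scope pushing (a sketch of the intended transformation, keeping the variable names from the example above):

    int s;
    int t;        // declaration "pushed" out of the inner block (in Titanium, this sets t to 0)
    {
        ...
        s = t;
    }
    ...s...       // t is now in scope here, so this s can be replaced with t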
Finally, performance tests need to be run!

Acknowledgements

Thanks much to Titanium developer Dan Bonachea, who was extremely responsive and helpful throughout this project.