Leveraging Def/Use Analysis to Approximate Other Dataflow Analyses

David Marin (dmarin@cs.berkeley.edu)
CS 265, 18 May 2003

Background

Titanium is an explicitly parallel dialect of Java, currently in development at UC Berkeley and designed to support high-performance computing. In this report, I detail my efforts to add global copy propagation to the Titanium compiler. For this project, Titanium's parallelism is almost completely tangential. I chose the Titanium compiler because, being at the same university, the development team was readily available to talk to. Also, it had enough optimizations that I would have examples to work from, but several basic optimizations still needed to be implemented. This was the first time I had ever implemented an optimization in a compiler, and the project was as much for my own edification as for the benefit of the Titanium project. Because global copy propagation is such an old and well-understood optimization, I will refrain from discussing related work and current research, and instead focus on what I learned and how it might apply to future software engineering endeavors.

Working Within Titanium's Existing Framework

The Titanium compiler (which I will sometimes call just "Titanium") does not have a special intermediate language used for optimization. Instead, the optimization routines in the Titanium compiler run on code that looks substantially similar to the abstract syntax tree (AST) representing the original program. In addition, an ancillary data structure representing the control-flow graph (CFG) for each function is created, with one node in the control-flow graph for every node in the AST.

The Titanium compiler also has the ability to run a def/use analysis on any function. For any definition (e.g. an assignment) in a function, you can find all the uses (places where that definition might reach), and for every use in a function (e.g. a variable name), you can find every place where that variable's value may have come from. Titanium also has very rudimentary alias analysis; basically, it assumes that all pointers of the same type may be aliased. Titanium's def/use analysis works by attaching lists of uses and definitions directly to the corresponding AST nodes.

The first problem for copy propagation is that Titanium's current analysis is a MAY analysis, and we need a MUST analysis. Recall that in copy propagation, our goal is to take a chunk of code like this (assume all variables in my examples are integers):

    a = b;
    ...
    x = a + 1;

and turn it into this:

    a = b;
    ...
    x = b + 1;

in the hope that the statement a = b might be eliminated entirely (and possibly the variable a as well). However, we can only do this if we know that a still equals b at the point in the program where we wish to do copy propagation. In practical terms, this means we need to know that neither a nor b has changed. That is, we need to verify that:

a) The definition a = b MUST reach the point in the program where we wish to replace a with b.
b) The value of b at a = b MUST still be available (unchanged) at that point.

Fortunately, (a) can be determined using MAY analysis. As long as we can verify that a = b is the only definition that MAY reach x = a + 1, we know that it MUST do so (technically, we want to know that it MUST reach the actual use of a in x = a + 1, not just the statement). Otherwise, some other definition of a would have reached x = a + 1 as well (this only works because a variable's declaration counts as a definition, so some definition of a reaches every point where a is in scope). In fact, this is exactly what the local copy propagation algorithm in the Titanium compiler already did.
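For concreteness, here is a minimal sketch of check (a) in Java. The AstNode type and the getDefs() method are hypothetical stand-ins for the lists of reaching definitions that Titanium attaches to AST nodes; this is not the compiler's actual interface.

    // Minimal sketch of check (a), using a hypothetical API (AstNode, getDefs()),
    // not Titanium's real interfaces.
    static boolean copyMustReach(AstNode copyStmt, AstNode useOfA) {
        // Definitions of a that MAY reach this use, per the existing analysis.
        Set<AstNode> reachingDefs = useOfA.getDefs();
        // If the copy a = b is the only definition that MAY reach the use,
        // then it MUST reach it.
        return reachingDefs.size() == 1 && reachingDefs.contains(copyStmt);
    }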
The second, thornier problem is what to do about (b). We can't use the same approach, for two reasons.

First, the b in a = b isn't really a definition of b; it's just a use. If we could verify that the b in a = b was only defined in one place, and that this definition was the only one reaching the statement where we want to substitute (x = a + 1, which will become x = b + 1), we would be safe, but this would exclude many (real) situations where multiple definitions of b reach a = b, for example:

    b = ...;
    if (b < 0)
        b = 0;
    a = b;

It may seem at first glance that we could simply take the set of all definitions of b that reach a = b and compare it to the set of all definitions of b that reach the point in the program where we wish to do copy propagation, but unfortunately this is not sufficient. For example:

    1 c = 1;
    2 do {
    3     d = c;
    4     if (...)
    5         c = c + 1;
    6 } while (...);
    7 ... d ...

The definitions on lines 1 and 5 are the only definitions of c that can reach line 3, and likewise for line 7, but it is obviously unsafe to replace d with c on line 7. (On the last iteration of the loop, line 5 may have incremented c after line 3 assigned it to d, so c and d need not be equal at line 7.)

The second problem is more a technical shortcoming of Titanium's def/use analysis. Returning to our example above:

    a = b;
    ...
    x = a + 1;

We wish to know whether some unique definition of b reaches x = a + 1. The problem is that def information is attached directly to the AST nodes that use that definition, and b does not appear in this line at all (the node for a will only have information about definitions of a). This information IS available at some point DURING the def/use analysis, but not when we actually need to use it.

The straightforward approach would be to write a new analysis that gives us the information we want. It would be like Titanium's current def/use analysis, except:

- It would be a MUST analysis (so merges would be intersections of sets of definitions rather than unions, and aliases would basically kill all relevant definitions); see the sketch after this list.
- You would be able to get information about any variable in scope at any point in the CFG.
- Relevant data structures would be added (so you'd be able to call getMustDefs() on any node).
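To make the first point concrete, here is a sketch (in Java, using java.util sets; not Titanium code) of how the merge at a control-flow join differs between the existing MAY analysis and the proposed MUST analysis:

    // MAY-reach merge: a definition reaches the join if it reaches along ANY
    // incoming path, so we take the union of the incoming sets.
    static <D> Set<D> mergeMay(Set<D> fromPath1, Set<D> fromPath2) {
        Set<D> merged = new HashSet<>(fromPath1);
        merged.addAll(fromPath2);
        return merged;
    }

    // MUST-reach merge: a definition reaches the join only if it reaches along
    // EVERY incoming path, so we take the intersection instead.
    static <D> Set<D> mergeMust(Set<D> fromPath1, Set<D> fromPath2) {
        Set<D> merged = new HashSet<>(fromPath1);
        merged.retainAll(fromPath2);
        return merged;
    }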
However, Dan Bonachea (bonachea@cs.berkeley.edu), my contact in the Titanium group, cautioned me strongly against trying to implement another analysis (the original one took many man-months of effort, and I only had a few weeks). I suspect that, had I tried to copy the current analysis and modify it, I would have run into the following problems:

- Simply adding the new kinds of use/def information to AST nodes would involve touching many parts of the code.
- The actual code for the analysis would probably be more complex than my mental model of it; for instance, aliases may be handled in many different parts of the code, and there may be more than one type of alias information.
- Integrating the new analysis into the existing code base would be time-consuming and dangerous. I would have to identify all points in the code that deal with the old analysis and decide what, if anything, to do there with respect to the new analysis. For example, the function invalidateCFG() triggers a cleanup (deletion) of all the use lists in the "current" function (where "current" is defined by some global variable somewhere).
- Every time an optimization modifies an AST that has already been analyzed, there is a decision about how (if at all) to update the existing dataflow information in the AST, so that enough of it remains valid that we can continue with other optimizations (otherwise we'd have to re-run the analysis after every change).
- The new analysis code might be difficult to debug, particularly because I would not be intimately familiar with the code it was derived from.

Dan suggested a different approach, which I ultimately used. Instead of modifying the def/use analysis, I could treat the analysis as a black box, and instead make trivial modifications to the AST to make the def/use analysis give me the information I needed. So, for example, in this code:

    a = b;
    ...
    x = a + 1;

I would insert two (fake) statements:

    1 b = b;
    2 a = b;
    3 ...
    4 newtemp = b;
    5 x = a + 1;

(where newtemp is a fresh variable) to give me a definition and a use of b right where I needed them. Now, to verify that it is legal to replace the a on line 5 with b, we:

- verify that the definition of a on line 2 is the only one that reaches line 5 (as before), and
- verify that the (fake) definition of b on line 1 is the only one that reaches the (fake) use of b on line 4.

So, my algorithm for global copy propagation basically consists of the following. For every function:

- Run def/use analysis on the unmodified function.
- For every simple copy of a variable (x = y), identify the candidate locations for copy propagation (i.e. the uses of x that this definition reaches), and store them.
- For each candidate propagation, insert a fake definition and use of the right-hand side of the simple copy (i.e. y), and keep track of them in a data structure.
- Re-run def/use analysis.
- For each candidate propagation, verify that it is indeed possible to perform the propagation, and if it is, do so.
- Remove all the fake uses and definitions.

There is one complication: the fake definitions can sometimes interfere with legitimate copy propagation. Consider the following code:

    1 f = e;
    2 g = e;
    3 if (...)
    4     g = ...
    5 ...f...
    6 ...g...

The algorithm will insert fake definitions and uses in order to determine whether it is legal to replace the f and g on lines 5 and 6 with e:

    1  e = e;
    2  f = e;
    3  e = e;
    4  g = e;
    5  if (...)
    6      g = ...
    7  newtemp1 = e;
    8  ...f...
    9  newtemp2 = e;
    10 ...g...

To check whether it is legal to replace the f on line 8 with e, we check that the fake definition on line 1 is the only definition of e that reaches the fake use on line 7. But... it isn't, because of the fake definition on line 3! If it were legal to perform copy propagation on g = e, this would not be such a problem (because the compiler keeps iterating through optimizations until they stop changing the tree), but in this case we would never propagate f = e, even though it is legal to do so.

To solve this, I implemented a defReaches() routine that effectively "looks through" any fake definitions: if it encounters a fake definition other than the one it expected, it verifies recursively that the definition we care about reaches its right-hand side (ignoring any fake definitions it has already dealt with). Needing to implement defReaches() actually turned out to simplify my code somewhat, because now, instead of introducing temporaries for fake uses (temp = x), I could just use a fake definition (x = x).
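A minimal sketch of the idea behind defReaches() follows. The names AstNode, getDefs(), isFake(), and rhsUse() are hypothetical stand-ins for the data structures described above, not the routine's actual code:

    // Sketch only: does `def` reach `use`, looking through fake definitions?
    static boolean defReaches(AstNode def, AstNode use, Set<AstNode> seenFakes) {
        for (AstNode reaching : use.getDefs()) {     // definitions that MAY reach this use
            if (reaching == def || seenFakes.contains(reaching)) {
                continue;                            // the def we want, or a fake already handled
            }
            if (!reaching.isFake()) {
                return false;                        // some other real definition may reach the use
            }
            // A fake definition x = x just copies x, so look through it:
            // our definition must also reach its right-hand side.
            seenFakes.add(reaching);
            if (!defReaches(def, reaching.rhsUse(), seenFakes)) {
                return false;
            }
        }
        return true;
    }

The recursion terminates because each fake definition is looked through at most once (it is recorded in seenFakes).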
Correctness and Performance

Titanium has an extensive test suite, and in addition, I had another small test suite specifically designed to test copy propagation (including some tests I wrote myself). My version of the compiler passes all of my tests and all of the tests in Titanium's optimizer test suite. It also passes essentially all of the other tests, with the exception of:

- compiler/tc-errors/bad_expr1.sh (the test itself seems to be missing)
- bounds/c.ti (which seems to be an off-by-one error in some other part of the code)

Also, the test suite itself is set up to skip several tests (I am not exactly sure why, and did not want to mess with it), and I was unable to run the parallel tests because I did not have the appropriate backends compiled.

I did not have time to measure the benefit of my optimization on actual Titanium code, though I probably will in the near future. One thing I worry about is that my optimization might currently be too aggressive; if you do the propagation in our example:

    a = b;
    ...
    x = b + 1;    (formerly x = a + 1)

then the variable b is live longer than it used to be. If we can't actually remove a = b (because there are uses of a that we can't propagate to), then it is possible that the "optimization" could actually degrade performance (for instance, by increasing register pressure).

Titanium compiles down to C, which is then compiled by gcc (which can also perform many common serial optimizations, including copy propagation), so it would be interesting to see how much benefit we lose if we don't perform these optimizations in Titanium and instead leave them to gcc. Probably the main benefit of including these optimizations directly in the Titanium compiler is their interaction with other, Titanium-specific optimizations. (For instance, copy propagation can be used to "clean up" after inlining, where one of the arguments to the inlined function was a local variable of the caller.)

Future Work

Probably the first thing that needs to be done is to clean up and refactor the existing code. As it is, my code is designed to work, but parts of it aren't terribly well-structured (it's based on partially modified bits of the local copy propagation code), and it performs a fair amount of useless computation (for example, it runs def/use analysis one more time than it needs to). Also, Jonathan Jamison (jjamison@cs.berkeley.edu) is doing common subexpression elimination using the same approach, and probably some of the code I wrote; that code, of course, should only end up in one place in the final code base. My code also leaks more memory than it needs to (Titanium currently has a delete-it-if-you-can memory management policy, and I don't always delete objects even when I could). I'm also slightly worried that I may manipulate the AST in ways that are not totally safe. Basically, the code needs a good review.

Dan suggested that it would probably be worthwhile (and not too difficult) to modify the code so that we only propagate from a copy when we know that the copy itself can be eliminated.

Another related code transformation that Dan suggested is "scope pushing". In the following code:

    1 int s;
    2 {
    3     int t;
    4     ...
    5     s = t;
    6 }
    7 ...s...

we can't replace the s on line 7 with t, because t is no longer in scope (this is a consequence of Titanium's AST-based approach; changes to the AST have to leave a legal program). The solution would be to move t's declaration outside of its enclosing block, as shown in the sketch below. In some cases this might increase register pressure (because now t would be live all the way back to its declaration, which, in Titanium, sets it to 0), so it might take some experimentation to tell if (and when) this optimization is useful.
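For concreteness, here is what the example would look like after scope pushing (a sketch of the intended transformation, keeping the variable names from the example above):

    int s;
    int t;        // declaration "pushed" out of the inner block (in Titanium, this sets t to 0)
    {
        ...
        s = t;
    }
    ...s...       // t is now in scope here, so this s can be replaced with t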
Finally, performance tests need to be run!

Acknowledgements

Thanks much to Titanium developer Dan Bonachea, who was extremely responsive and helpful throughout this project.