LAST_READ edges for an AST graph

Posted July 1, 2021 by Bart Naudts ‐ 3 min read

How to add LAST_READ edges to the AST graph, using symbol-solved JavaParser or e2immu?

Question: How can I add LAST_READ edges to an abstract syntax tree (AST)? LAST_READ connects a node representing a usage of a variable to all nodes in the AST at which that variable’s value could have been last read from memory.

I will try to answer this question at the level of variables read in the different statements of a method, rather than at level of fields accessed in different methods.

Answer 1, using JavaParser’s symbol-solved AST or e2immu’s Inspection

The basic data structure that you want to work towards is a Variables object that keeps track of which variables are in use for every statement in the method. Each entry in the AST gets a Variables object; inside is a “AST variable symbol -> Variable" map. For each Variable, you can then record whether it was read or assigned in this statement. Traversing the AST will then allow you to add the LAST_READ edges by grabbing the data from the Variables object.

In the simplest form, you only keep track of creation, read and write events: a variable which is not read, written or created will not be present in the statement’s Variables map. This is sufficient to build the LAST_READ edges, but it may not be very convenient for a more in-depth analysis, because it’ll require a lot of searching and jumping around in the AST.

One rather obvious problem with this simplistic approach is that it does not keep track of indirect access: each time a variable is assigned to another variable, both need to be followed to track access to the underlying object, as shown in the following rather trivial example:

public static int length(String s) {
    String t = s;
    return t.length();
}

The LAST_READ will therefore be correct at variable level, rather than at object level.

Answer 2, using e2immu’s Analysis data

e2immu keeps track of every variable accessible (non-fields) or used up to now (fields), in every statement. Information is stored in an incremental way, so that the full state is always available without having to refer to earlier statements.

The main data structure is the map StamenteAnalysis.variables, which associates the variable’s FQN with a container which holds the variable’s state. This VariableInfoContainer keeps track of initial state (before evaluation of the main expression), evaluation state, and merged state (after summarizing changes to the variable in the sub-blocks). Each of the (up to) 3 VariableInfo state objects contains, among others, last read, last written, and current value information.

To build the LAST_READ edges, it therefore suffices to traverse the StatementAnalyser hierarchy (see previous post), loop over the variables in each statement, and keep track of READ events.

When traversing, a fair number of artificially generated variables will need to be discarded: e2immu generates, among others, “read copies” of variable fields (because their value may be different each time it gets read), “loop copies” of local variable defined outside a loop, but used inside one (because the assignments to these variables may depend on loop variables). The VariableNature interface describes each local variable’s purpose.

To keep track of the actual value of a variable, VariableInfo contains a set called staticallyAssignedVariables, which allows for the construction of an equivalence set of all variables with the same value.