A primer to control flow analysis in e2immu
e2immu’s representation of statements in a method block is very traditional:
after parsing the code with JavaParser, the Resolver
’s doBlock
method converts JavaParser’s BlockStmt
into an e2immu Block
.
This Block
is fully resolved (all variable names and method references are fully known – JavaParser calls this phase symbol solving)
and now available in the MethodInspection
. The conversion code resides in ExpressionContext
and its helper classes, such as ParseMethodCallExpr
.
e2immu does not use JavaParser’s symbol solving, for the simple reason that I wanted to learn how to do so myself.
At Statement
level, the Java code is represented hierarchically: a Block
is a statement, and some statements like IfElseStatement
have
one or two sub-blocks. The condition in the latter is an Expression
, as expected.
Two points are worth mentioning. First, all statements and expressions are (level 2) immutable. At the end of the resolution phase, the
MethodInspectionImpl.Builder
builds an immutable MethodInspection
implementation, in effect ‘sealing’ the result of parsing the Java code.
Second, each statement has its components stored in a more abstract Structure
object, which allows the analyser to deal with statements in a
slightly more abstract way.
At the heart of control flow analysis is the StatementAnalyser
.
Each Statement
has its corresponding StatementAnalyser
, which is discarded at the end of the analysis phase.
However, the (eventually immutable) data in the companion StatementAnalysis
object remains.
Analysing Java code often means gathering information from all over the primary type: for example, to know whether a field’s value can be null will depend on an analysis of all methods where the field occurs. Knowing that a field’s value cannot be null influences the analysis of methods where that field occurs. So we expect an interplay between type analysers, method analysers, field analysers, and control flow inside statement analysers.
To this end, e2immu’s analysers, and the StatementAnalyser
in particular, make multiple passes, each time equipped
with more information about the components of the code.
Critically, once an analyser makes a decision, this decision cannot change anymore. Decisions can be delayed, however,
which means that an attempt to decide will be deferred to the next pass.
Decisions about the flow of statements is stored across StatementAnalysis
’s NavigationData
, StateData
, and FlowData
fields.
The first, NavigationData
, mimicks the original Java flow as stored in the Statement
s of the MethodInspection
.
It allows the statement analyser to navigate the code as is. It also contains the single replacement
field, which
can be used by other projects to make code replacements.
The second, StateData
, stores the evaluation result of the current statement’s expression, if available, and, equally important,
the current condition, state, precondition, and their cumulative counterparts.
The condition is the boolean expression containing all conditions that had to be met to reach this statement in the flow of the method.
The state is the boolean expression of knowledge of values caused by conditions and interrupts. It is either local or cumulative.
In the following code, state and condition have been annotated:
public static int method1(boolean a, boolean b) {
if (a && b) {
return 1; // condition: a && b; no state; cumulative: a && b
}
// no condition, the (cumulative) state is now !(a && b)
if (!a && !b) {
return 2; // condition and cumulative state: !a && !b
}
// state is now: !(a && b) && (!a && !b) === (a || b) && (!a || !b)
if (a && !b) return 3;
// state: !a && b
if (!a && b) return 4; // ERROR: conditional evaluates to constant
// state: false
int c = 0; // ERROR: unreachable statement
return 5;// unreachable statement, but no error anymore
}
Preconditions are essentially states and conditions caused by ’escapes’: we view exceptions as being outside of the normal control flow.
The results of control flow analysis are stored in FlowData
:
guaranteedToBeReachedInCurrentBlock
: a boolean (as always, wrapped inSetOnce
to ensure that decisions cannot be overwritten)guaranteedToBeReachedInMethod
: a booleanblockExecution
: one ofALWAYS
,CONDITIONAL
,NEVER
, orDELAYED
interruptsFlow
: are there any interrupts such as areturn
statement, abreak
orcontinue
, or an exception which is thrown?
Together, these decisions allow the statement analyser to return the lastStatement()
of a block – this is the statement where
each of the variables have their final value. When it differs from the NavigationData
, the analyser has performed dead code analysis.
They can, for example, also answer lastStatementIsEscape()
, which is used by the method analyser to compute the return value of a method.
The sub-block analysis part of the statement analyser uses blockExecution
to determine if there are branches of if-else
or switch
statements
that can never be reached, which would result in errors.
The interruptsFlow
information is critical to determine which statements (if-else
, assert
) actually represent preconditions for the methods:
conditions which causes a ‘fatal’ exception to be thrown if they are not met.
Let’s conclude this post by stating that no control flow analysis is possible without a solid boolean expression analysis.
In the example, the accumulation of boolean expressions has to be normalised, so that the state can be compared to the if-else
statement’s
condition. This obviously goes beyond boolean analysis; in particular, e2immu has quite a lot of logic to evaluate and simplify expressions
containing the inline conditional a ? b : c
. An example of such a simplification rule is
x && y ? (y && z ? a : b) : c --> x && y ? (z ? a : b) : c
which removes the y
component from the inner inline conditional’s condition.
Finally, note that this rule can only be applied when the expression y
is not a modifying expression, such as list.add(3)
!