Initialisation strategies
In DynamicPPL, initialisation strategies determine the parameters with which a model is evaluated.
One might more appropriately call them parameter generation strategies; even the name 'initialisation' is something of a historical misnomer (the original intent was that they would be used to populate an empty VarInfo with values). Over time, however, it has become clear that they are general enough to describe essentially any way of choosing parameters for a model evaluation.
Currently, initialisation strategies are stored inside a model: specifically, if a model's context field is an InitContext, that context will contain a DynamicPPL.AbstractInitStrategy.
Every time an assume tilde-statement (i.e., a random variable) is encountered during model evaluation, the initialisation strategy is used to generate a value for that variable.
DynamicPPL.AbstractInitStrategy — Type
AbstractInitStrategy

Abstract type representing the possible ways of initialising new values for the random variables in a model (e.g., when creating a new VarInfo).
Any subtype of AbstractInitStrategy must implement the DynamicPPL.init method, and in some cases, DynamicPPL.get_param_eltype (see its docstring for details).
Each initialisation strategy must implement DynamicPPL.init(rng, vn, dist, strategy), which must return an AbstractTransformedValue.
An example
Consider the following model:
using DynamicPPL, Distributions, Random

@model function f()
    x ~ Normal()
    return x
end
model = f()
Model{typeof(Main.f), (), (), (), Tuple{}, Tuple{}, DefaultContext, false}(Main.f, NamedTuple(), NamedTuple(), DefaultContext())
Suppose we are writing a Metropolis–Hastings sampler, and we want to perform a random walk where the next proposed value of x depends on the previous value of x. Given a previous value x_prev we can define a custom initialisation strategy as follows:
struct InitRandomWalk <: DynamicPPL.AbstractInitStrategy
    x_prev::Float64
    step_size::Float64
end
function DynamicPPL.init(rng, vn::VarName, ::Distribution, strategy::InitRandomWalk)
    new_x = rand(rng, Normal(strategy.x_prev, strategy.step_size))
    # Insert some printing to see when this is called.
    @info "init() is returning: $new_x"
    return DynamicPPL.UntransformedValue(new_x)
end
Given a previous value of x
x_prev = 4.0
we can then make a proposal for x as follows:
new_x, new_vi = DynamicPPL.init!!(
    model, VarInfo(), InitRandomWalk(x_prev, 0.5), UnlinkAll()
)
[ Info: init() is returning: 3.802712008190864
When evaluating the model, the value for x will be exactly that new value we proposed. We can see this from the return value:
new_x
3.802712008190864
Furthermore, we can read off the associated log-probability from the newly returned VarInfo:
DynamicPPL.getlogjoint(new_vi) ≈ logpdf(Normal(), new_x)
true
(From this log-probability, we can compute the acceptance ratio for the Metropolis–Hastings step.)
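For concreteness, with a symmetric random-walk proposal like the one above, the proposal densities cancel, so the acceptance probability depends only on the difference of the log-joints at the proposed and previous values. A minimal sketch, recomputing the previous log-joint directly for this one-variable model:
logp_prev = logpdf(Normal(), x_prev)       # log-joint at the previous value
logp_new = DynamicPPL.getlogjoint(new_vi)  # log-joint at the proposed value
accept_prob = min(1.0, exp(logp_new - logp_prev))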
In this case, we have defined an initialisation strategy that is random (and thus uses the rng argument for reproducibility). However, initialisation strategies can also be fully deterministic, in which case the rng argument is not needed. For example, DynamicPPL.InitFromParams reads from a set of given parameters.
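As an illustration of a deterministic strategy, here is a minimal sketch of a hypothetical InitFromFixed (not part of DynamicPPL) that always returns the same fixed value for every variable; in real code one would typically use DynamicPPL.InitFromParams instead:
struct InitFromFixed <: DynamicPPL.AbstractInitStrategy
    value::Float64
end

# The rng argument is accepted for interface compatibility but never used.
function DynamicPPL.init(rng, vn::VarName, ::Distribution, strategy::InitFromFixed)
    return DynamicPPL.UntransformedValue(strategy.value)
end

# Every evaluation now sees x = 1.5, regardless of the rng.
_, fixed_vi = DynamicPPL.init!!(model, VarInfo(), InitFromFixed(1.5), UnlinkAll())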
The returned AbstractTransformedValue
As mentioned above, the init function must return an AbstractTransformedValue. The subtype of AbstractTransformedValue used does not affect the result of the model evaluation, but it may have performance implications. In particular, the returned subtype does not determine whether the log-Jacobian term is accumulated or not: that is determined by a separate transform strategy.
What this means is that initialisation strategies should always choose the laziest possible subtype of AbstractTransformedValue, i.e., the one that requires no extra transformation at the point where the value is generated.
In the example above, for instance, we used UntransformedValue, which is the simplest possible choice. If a linked value is required by a later step inside tilde_assume!!, it is the responsibility of that step to perform the linking.
Conversely, DynamicPPL.InitFromUniform samples inside linked space. Instead of performing the inverse link transform and returning an UntransformedValue, it directly returns a LinkedVectorValue: this means that if a linked value is required by a later step, it is not necessary to link it again. Even if no linked value is required, this lazy approach does not hurt performance, as it just defers the inverse linking to the later step.
In both cases, only one linking operation is performed (at most).
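To illustrate, a random-walk strategy that proposes directly in linked (unconstrained) space might look roughly like the following sketch. It assumes that LinkedVectorValue can be constructed directly from a vector of linked values and that the previous linked value is available as a plain vector; check the actual constructor before relying on this:
struct InitRandomWalkLinked <: DynamicPPL.AbstractInitStrategy
    y_prev::Vector{Float64}   # previous value in linked (unconstrained) space
    step_size::Float64
end

function DynamicPPL.init(rng, vn::VarName, ::Distribution, strategy::InitRandomWalkLinked)
    # Propose in linked space and return lazily: no inverse link is performed here,
    # so a later step that needs a linked value can use this directly.
    new_y = strategy.y_prev .+ strategy.step_size .* randn(rng, length(strategy.y_prev))
    # Assumption: LinkedVectorValue(y) wraps a vector y of linked values.
    return DynamicPPL.LinkedVectorValue(new_y)
end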