Initialisation strategies
In DynamicPPL, initialisation strategies determine the parameters used to evaluate a model. Every time an assume tilde-statement (i.e., a random variable) is encountered, the initialisation strategy is used to generate a value for that variable.
One might perhaps more appropriately call them parameter generation strategies. Even the name initialisation is a bit of a historical misnomer (the original intent was that they would be used to populate an empty VarInfo with some values). However, over time, it has become clear that these are general enough to describe essentially any way of choosing parameters to evaluate a model with.
DynamicPPL provides four initialisation strategies out of the box. For many purposes you should be able to get away with only using these.
- `InitFromPrior`: samples from the prior distribution.
- `InitFromParams`: reads from a set of given parameters. The parameters may be supplied in many different forms, but a `VarNamedTuple` is the most robust.
- `InitFromUniform`: samples from a uniform distribution in linked space.
- `InitFromVector`: reads from a set of vectorised parameters. This also needs a `LogDensityFunction` to provide the necessary information about how the vectorised parameters map to the model variables.
DynamicPPL.InitFromPrior — Type
```julia
InitFromPrior()
```

Obtain new values by sampling from the prior distribution.
DynamicPPL.InitFromParams — Type
```julia
InitFromParams(
    params::Any,
    fallback::Union{AbstractInitStrategy,Nothing}=InitFromPrior()
)
```

Obtain new values by extracting them from the given set of `params`.
The most common use case is to provide a VarNamedTuple, which provides a mapping from variable names to values. However, we leave the type of params open in order to allow for custom parameter storage types.
DynamicPPL also provides extra implementations for params::NamedTuple and params::AbstractDict{<:VarName}, which both convert the input to a VarNamedTuple internally.
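As a minimal sketch of supplying parameters as a NamedTuple (the model and values here are illustrative, not from the DynamicPPL test suite):

```julia
using DynamicPPL, Distributions

@model function demo()
    x ~ Normal()
    y ~ Beta(2, 2)
end

# The NamedTuple is converted to a VarNamedTuple internally; both `x` and
# `y` are then read from it during model evaluation.
strategy = InitFromParams((; x = 0.5, y = 0.25))
_, vi = DynamicPPL.init!!(demo(), VarInfo(), strategy, DynamicPPL.UnlinkAll())
```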
Custom parameter storage types
For InitFromParams to work correctly with a custom params::P, you need to implement
```julia
DynamicPPL.init(rng, vn::VarName, dist::Distribution, p::InitFromParams{P}) where {P}
```

This method specifies how to obtain values for the random variable `vn` from `p.params`. Note that the last argument is `InitFromParams(params)`, not just `params` itself. Please see the docstring of `DynamicPPL.init` for more information on the expected behaviour.
In some cases (specifically, when you expect that the type of log-probabilities will need to be expanded: the most common example is when running AD with ForwardDiff.jl), you may also need to implement:
```julia
DynamicPPL.get_param_eltype(p::InitFromParams{P}) where {P}
```

See the docstring of `DynamicPPL.get_param_eltype` for more information on when this is needed.
The argument fallback specifies how new values are to be obtained if they cannot be found in params, or they are specified as missing. fallback can either be an initialisation strategy itself, in which case it will be used to obtain new values, or it can be nothing, in which case an error will be thrown. The default for fallback is InitFromPrior().
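A brief sketch of the two `fallback` behaviours (the model here is illustrative):

```julia
using DynamicPPL, Distributions

@model function demo()
    x ~ Normal()
    y ~ Beta(2, 2)
end

# `y` is absent from the params, so the default fallback `InitFromPrior()`
# samples it from its prior during evaluation.
lenient = InitFromParams((; x = 0.5))

# With `fallback=nothing`, evaluating the model with this strategy would
# instead throw an error, since no value for `y` can be found.
strict = InitFromParams((; x = 0.5), nothing)
```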
DynamicPPL.InitFromParams — Method
```julia
InitFromParams(
    ps::ParamsWithStats,
    fallback::Union{Nothing,AbstractInitStrategy}=InitFromPrior()
)
```

Initialise a model using the parameters stored in `ps`. The stats are ignored. `fallback` is used if the model requires the value of a parameter which is not present in `ps.params`.
DynamicPPL.InitFromUniform — Type
```julia
InitFromUniform()
InitFromUniform(lower, upper)
```

Obtain new values by first transforming the distribution of the random variable to unconstrained space, then sampling a value uniformly between `lower` and `upper`.
If lower and upper are unspecified, they default to (-2, 2), which mimics Stan's default initialisation strategy (see the Stan reference manual page on initialisation for more details).
Requires that lower <= upper.
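As a sketch of how the bounds interact with a constrained variable (the model is illustrative):

```julia
using DynamicPPL, Distributions

@model function demo()
    p ~ Beta(2, 2)  # constrained to (0, 1)
end

# The uniform draw happens in linked (unconstrained) space; the inverse
# link transform then maps it back, so `p` always lands inside (0, 1)
# regardless of the bounds chosen here.
strategy = InitFromUniform(-1, 1)
_, vi = DynamicPPL.init!!(demo(), VarInfo(), strategy, DynamicPPL.UnlinkAll())
```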
DynamicPPL.InitFromVector — Type
```julia
InitFromVector(
    vect::AbstractVector{<:Real},
    varname_ranges::VarNamedTuple,
    transform_strategy::AbstractTransformStrategy
) <: AbstractInitStrategy
```

This constructor is only meant for internal use. Please use `InitFromVector(vect, ldf::LogDensityFunction)` instead, which will automatically construct the `varname_ranges` and `transform_strategy` arguments for you.
A struct that wraps a vector of parameter values, plus information about how random variables map to ranges in that vector.
The transform_strategy argument in fact duplicates information stored inside varname_ranges. For example, if every RangeAndTransform in varname_ranges has transform == DynamicLink(), then transform_strategy will be LinkAll().
However, storing transform_strategy here communicates at the type level whether all variables are linked or unlinked. When they are, this improves type stability, and hence performance, considerably.
DynamicPPL.InitFromVector — Method
```julia
InitFromVector(
    vect::AbstractVector{<:Real},
    ldf::LogDensityFunction
)
```

Constructor for `InitFromVector` that extracts the necessary information about `VarName` ranges and transform strategy from a pre-existing `LogDensityFunction`.
DynamicPPL.InitFromVector — Method
```julia
DynamicPPL.InitFromVector(
    mld::MarginalLogDensities.MarginalLogDensity{<:LogDensityFunctionWrapper},
    unmarginalized_params::Union{AbstractVector,Nothing}=nothing,
)
```

Create a new initialisation strategy using the parameter values stored in `mld` (and optionally `unmarginalized_params`).
If a Laplace approximation was used for the marginalisation, the values of the marginalized parameters are set to their mode (note that this only happens if the mld object has been used to compute the marginal log-density at least once, so that the mode has been computed).
If a vector of unmarginalized_params is specified, the values for the corresponding parameters will also be available as part of the initialisation strategy. This vector may be obtained, e.g., by optimising the marginal log-density.
To use this initialisation strategy to obtain e.g. updated log-probabilities, you should re-evaluate the model with the values inside the returned VarInfo, for example using:
```julia
init_strategy = DynamicPPL.InitFromVector(mld, unmarginalized_params)
accs = DynamicPPL.OnlyAccsVarInfo((
    DynamicPPL.LogPriorAccumulator(),
    DynamicPPL.LogLikelihoodAccumulator(),
    DynamicPPL.RawValueAccumulator(false),
    # ... whatever else you need
))
_, accs = DynamicPPL.init!!(rng, model, accs, init_strategy, DynamicPPL.UnlinkAll())
```

You can then extract all the updated data from `accs` using DynamicPPL's existing API (see the DynamicPPL documentation for more details).
Example
```julia
julia> using DynamicPPL, Distributions, MarginalLogDensities

julia> @model function demo()
           x ~ Normal()
           y ~ Beta(2, 2)
       end
demo (generic function with 2 methods)

julia> # Note that by default `marginalize` uses a linked VarInfo.
       mld = marginalize(demo(), [@varname(x)]);

julia> using MarginalLogDensities: Optimization, OptimizationOptimJL

julia> # Find the mode of the marginal log-density of `y`, with an initial point of `y0`.
       y0 = 2.0; opt_problem = Optimization.OptimizationProblem(mld, [y0])
OptimizationProblem. In-place: true
u0: 1-element Vector{Float64}:
 2.0

julia> # This tells us the optimal (linked) value of `y` is around 0.
       opt_solution = Optimization.solve(opt_problem, OptimizationOptimJL.NelderMead())
retcode: Success
u: 1-element Vector{Float64}:
 4.88281250001733e-5

julia> # Get an initialisation strategy representing the mode of `y`.
       init_strategy = InitFromVector(mld, opt_solution.u);

julia> # Evaluate the model with this initialisation strategy.
       accs = DynamicPPL.OnlyAccsVarInfo((
           DynamicPPL.LogPriorAccumulator(),
           DynamicPPL.LogLikelihoodAccumulator(),
           DynamicPPL.RawValueAccumulator(false),
       ));
       _, accs = DynamicPPL.init!!(demo(), accs, init_strategy, DynamicPPL.UnlinkAll());

julia> # Extract the raw (i.e. untransformed) values for all variables.
       # `x` is set to its mode (which for `Normal()` is zero).
       # Furthermore, `y` is set to the optimal value we found above.
       vals = DynamicPPL.get_raw_values(accs)
VarNamedTuple
├─ x => 0.0
└─ y => 0.5000122070312476
```

However, sometimes you will need to implement your own initialisation strategy. The subsequent sections will demonstrate how this can be done.
The required interface
Each initialisation strategy must subtype AbstractInitStrategy, and implement DynamicPPL.init(rng, vn, dist, strategy), which must return a TransformedValue.
DynamicPPL.AbstractInitStrategy — Type
```julia
AbstractInitStrategy
```

Abstract type representing the possible ways of initialising new values for the random variables in a model (e.g., when creating a new `VarInfo`).
Any subtype of AbstractInitStrategy must implement the DynamicPPL.init method, and in some cases, DynamicPPL.get_param_eltype (see its docstring for details).
DynamicPPL.init — Function
```julia
init(rng::Random.AbstractRNG, vn::VarName, dist::Distribution, strategy::AbstractInitStrategy)
```

Generate a new value for a random variable with the given distribution.
This function must return a TransformedValue.
If strategy provides values that are already untransformed (e.g., a Float64 within (0, 1) for dist::Beta), then you should return a TransformedValue with a NoTransform(). In other cases, this function may return a transformed value; for example, when the strategy reads from an existing VarInfo.
DynamicPPL.TransformedValue — Type
```julia
TransformedValue{V,T<:AbstractTransform}
```

A struct to represent a value that has undergone some transformation.
The transformed value is stored in the value field, and the inverse transformation is stored in the transform field.
That means that `get_transform(tv)(get_internal_value(tv))` should return the raw, untransformed value associated with `tv`.
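A minimal sketch of this invariant, assuming the stored transform is callable on the internal value:

```julia
using DynamicPPL

# With NoTransform, the stored value is already the raw value, so applying
# the (identity) transform to the internal value recovers it unchanged.
tv = DynamicPPL.TransformedValue(0.3, DynamicPPL.NoTransform())
raw = DynamicPPL.get_transform(tv)(DynamicPPL.get_internal_value(tv))
```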
An example
Consider the following model:
```julia
using DynamicPPL, Distributions, Random

@model function f()
    x ~ Normal()
    return x
end

model = f()
```

```
Model{typeof(Main.f), (), (), (), Tuple{}, Tuple{}, DefaultContext, false}(Main.f, NamedTuple(), NamedTuple(), DefaultContext())
```

Suppose we are writing a Metropolis–Hastings sampler, and we want to perform a random walk where the next proposed value of x depends on the previous value of x. Given a previous value x_prev we can define a custom initialisation strategy as follows:
```julia
struct InitRandomWalk <: DynamicPPL.AbstractInitStrategy
    x_prev::Float64
    step_size::Float64
end

function DynamicPPL.init(rng, vn::VarName, ::Distribution, strategy::InitRandomWalk)
    new_x = rand(rng, Normal(strategy.x_prev, strategy.step_size))
    # Insert some printing to see when this is called.
    @info "init() is returning: $new_x"
    return DynamicPPL.TransformedValue(new_x, DynamicPPL.NoTransform())
end
```

Given a previous value of x
```julia
x_prev = 4.0
```

we can then make a proposal for x as follows:
```julia
new_x, new_vi = DynamicPPL.init!!(
    model, VarInfo(), InitRandomWalk(x_prev, 0.5), UnlinkAll()
)
```

```
[ Info: init() is returning: 4.002932817359369
```

When evaluating the model, the value for x will be exactly that new value we proposed. We can see this from the return value:
```julia
new_x
```

```
4.002932817359369
```

Furthermore, we can read off the associated log-probability from the newly returned VarInfo:
```julia
DynamicPPL.getlogjoint(new_vi) ≈ logpdf(Normal(), new_x)
```

```
true
```

From this log-probability, we can compute the acceptance ratio for the Metropolis–Hastings step, and thereby create a valid MCMC sampler.
In this case, we have defined an initialisation strategy that is random (and thus uses the rng argument for reproducibility). However, initialisation strategies can also be fully deterministic, in which case the rng argument is not needed. For example, DynamicPPL.InitFromParams reads from a set of parameters which are known ahead of time.
The returned TransformedValue
As mentioned above, the init function must return a TransformedValue. The transform stored inside it does not affect the result of the model evaluation, but it may have performance implications. In particular, the transform stored in the returned TransformedValue does not determine whether the log-Jacobian term is accumulated: that is determined by a separate transform strategy.
What this means is that initialisation strategies should always choose the laziest possible version of TransformedValue, electing to do as few transformations as possible inside init.
For example, in the above example, we simply wrapped the untransformed value in TransformedValue(..., NoTransform()), which is the simplest possible choice. If a linked value is required by a later step inside tilde_assume!! (either the transformation or accumulation steps), it is the responsibility of that step to perform the linking.
Conversely, DynamicPPL.InitFromUniform samples inside linked space. Instead of performing the inverse link transform eagerly, it directly returns a TransformedValue(val, DynamicLink()), where val is already the linked vector: this means that if a linked value is required by a later step, it is not necessary to link it again. Even if no linked value is required, this lazy approach does not hurt performance, as it just defers the inverse linking to the later step.
In both cases, only one linking operation is performed (at most).