Initialisation strategies

In DynamicPPL, initialisation strategies determine the parameter values used to evaluate a model. Every time an assume tilde-statement (i.e., a random variable) is encountered, the initialisation strategy is used to generate a value for that variable.

Note

One might perhaps more appropriately call them parameter generation strategies. Even the name initialisation is a bit of a historical misnomer (the original intent was that they would be used to populate an empty VarInfo with some values). However, over time, it has become clear that these are general enough to describe essentially any way of choosing parameters to evaluate a model with.

DynamicPPL provides four initialisation strategies out of the box. For many purposes you should be able to get away with only using these.

  • InitFromPrior: samples from the prior distribution.
  • InitFromParams: reads from a set of given parameters. The parameters may be supplied in many different forms, but a VarNamedTuple is the most robust.
  • InitFromUniform: samples from a uniform distribution in linked space.
  • InitFromVector: reads from a set of vectorised parameters. This also needs a LogDensityFunction to provide the necessary information about how the vectorised parameters map to the model variables.
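
As a brief sketch of how these built-in strategies are used, following the init!! call pattern demonstrated in the example later on this page (UnlinkAll() here is the transform strategy, which is separate from the initialisation strategy):

```julia
using DynamicPPL, Distributions, Random

@model function demo()
    x ~ Beta(2, 2)
    return x
end
model = demo()

# Sample x from its prior (a Beta(2, 2) draw in (0, 1)).
x1, vi1 = DynamicPPL.init!!(model, VarInfo(), InitFromPrior(), UnlinkAll())

# Sample uniformly in linked (unconstrained) space, then map back to (0, 1);
# with the default bounds of (-2, 2) this mimics Stan's initialisation.
x2, vi2 = DynamicPPL.init!!(model, VarInfo(), InitFromUniform(), UnlinkAll())
```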
DynamicPPL.InitFromParams — Type
InitFromParams(
    params::Any,
    fallback::Union{AbstractInitStrategy,Nothing}=InitFromPrior()
)

Obtain new values by extracting them from the given set of params.

The most common use case is to provide a VarNamedTuple, which provides a mapping from variable names to values. However, we leave the type of params open in order to allow for custom parameter storage types.

Custom parameter storage types

For InitFromParams to work correctly with a custom params::P, you need to implement

DynamicPPL.init(rng, vn::VarName, dist::Distribution, p::InitFromParams{P}) where {P}

This tells DynamicPPL how to obtain a value for the random variable vn from p.params. Note that the last argument is InitFromParams(params), not just params itself. Please see the docstring of DynamicPPL.init for more information on the expected behaviour.

In some cases (specifically, when the element type of the log-probabilities needs to be widened; the most common example is when running AD with ForwardDiff.jl), you may also need to implement:

DynamicPPL.get_param_eltype(p::InitFromParams{P}) where {P}

See the docstring of DynamicPPL.get_param_eltype for more information on when this is needed.

The fallback argument specifies how new values are to be obtained if they cannot be found in params, or are specified as missing. fallback can either be an initialisation strategy itself, in which case it is used to obtain the missing values, or nothing, in which case an error is thrown. The default for fallback is InitFromPrior().

source
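
To make the custom-storage interface concrete, here is a sketch of wiring up a hypothetical Dict-backed params type. The ParamDict name is invented for illustration, and both the p.fallback field access and the handling of the fallback inside init itself are assumptions about the interface, not documented behaviour:

```julia
using DynamicPPL, Distributions, Random

# Hypothetical parameter storage: a plain Dict keyed by VarName.
struct ParamDict
    dict::Dict{VarName,Any}
end

function DynamicPPL.init(
    rng::Random.AbstractRNG,
    vn::VarName,
    dist::Distribution,
    p::DynamicPPL.InitFromParams{ParamDict},
)
    val = get(p.params.dict, vn, missing)
    if val === missing
        # No value stored: defer to the fallback strategy (assumed to be
        # stored in the `fallback` field), or error if there is none.
        p.fallback === nothing && error("no value found for $vn")
        return DynamicPPL.init(rng, vn, dist, p.fallback)
    end
    # Stored values are assumed to be in the original (untransformed) space.
    return DynamicPPL.UntransformedValue(val)
end
```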
DynamicPPL.InitFromUniform — Type
InitFromUniform()
InitFromUniform(lower, upper)

Obtain new values by first transforming the distribution of the random variable to unconstrained space, then sampling a value uniformly between lower and upper.

If lower and upper are unspecified, they default to (-2, 2), which mimics Stan's default initialisation strategy (see the Stan reference manual page on initialisation for more details).

Requires that lower <= upper.

source
DynamicPPL.InitFromVector — Type
InitFromVector(
    vect::AbstractVector{<:Real},
    varname_ranges::VarNamedTuple,
    transform_strategy::AbstractTransformStrategy
) <: AbstractInitStrategy
Warning

This constructor is only meant for internal use. Please use InitFromVector(vect, ldf::LogDensityFunction) instead, which will automatically construct the varname_ranges and transform_strategy arguments for you.

A struct that wraps a vector of parameter values, plus information about how random variables map to ranges in that vector.

The transform_strategy argument in fact duplicates information stored inside varname_ranges: for example, if every RangeAndLinked in varname_ranges has is_linked == true, then transform_strategy will be LinkAll().

However, storing transform_strategy here communicates at the type level whether all variables are linked or all unlinked. This improves type stability, and hence performance, in those cases.

source

However, sometimes you will need to implement your own initialisation strategy. The subsequent sections will demonstrate how this can be done.

The required interface

Each initialisation strategy must subtype AbstractInitStrategy, and implement DynamicPPL.init(rng, vn, dist, strategy), which must return an AbstractTransformedValue.

DynamicPPL.init — Function
init(rng::Random.AbstractRNG, vn::VarName, dist::Distribution, strategy::AbstractInitStrategy)

Generate a new value for a random variable with the given distribution.

This function must return an AbstractTransformedValue.

If strategy provides values that are already in the original (untransformed) space (e.g., a Float64 within (0, 1) for dist::Beta), then you should return an UntransformedValue.

In other cases, this function may instead return a VectorValue or a LinkedVectorValue, for example when the strategy reads values from an existing VarInfo.

source
DynamicPPL.AbstractTransformedValue — Type
AbstractTransformedValue

An abstract type for values that enter the DynamicPPL tilde-pipeline.

These values are generated by an AbstractInitStrategy: the function DynamicPPL.init should return an AbstractTransformedValue.

Each AbstractTransformedValue contains some version of the actual variable's value, together with a transformation that can be used to convert the internal value back to the original space.

Current subtypes are VectorValue, LinkedVectorValue, and UntransformedValue. DynamicPPL's VarInfo type stores either VectorValues or LinkedVectorValues internally, depending on the link status of the VarInfo.

Warning

Even though the subtypes listed above are public, this abstract type is not itself part of the public API and should not be subtyped by end users. Much of DynamicPPL's model evaluation machinery depends on these subtypes having predictable behaviour, i.e., their transforms should always be from_linked_vec_transform(dist), from_vec_transform(dist), or their inverses. If you create a new subtype of AbstractTransformedValue and use it, DynamicPPL will not know how to handle it and may either error or silently give incorrect results.

In principle, it should be possible to subtype this and allow for custom transformations to be used (not just the 'default' ones). However, this is not currently implemented.

Subtypes of AbstractTransformedValue should implement the following functions:

  • DynamicPPL.get_transform(tv::AbstractTransformedValue): Get the transformation that converts the internal value back to the original space.

  • DynamicPPL.get_internal_value(tv::AbstractTransformedValue): Get the internal value stored in tv.

  • DynamicPPL.set_internal_value(tv::AbstractTransformedValue, new_val): Create a new AbstractTransformedValue with the same transformation as tv, but with internal value new_val.

source

An example

Consider the following model:

using DynamicPPL, Distributions, Random

@model function f()
    x ~ Normal()
    return x
end
model = f()
Model{typeof(Main.f), (), (), (), Tuple{}, Tuple{}, DefaultContext, false}(Main.f, NamedTuple(), NamedTuple(), DefaultContext())

Suppose we are writing a Metropolis–Hastings sampler, and we want to perform a random walk where the next proposed value of x depends on the previous value of x. Given a previous value x_prev we can define a custom initialisation strategy as follows:

struct InitRandomWalk <: DynamicPPL.AbstractInitStrategy
    x_prev::Float64
    step_size::Float64
end

function DynamicPPL.init(rng, vn::VarName, ::Distribution, strategy::InitRandomWalk)
    new_x = rand(rng, Normal(strategy.x_prev, strategy.step_size))
    # Insert some printing to see when this is called.
    @info "init() is returning: $new_x"
    return DynamicPPL.UntransformedValue(new_x)
end

Given a previous value of x

x_prev = 4.0

we can then make a proposal for x as follows:

new_x, new_vi = DynamicPPL.init!!(
    model, VarInfo(), InitRandomWalk(x_prev, 0.5), UnlinkAll()
)
[ Info: init() is returning: 4.274157475606392

When evaluating the model, the value for x will be exactly that new value we proposed. We can see this from the return value:

new_x
4.274157475606392

Furthermore, we can read off the associated log-probability from the newly returned VarInfo:

DynamicPPL.getlogjoint(new_vi) ≈ logpdf(Normal(), new_x)
true

From this log-probability, we can compute the acceptance ratio for the Metropolis–Hastings step, and thereby create a valid MCMC sampler.

In this case, we have defined an initialisation strategy that is random (and thus uses the rng argument for reproducibility). However, initialisation strategies can also be fully deterministic, in which case the rng argument is not needed. For example, DynamicPPL.InitFromParams reads from a set of parameters which are known ahead of time.
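
For instance, a minimal deterministic strategy that initialises every variable to the same fixed value could be sketched as follows (the InitConstant name is invented here; the unused rng argument is simply ignored):

```julia
using DynamicPPL, Distributions, Random

# A deterministic strategy: every variable is initialised to the same value.
struct InitConstant <: DynamicPPL.AbstractInitStrategy
    value::Float64
end

function DynamicPPL.init(
    ::Random.AbstractRNG, ::VarName, ::Distribution, strategy::InitConstant
)
    return DynamicPPL.UntransformedValue(strategy.value)
end

# Every call with this strategy now yields the same parameters, e.g.:
# x, vi = DynamicPPL.init!!(model, VarInfo(), InitConstant(1.5), UnlinkAll())
```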

The returned AbstractTransformedValue

As mentioned above, the init function must return an AbstractTransformedValue. The subtype of AbstractTransformedValue used does not affect the result of the model evaluation, but it may have performance implications. In particular, the returned subtype does not determine whether the log-Jacobian term is accumulated or not: that is determined by a separate transform strategy.

In practice, this means that initialisation strategies should return values in whatever representation they already have, choosing the subtype of AbstractTransformedValue that requires the least work up front.

For example, in the above example, we used UntransformedValue, which is the simplest possible choice. If a linked value is required by a later step inside tilde_assume!! (either the transformation or accumulation steps), it is the responsibility of that step to perform the linking.

Conversely, DynamicPPL.InitFromUniform samples inside linked space. Instead of performing the inverse link transform and returning an UntransformedValue, it directly returns a LinkedVectorValue: this means that if a linked value is required by a later step, it is not necessary to link it again. Even if no linked value is required, this lazy approach does not hurt performance, as it just defers the inverse linking to the later step.

In both cases, only one linking operation is performed (at most).