## Overview
Much of the Julia numerical computing ecosystem is built on the assumption that model parameters are provided as an `AbstractVector{<:Real}`. This format is very general, agnostic to the details of the model, and compatible with automatic differentiation (although it is worth pointing out that modern AD packages, such as Mooncake and Enzyme, are able to differentiate through more complex data structures).
This poses a challenge to DynamicPPL: how can we 'translate' between a model, e.g.,
```julia
using DynamicPPL, Distributions

@model function f()
    x ~ Normal()
    y ~ Beta(2, 2)
    return nothing
end
```
```
f (generic function with 2 methods)
```

where we supply named parameters, e.g. using `VarNamedTuple(x = 1.0, y = 0.5)`, and the vectorised format where parameters are unnamed, e.g. `[1.0, 0.5]`?
On top of that, it is often very useful to supply transformed parameters such that any vector in ℝⁿ is a valid input to the algorithm. We therefore also need a way to encode a particular transform strategy into this translation layer.
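For example, the support of `Beta(2, 2)` is the open interval (0, 1), so an arbitrary real number such as 4.0 is not a valid raw value for a `Beta`-distributed variable; a logit transform maps the interval onto all of ℝ. Here is a minimal sketch using Bijectors.jl (which provides the transforms DynamicPPL relies on):

```julia
using Bijectors, Distributions

b = Bijectors.bijector(Beta(2, 2))  # the logit map from (0, 1) to ℝ
b(0.5)           # 0.0
inverse(b)(4.0)  # ≈ 0.982, back inside (0, 1)
```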
## LogDensityFunction, conceptually
DynamicPPL provides a struct, LogDensityFunction, which acts as this translation layer. At its core, it is a wrapper around a model, a transform strategy, plus a mapping of VarNames to ranges. For example, in the above example, we could store the information as follows:
```julia
model = f()
transform_strategy = LinkAll()
ranges = @vnt begin
    x := 1:1
    y := 2:2
end
```
```
VarNamedTuple
├─ x => 1:1
└─ y => 2:2
```

Given all of this, plus a vector (say `[3.0, 4.0]`), we can figure out how to rerun our model:
- Since the transform strategy is `LinkAll()`, we know that both values are provided in transformed (unconstrained) space.
- We use the ranges to figure out that the transformed value of `x` is `[3.0]`, and the transformed value of `y` is `[4.0]`. (This essentially specifies an initialisation strategy; a hand-worked version is sketched below.)
- We can then use any accumulators we like to obtain the information we are interested in.
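To make this concrete, here is a minimal sketch of the reconstruction done by hand. It assumes, as the model evaluation later on this page confirms, that `LinkAll()` maps `Beta`'s support (0, 1) to ℝ via the logit transform, so we invert with `logistic`:

```julia
using StatsFuns: logistic

v = [3.0, 4.0]
x_transformed = v[1:1]  # slice out x's range
y_transformed = v[2:2]  # slice out y's range

# Normal() has unbounded support, so its link is the identity;
# Beta(2, 2) is linked via logit, so we invert with logistic.
x = only(x_transformed)            # 3.0
y = logistic(only(y_transformed))  # ≈ 0.982
```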
The actual implementation of LogDensityFunction is a bit more complex than this, but is otherwise not too different.
One might ask why we store ranges like `1:1` instead of just the index `1` itself. A major reason is that ranges generalise better to multivariate and other distributions, and Bijectors.jl contains an interface for transforming vectorised parameters into raw values. Storing plain indices for univariate distributions would require special-casing them, and would also make type stability harder to achieve.
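To illustrate (with a hypothetical model, not the `f` defined above), a multivariate variable simply occupies a wider range in the same `VarNamedTuple`:

```julia
using DynamicPPL, Distributions, LinearAlgebra

@model function g()
    z ~ MvNormal(zeros(3), I)  # a 3-dimensional variable
    w ~ Normal()
    return nothing
end

# Plausible ranges for this model: z spans three slots, w one.
ranges = @vnt begin
    z := 1:3
    w := 4:4
end
```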
## Creating a LogDensityFunction
The constructor for LogDensityFunction is
```julia
LogDensityFunction(model, logdensityfunc, vector_values; adtype)
```

`model` is of course the model itself, but the other arguments deserve more explanation.

- `logdensityfunc` is a function that takes an `OnlyAccsVarInfo` and returns some real number. For example, it could be `getlogjoint_internal` (most of the time that is what you will want!). This is the argument that makes `LogDensityFunction` actually obey the interface that e.g. optimisers expect.
- `vector_values` is a `VarNamedTuple` that contains `VectorValue`s and `LinkedVectorValue`s. DynamicPPL will automatically infer from this the transform strategy as well as the ranges for each `VarName`.

The easiest way to supply one is to use a `VectorValueAccumulator`:

```julia
accs = OnlyAccsVarInfo(VectorValueAccumulator())
_, accs = init!!(model, accs, InitFromPrior(), LinkAll())
vector_values = get_vector_values(accs)
ldf = LogDensityFunction(model, getlogjoint_internal, vector_values)
```
```
LogDensityFunction{Model{typeof(Main.f), (), (), (), Tuple{}, Tuple{}, DefaultContext, false}, Nothing, LinkAll, typeof(getlogjoint_internal), VarNamedTuple{(:x, :y), Tuple{DynamicPPL.RangeAndLinked{Tuple{}}, DynamicPPL.RangeAndLinked{Tuple{}}}}, Nothing, Vector{Float64}, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}}(Model{typeof(Main.f), (), (), (), Tuple{}, Tuple{}, DefaultContext, false}(Main.f, NamedTuple(), NamedTuple(), DefaultContext()), nothing, LinkAll(), DynamicPPL.getlogjoint_internal, VarNamedTuple(x = DynamicPPL.RangeAndLinked{Tuple{}}(1:1, true, ()), y = DynamicPPL.RangeAndLinked{Tuple{}}(2:2, true, ())), nothing, 2, DynamicPPL.AccumulatorTuple{3, @NamedTuple{LogPrior::LogPriorAccumulator{Float64}, LogJacobian::LogJacobianAccumulator{Float64}, LogLikelihood::LogLikelihoodAccumulator{Float64}}}((LogPrior = LogPriorAccumulator(0.0), LogJacobian = LogJacobianAccumulator(0.0), LogLikelihood = LogLikelihoodAccumulator(0.0))))
```

If you need to also obtain gradients with respect to the vectorised parameters, you can pass `adtype::ADTypes.AbstractADType` as a keyword argument. If gradients are not needed, you can omit this.
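For example, to construct a gradient-capable `LogDensityFunction` using ForwardDiff.jl via ADTypes.jl (a sketch; `ldf_grad` is just an illustrative name):

```julia
using ADTypes: AutoForwardDiff
import ForwardDiff  # the backend itself must also be loaded

ldf_grad = LogDensityFunction(
    model, getlogjoint_internal, vector_values;
    adtype=AutoForwardDiff(),
)
```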
## The LogDensityProblems.jl interface
Once you have constructed a LogDensityFunction, it will obey the LogDensityProblems.jl interface. The most important part of this is
```julia
using LogDensityProblems

LogDensityProblems.logdensity(ldf, [3.0, 4.0])
```
```
-11.699778775647857
```

We can verify quickly that this is what we expect by running the model manually:
```julia
using StatsFuns: logistic

x = 3.0
y = logistic(4.0)
params = VarNamedTuple(; x=x, y=y)
_, accs = init!!(model, OnlyAccsVarInfo(), InitFromParams(params), LinkAll())
accs
```
```
OnlyAccsVarInfo
└─ AccumulatorTuple with 3 accumulators
   ├─ LogPrior => LogPriorAccumulator(-7.663478919812238)
   ├─ LogJacobian => LogJacobianAccumulator(4.03629985583562)
   └─ LogLikelihood => LogLikelihoodAccumulator(0.0)
```

and we see that the log-prior minus the log-Jacobian is indeed the same as what `LogDensityProblems.logdensity` returns.
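As a cross-check, the same numbers can be reproduced directly with Distributions.jl: the log-prior is the sum of the two logpdfs, and the log-Jacobian is that of the logit link applied at `y`. This is a sketch of the arithmetic, not how DynamicPPL computes it internally:

```julia
using Distributions
using StatsFuns: logistic

x, y = 3.0, logistic(4.0)
logprior = logpdf(Normal(), x) + logpdf(Beta(2, 2), y)  # ≈ -7.66348
logjac = -log(y * (1 - y))  # log|det J| of the logit link at y; ≈ 4.03630
logprior - logjac           # ≈ -11.69978, matching logdensity above
```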
If you created the `LogDensityFunction` with an `adtype`, you can also obtain gradients via `LogDensityProblems.logdensity_and_gradient(ldf, [3.0, 4.0])` (although that is not demonstrated here since it requires an AD backend to be loaded).
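For completeness, a sketch of what that call would look like, assuming an AD backend is loaded and the `LogDensityFunction` was built with an `adtype` (as with the hypothetical `ldf_grad` above):

```julia
import ForwardDiff

logp, grad = LogDensityProblems.logdensity_and_gradient(ldf_grad, [3.0, 4.0])
# logp matches LogDensityProblems.logdensity; grad is a 2-element vector
```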
Other functions such as LogDensityProblems.capabilities and LogDensityProblems.dimension will also work as expected with LogDensityFunction.
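For instance (the reported capability order depends on whether an `adtype` was supplied):

```julia
LogDensityProblems.dimension(ldf)             # 2, the length of the parameter vector
LogDensityProblems.capabilities(typeof(ldf))  # whether gradients are available
```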
## Is LogDensityFunction less powerful than model evaluation?
Given that the core purpose of LogDensityFunction is to evaluate a function mapping from vectors to log-densities, it might seem that it contains less information than the original model itself.
However, this is not true! As alluded to above, the LogDensityFunction is a convenient wrapper that knows how to generate an initialisation strategy, and also contains a transform strategy.
On the next page we'll see how we can access these ourselves and use them to re-evaluate the model with the vectorised parameters.