The ‘native’ data structure for a Turing model is `VarNamedTuple`, which is a mapping from `VarName`s to values. For example, `rand(model)` returns a `VarNamedTuple`:
```julia
using Turing
Turing.setprogress!(false)

@model function demo()
    x ~ Normal(0, 1)
    y ~ Normal(x, 1)
end
model = demo()
vnt = rand(model)
```
```
[ Info: [Turing]: progress logging is disabled globally
VarNamedTuple
├─ x => -1.219920252572539
└─ y => -0.3663377047146361
```
You can likewise use a `VarNamedTuple` to specify initial parameters for Turing’s inference algorithms:
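For example, a minimal sketch (we assume here that the initialisation strategy is DynamicPPL's `InitFromParams`, discussed further below, and that the sampler accepts it via the `initial_params` keyword; see the sampling documentation for the precise interface):

```julia
# Use the values in `vnt` as the sampler's starting point (assumed API usage)
chain = sample(model, NUTS(), 100; initial_params=InitFromParams(vnt))
```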
Now, `VarNamedTuple` is a rich data structure that faithfully retains names and types according to the structure of the model, which is why it is our default data structure.

However, many packages in the Julia ecosystem work with array inputs and outputs. Indeed, this is even the case for many of Turing’s dependencies, such as AdvancedHMC, Optimization, and AdvancedVI: in these cases, Turing is responsible for translating between the `VarNamedTuple` and the vectorised format that these packages require.

This page describes how to convert a Turing model into a vectorised format so that it can be passed to these packages, and how to convert the outputs back into a `VarNamedTuple` for analysis. This is useful if you need to carry out more advanced analysis of a Turing model using packages that Turing does not have wrappers for.
## Introduction
Before we demonstrate any code, let us first explain conceptually what is needed to accomplish this vectorisation. Given a `VarNamedTuple` of parameter values, we need two pieces of information to flatten it into a vector:
1. how to convert each individual value into a vector; and
2. how to concatenate those vectors together into a single vector.
We accomplish this by specifying, for each parameter, a transform function and a range. For example, conceptually, the `VarNamedTuple` above with `x` and `y` can be flattened with a representation that looks something like this.
(Please note that you don’t have to define any of this yourself! Turing has built-in functionality for this, which we’ll cover in a moment. This is just to illustrate the principle behind it.)
```julia
struct TransformPlusRange{F}
    transform::F
    range::UnitRange{Int}
end

tovec(f) = [f]

info = @vnt begin
    x := TransformPlusRange(tovec, 1:1)
    y := TransformPlusRange(tovec, 2:2)
end
```
```
VarNamedTuple
├─ x => TransformPlusRange{typeof(tovec)}(tovec, 1:1)
└─ y => TransformPlusRange{typeof(tovec)}(tovec, 2:2)
```
To flatten the parameters, we can then do the following:
```julia
# Begin by allocating a vector of the appropriate length.
params = Vector{Float64}(undef, 2)
# Iterate through the original `VarNamedTuple`. For each key:
for (vn, value) in pairs(vnt)
    vn_info = info[vn]
    # Use the transform to vectorise the value
    vectorised_value = vn_info.transform(value)
    # Store it in the appropriate range of the allocated vector
    params[vn_info.range] = vectorised_value
end
params
```
Turing abstracts all of this away for you by providing a struct, LogDensityFunction, which wraps a model plus all of the necessary information above for vectorisation.
> **Note: more details**
>
> On this page we only give a fairly high-level overview of the interface that `LogDensityFunction` exposes. In particular, we assume that you simply want to construct one, pass it to some package, and retrieve the results back in a `VarNamedTuple` format.
>
> There are many more details of working with `LogDensityFunction` that can be useful if you are, for example, developing your own inference algorithm to work with Turing. We refer the interested reader to the DynamicPPL documentation.
To construct a LogDensityFunction, you can simply pass a model:
```julia
@model f() = x ~ Dirichlet(ones(3))
model = f()
ldf = LogDensityFunction(model);
```
With this single-argument constructor, the vectorised representation of `x` will simply be `x` itself (since Dirichlet is a multivariate distribution, its value is already a vector). Note, however, that `x` is constrained: its elements must be non-negative and sum to 1. We can see this in action if we draw a randomised set of vectorised parameters:
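As an illustration (a sketch using `rand(model)` and manual indexing; the exact helper for drawing vectorised parameters from a `LogDensityFunction` is covered in the DynamicPPL documentation):

```julia
vnt = rand(model)
x_vec = vnt[@varname(x)]  # the vectorised representation of x is x itself
sum(x_vec)                # ≈ 1.0: the draw lies on the simplex
```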
By default, LogDensityFunction does not apply any extra transformations to the parameters, which is why the output above is still constrained to the simplex. However, such constraints can cause problems for some algorithms which expect to work in unconstrained space. To address this, you can instantiate a LogDensityFunction as follows:
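A sketch of what this construction might look like (the placement of the `LinkAll()` argument is an assumption on our part; consult the DynamicPPL documentation for the precise signature):

```julia
ldf_linked = LogDensityFunction(model, getlogjoint_internal, LinkAll());
```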
- `LinkAll()` specifies that we want all parameters in the model to be transformed to unconstrained vectors.
- `getlogjoint_internal` says that the log-density we are interested in is the log-joint density, including any Jacobian adjustments for the transformations used above.
When we sample from this `LogDensityFunction`, we can see that the output is now unconstrained. In fact, the output is a vector of length 2: because `x` must sum to 1, its third element is redundant and can be recalculated from the first two.
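To see why two numbers suffice, note that the third element of a constrained draw carries no extra information (continuing the sketch above):

```julia
x_vec[3] ≈ 1 - x_vec[1] - x_vec[2]  # true: x[3] is determined by the first two elements
```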
LogDensityFunction is not just a way to perform parameter value conversion: it also allows you to compute the log probability density associated with these parameters.
In particular, the `LogDensityFunction` struct is designed to obey the LogDensityProblems.jl interface, which means you can use the following functions:
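For example (a sketch; the evaluation point below is an arbitrary point on the simplex):

```julia
using LogDensityProblems

LogDensityProblems.dimension(ldf)                    # number of vectorised parameters
LogDensityProblems.logdensity(ldf, [0.3, 0.3, 0.4])  # log-density at a given parameter vector
LogDensityProblems.capabilities(ldf)                 # what the struct supports (e.g. gradients)
```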
In its current form, the `LogDensityFunction` does not know how to compute gradients of the log-density with respect to the parameters. To enable integration with automatic differentiation (AD) packages, you can specify the `adtype` keyword argument:
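For example, using ForwardDiff (a sketch; any ADTypes.jl-compatible backend can be substituted here):

```julia
# AutoForwardDiff is re-exported by Turing (it comes from ADTypes.jl)
ldf_ad = LogDensityFunction(model; adtype=AutoForwardDiff());
```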
When constructing a LogDensityFunction with an adtype, gradient preparation (using DifferentiationInterface.jl) will be performed automatically. This means that subsequent gradient calculations will be much faster, but also means that depending on the backend used, the construction of the LogDensityFunction may take a long time.
```julia
# We now have first-order AD capabilities
LogDensityProblems.capabilities(ldf_ad)
```

This allows you to compute not only `logdensity`, but also `logdensity_and_gradient`.
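For example (the evaluation point is again chosen arbitrarily):

```julia
LogDensityProblems.logdensity_and_gradient(ldf_ad, [0.3, 0.3, 0.4])
```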
If you need guidance on how to choose an AD backend, please see the automatic differentiation page.
## Changing the log-density function
Most of the time, getlogjoint_internal is the appropriate kind of log-density function to use: it represents the log-joint density in the transformed space (‘internal’ is a slightly poor name, but it has stuck), and is what inference algorithms tend to expect.
However, there are also other options: for example, `LogDensityFunction(model, getloglikelihood, ...)` will give you a log-density function that computes only the likelihood of the data given the parameters.
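For instance, a minimal sketch based on the signature above (DynamicPPL provides further accessor functions as well; see its documentation):

```julia
# A LogDensityFunction that evaluates only the likelihood term
ldf_lik = LogDensityFunction(model, getloglikelihood);
```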
When instantiated, LogDensityFunction will automatically create its own mapping of VarNames to ranges and transforms (much like in the example above). If you are happy with the default transforms, then you would never need to worry about this. However, advanced users may wish to specify their own transforms.
In order to do this, you have to use a different constructor for `LogDensityFunction`; its signature is described in the DynamicPPL documentation.
`LogDensityFunction` can also generate vectorised samples of the parameters. By default, these are drawn from the prior of the model, using `InitFromPrior` (see Sampling options for more information). If you want vectorised samples corresponding to a specific set of parameters, you can provide an initialisation strategy that uses those parameters:
```julia
vnt = rand(model)
```
```
VarNamedTuple
└─ x => [0.42046224879805294, 0.5388597827616354, 0.04067796844031168]
```
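The matching strategy here would be DynamicPPL's `InitFromParams` (an assumption on our part; exactly where the strategy is passed depends on the API in question, so consult the DynamicPPL documentation):

```julia
# Build an initialisation strategy from the fixed parameter values (assumed API)
strategy = InitFromParams(vnt)
```

This strategy can then be supplied wherever an initialisation strategy is accepted, for example via the `initial_params` keyword shown at the start of this page.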
(Note that this does technically mean that the output is no longer truly ‘random’.)
## From vector to VarNamedTuple
Now suppose that you have run an inference algorithm using a LogDensityFunction and it has given you back a set of ‘best’ parameters (as a vector, of course):
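For example (hypothetical values standing in for an optimiser's output, and an assumed `ParamsWithStats(vector, ldf)`-style constructor; check the DynamicPPL documentation for the exact signature):

```julia
# Hypothetical 'best' parameters in the linked (unconstrained) space
best_params = [0.2, -0.1]

# Wrap them back up using the LogDensityFunction that defines the
# vector-to-VarName mapping (constructor signature assumed)
pws = ParamsWithStats(best_params, ldf_linked)
```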
This also helpfully computes the log-probabilities associated with this set of parameters (if you do not need this, you can disable it with ParamsWithStats(...; include_log_probs=false)). You can then access pws.params to get the VarNamedTuple of parameters. For example, to retrieve the original value of x:
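Continuing the hypothetical example above:

```julia
pws.params[@varname(x)]  # the constrained value of x, recovered from the vector
```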
These wrapper functions are designed for maximum ease of use. However, as mentioned above, sometimes you may want more fine-grained control. To do so, you should use the DynamicPPL API and in particular the DynamicPPL.init!! function. This is covered in much more detail in the DynamicPPL documentation!