Implementing samplers

In this tutorial, we’ll go through step-by-step how to implement a “simple” sampler in AbstractMCMC.jl in such a way that it can be easily applied to Turing.jl models.

In particular, we’re going to implement a version of Metropolis-adjusted Langevin (MALA).

Note that we will implement this sampler in the AbstractMCMC.jl framework, completely “ignoring” Turing.jl until the very end of the tutorial, at which point we’ll use a single line of code to make the resulting sampler available to Turing.jl. This is to really drive home the point that one can implement samplers in a way that is accessible to all of Turing.jl’s users without having to use Turing.jl yourself.

Quick overview of MALA

We can view MALA as a single step of the leapfrog integrator with resampling of momentum \(p\) at every step.1 To make that statement a bit more concrete, we first define the extended target \(\bar{\gamma}(x, p)\) as

\[\begin{equation*} \log \bar{\gamma}(x, p) \propto \log \gamma(x) + \log \gamma_{\mathcal{N}(0, M)}(p) \end{equation*}\]

where \(\gamma_{\mathcal{N}(0, M)}\) denotes the density for a zero-centered Gaussian with covariance matrix \(M\). We then consider targeting this joint distribution over both \(x\) and \(p\) as follows. First we define the map

\[\begin{equation*} \begin{split} L_{\epsilon}: \quad & \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d \times \mathbb{R}^d \\ & (x, p) \mapsto (\tilde{x}, \tilde{p}) := L_{\epsilon}(x, p) \end{split} \end{equation*}\]

as

\[\begin{equation*} \begin{split} p_{1/2} &:= p + \frac{\epsilon}{2} \nabla \log \gamma(x) \\ \tilde{x} &:= x + \epsilon M^{-1} p_{1/2} \\ p_1 &:= p_{1/2} + \frac{\epsilon}{2} \nabla \log \gamma(\tilde{x}) \\ \tilde{p} &:= - p_1 \end{split} \end{equation*}\]

This might be familiar to some readers as a single step of the leapfrog integrator. We then define the MALA kernel as follows: given the current iterate \(x_i\), we sample the next iterate \(x_{i + 1}\) as

\[\begin{equation*} \begin{split} p &\sim \mathcal{N}(0, M) \\ (\tilde{x}, \tilde{p}) &:= L_{\epsilon}(x_i, p) \\ \alpha &:= \min \left\{ 1, \frac{\bar{\gamma}(\tilde{x}, \tilde{p})}{\bar{\gamma}(x_i, p)} \right\} \\ x_{i + 1} &:= \begin{cases} \tilde{x} \quad & \text{ with prob. } \alpha \\ x_i \quad & \text{ with prob. } 1 - \alpha \end{cases} \end{split} \end{equation*}\]

i.e. we accept the proposal \(\tilde{x}\) with probability \(\alpha\) and reject it, thus sticking with our current iterate, with probability \(1 - \alpha\).
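In the implementation later in this tutorial we will, as is standard, work on the log scale: we draw \(u \sim \mathrm{Uniform}(0, 1)\) and accept the proposal whenever

\[\begin{equation*} \log u < \log \bar{\gamma}(\tilde{x}, \tilde{p}) - \log \bar{\gamma}(x_i, p) \end{equation*}\]

which is equivalent to accepting with probability \(\alpha\), since \(\log u \leq 0\).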

What we need from a model: LogDensityProblems.jl

There are a few things we need from the “target” / “model” / density that we want to sample from:

  1. We need access to log-density evaluations \(\log \gamma(x)\) so we can compute the acceptance ratio involving \(\log \bar{\gamma}(x, p)\).
  2. We need access to log-density gradients \(\nabla \log \gamma(x)\) so we can compute the Leapfrog steps \(L_{\epsilon}(x, p)\).
  3. We also need access to the “size” of the model so we can determine the size of \(M\).

Luckily for us, there is a package called LogDensityProblems.jl which provides an interface for exactly this!

To demonstrate how one can implement the “LogDensityProblems.jl interface”2 we will use a simple Gaussian model as an example:

using LogDensityProblems: LogDensityProblems;

# Let's define some type that represents the model.
struct IsotropicNormalModel{M<:AbstractVector{<:Real}}
    "mean of the isotropic Gaussian"
    mean::M
end

# Specifies what input length the model expects.
LogDensityProblems.dimension(model::IsotropicNormalModel) = length(model.mean)
# Implementation of the log-density evaluation of the model.
function LogDensityProblems.logdensity(model::IsotropicNormalModel, x::AbstractVector{<:Real})
    return - sum(abs2, x .- model.mean) / 2
end

This gives us all of the properties we want for our MALA sampler, with the exception of the gradient \(\nabla \log \gamma(x)\). For this, LogDensityProblems.jl specifies the method LogDensityProblems.logdensity_and_gradient, which should return a 2-tuple whose first entry is the log-density \(\log \gamma(x)\) and whose second entry is the gradient \(\nabla \log \gamma(x)\).

There are two ways to “implement” this method: (a) implement it by hand, which is feasible in the case of our IsotropicNormalModel, or (b) defer the implementation to an automatic differentiation backend.
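For our IsotropicNormalModel the gradient is easy to derive by hand: up to an additive constant we have \(\log \gamma(x) = -\frac{1}{2} \lVert x - \mu \rVert^2\), where \(\mu\) denotes model.mean, and hence

\[\begin{equation*} \nabla \log \gamma(x) = -(x - \mu) \end{equation*}\]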

To implement it by hand we can simply do

# Tell LogDensityProblems.jl that first-order, i.e. gradient information, is available.
LogDensityProblems.capabilities(model::IsotropicNormalModel) = LogDensityProblems.LogDensityOrder{1}()

# Implement `logdensity_and_gradient`.
function LogDensityProblems.logdensity_and_gradient(model::IsotropicNormalModel, x)
    logγ_x = LogDensityProblems.logdensity(model, x)
    ∇logγ_x = -(x .- model.mean)
    return logγ_x, ∇logγ_x
end

Let’s just try it out:

# Instantiate the problem.
model = IsotropicNormalModel([-5., 0., 5.])
# Create some example input that we can test on.
x_example = randn(LogDensityProblems.dimension(model))
# Evaluate!
LogDensityProblems.logdensity(model, x_example)
-37.05590269609623
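We can also sanity-check the hand-written gradient against a crude central-difference approximation. This is purely illustrative and not part of the LogDensityProblems.jl interface; the helper finite_diff_gradient below is just something we cook up for this check:

# Crude central-difference approximation of the gradient (illustrative only).
function finite_diff_gradient(model, x; h=1e-6)
    return map(eachindex(x)) do i
        x_plus = copy(x); x_plus[i] += h
        x_minus = copy(x); x_minus[i] -= h
        (LogDensityProblems.logdensity(model, x_plus) - LogDensityProblems.logdensity(model, x_minus)) / (2h)
    end
end
# Should agree with the analytic gradient up to numerical error.
finite_diff_gradient(model, x_example) ≈ last(LogDensityProblems.logdensity_and_gradient(model, x_example))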

To defer it to an automatic differentiation backend, we can do

# Tell LogDensityProblems.jl we only have access to 0-th order information.
LogDensityProblems.capabilities(model::IsotropicNormalModel) = LogDensityProblems.LogDensityOrder{0}()

# Use `LogDensityProblemsAD`'s `ADgradient` in combination with some AD backend to implement `logdensity_and_gradient`.
using LogDensityProblemsAD, ADTypes, ForwardDiff
model_with_grad = ADgradient(AutoForwardDiff(), model)
LogDensityProblems.logdensity(model_with_grad, x_example)
-37.05590269609623

We’ll continue with the second approach in this tutorial since this is typically what one does in practice, because there are better hobbies to spend time on than deriving gradients by hand.

At this point, one might wonder how we’re going to tie this back to Turing.jl in the end. Effectively, when working with inference methods that only require log-density evaluations and / or higher-order information of the log-density, Turing.jl converts the user-provided Model into an object implementing the above methods for LogDensityProblems.jl. As a result, most samplers provided by Turing.jl are actually implemented to work with LogDensityProblems.jl, enabling their use both within and outside of Turing.jl! Moreover, there exist similar conversions for Stan through BridgeStan and StanLogDensityProblems.jl, which means that a sampler supporting the LogDensityProblems.jl interface can easily be used on both Turing.jl and Stan models (in addition to user-provided models, such as our IsotropicNormalModel above)!

Anyways, let’s move on to actually implementing the sampler.

Implementing MALA in AbstractMCMC.jl

Now that we’ve established that a model implementing the LogDensityProblems.jl interface provides us with all the information we need from \(\log \gamma(x)\), we can address the question: given an object that implements the LogDensityProblems.jl interface, how can we define a sampler for it?

We’re going to do this by making our sampler a sub-type of AbstractMCMC.AbstractSampler in addition to implementing a few methods from AbstractMCMC.jl. Why? Because it gets us a lot of functionality for free, as we will see later.

Moreover, AbstractMCMC.jl provides a very natural interface for MCMC algorithms.

First, we’ll define our MALA type

using AbstractMCMC

struct MALA{T,A} <: AbstractMCMC.AbstractSampler
    "stepsize used in the leapfrog step"
    ϵ_init::T
    "covariance matrix used for the momentum"
    M_init::A
end

Notice how we’ve added the suffix _init to both the stepsize and the covariance matrix. We’ve done this because an AbstractMCMC.AbstractSampler should be immutable. Of course, there are many scenarios where we want to allow something like the stepsize and / or the covariance matrix to vary between iterations, e.g. during the burn-in / adaptation phase of the sampling process we might want to adjust these parameters using statistics computed from the initial iterations. But information which can change between iterations should not go in the sampler itself! Instead, this information should go in the sampler state.

The sampler state should at the very least contain all the necessary information to perform the next MCMC iteration, but usually contains further information, e.g. quantities and statistics useful for evaluating whether the sampler has converged.

We will use the following sampler state for our MALA sampler:

struct MALAState{A<:AbstractVector{<:Real}}
    "current position"
    x::A
end

This might seem overly redundant: we’re defining a type MALAState which only contains a simple vector of reals. In this particular case we could indeed have dropped the wrapper and simply used an AbstractVector{<:Real} as our sampler state, but typically, as we will see later, one wants to include other quantities in the sampler state. For example, if we also wanted to adapt the parameters of our MALA, e.g. alter the stepsize depending on acceptance rates, we would also need to put ϵ in the state, as sketched below. For now, though, we’ll keep things simple.
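For instance, a state for a stepsize-adapting variant might look something like the following. This is only a sketch and is not used in the rest of the tutorial:

struct AdaptiveMALAState{A<:AbstractVector{<:Real},T<:Real}
    "current position"
    x::A
    "current (adapted) stepsize"
    ϵ::T
end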

We also want a sample type, which is a type meant for “public consumption”, i.e. the end-user. This is generally going to contain a subset of the information present in the state. But in such a simple scenario as this, it similarly only contains an AbstractVector{<:Real}:

struct MALASample{A<:AbstractVector{<:Real}}
    "current position"
    x::A
end

We currently have three things:

  1. An AbstractMCMC.AbstractSampler implementation called MALA.
  2. A state MALAState for our sampler MALA.
  3. A sample MALASample for our sampler MALA.

That means that we’re ready to implement the only thing that really matters: AbstractMCMC.step.

AbstractMCMC.step defines the MCMC iteration of our MALA given the current MALAState. Specifically, the signature of the function is as follows:

function AbstractMCMC.step(
    # The RNG to ensure reproducibility.
    rng::Random.AbstractRNG,
    # The model that defines our target.
    model::AbstractMCMC.AbstractModel,
    # The sampler for which we're taking a `step`.
    sampler::AbstractMCMC.AbstractSampler,
    # The current sampler `state`.
    state;
    # Additional keyword arguments that we may or may not need.
    kwargs...
)

Moreover, there is a specific AbstractMCMC.AbstractModel which is used to indicate that the model that is provided implements the LogDensityProblems.jl interface: AbstractMCMC.LogDensityModel.

Since, as we discussed earlier, in our case we’re indeed going to work with types that support the LogDensityProblems.jl interface, we’ll define AbstractMCMC.step for such an AbstractMCMC.LogDensityModel.

Note that AbstractMCMC.LogDensityModel has no other purpose: it has a single field called logdensity and does nothing else. But wrapping the model in an AbstractMCMC.LogDensityModel allows samplers that want to work with LogDensityProblems.jl to define their AbstractMCMC.step on this type without running into method ambiguities.
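To make this concrete, here is a quick illustration using our earlier model (the variable name model_wrapped is just for demonstration):

# Wrap our LogDensityProblems.jl-compatible model so that samplers can dispatch on it.
model_wrapped = AbstractMCMC.LogDensityModel(model)
# The wrapper does nothing except store the model in its `logdensity` field.
model_wrapped.logdensity === model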

All in all, that means that the signature for our AbstractMCMC.step is going to be the following:

function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    # `LogDensityModel` so we know we're working with LogDensityProblems.jl model.
    model::AbstractMCMC.LogDensityModel,
    # Our sampler.
    sampler::MALA,
    # Our sampler state.
    state::MALAState;
    kwargs...
)

Great! Now let’s actually implement the full AbstractMCMC.step for our MALA.

Let’s remind ourselves what we’re going to do:

  1. Sample a new momentum \(p\).
  2. Compute the log-density of the extended target \(\log \bar{\gamma}(x, p)\).
  3. Take a single leapfrog step \((\tilde{x}, \tilde{p}) = L_{\epsilon}(x, p)\).
  4. Accept or reject the proposed \((\tilde{x}, \tilde{p})\).

All in all, this results in the following:

using Random: Random
using Distributions  # so we get the `MvNormal`

function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model_wrapper::AbstractMCMC.LogDensityModel,
    sampler::MALA,
    state::MALAState;
    kwargs...
)
    # Extract the wrapped model which implements LogDensityProblems.jl.
    model = model_wrapper.logdensity
    # Let's just extract the sampler parameters to make our lives easier.
    ϵ = sampler.ϵ_init
    M = sampler.M_init
    # Extract the current parameters.
    x = state.x
    # Sample the momentum.
    p_dist = MvNormal(zeros(LogDensityProblems.dimension(model)), M)
    p = rand(rng, p_dist)
    # Propose using a single leapfrog step.
    x̃, p̃ = leapfrog_step(model, x, p, ϵ, M)
    # Accept or reject proposal.
    logp = LogDensityProblems.logdensity(model, x) + logpdf(p_dist, p)
    logp̃ = LogDensityProblems.logdensity(model, x̃) + logpdf(p_dist, p̃)
    logα = logp̃ - logp
    state_new = if log(rand(rng)) < logα
        # Accept.
        MALAState(x̃)
    else
        # Reject.
        MALAState(x)
    end
    # Return the "sample" and the sampler state.
    return MALASample(state_new.x), state_new
end

Fairly straightforward.

Of course, we haven’t defined the leapfrog_step method yet, so let’s do that:

function leapfrog_step(model, x, p, ϵ, M)
    # Update momentum `p` using "position" `x`.
    ∇logγ_x = last(LogDensityProblems.logdensity_and_gradient(model, x))
    p1 = p + (ϵ / 2) .* ∇logγ_x
    # Update the "position" `x` using momentum `p1`.
    x̃ = x + ϵ .* (M \ p1)
    # Update momentum `p1` using position `x̃`.
    ∇logγ_x̃ = last(LogDensityProblems.logdensity_and_gradient(model, x̃))
    p2 = p1 + (ϵ / 2) .* ∇logγ_x̃
    # Flip momentum `p2`.
    p̃ = -p2
    return x̃, p̃
end
leapfrog_step (generic function with 1 method)
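As a quick, purely illustrative sanity check, we can verify that applying the leapfrog step twice brings us back to where we started: because of the momentum flip, \(L_{\epsilon}\) is an involution. Here we use an arbitrary stepsize of 0.1 and the identity matrix for \(M\):

using LinearAlgebra: I

# Random test point with the same dimension as our model.
x_test, p_test = randn(3), randn(3)
# Applying the leapfrog step twice should recover `(x_test, p_test)`.
x_back, p_back = leapfrog_step(model, leapfrog_step(model, x_test, p_test, 0.1, I)..., 0.1, I)
x_back ≈ x_test && p_back ≈ p_test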

With all of this, we’re technically ready to sample!

using Random, LinearAlgebra

rng = Random.default_rng()
sampler = MALA(1, I)
state = MALAState(zeros(LogDensityProblems.dimension(model)))

x_next, state_next = AbstractMCMC.step(
    rng,
    AbstractMCMC.LogDensityModel(model),
    sampler,
    state
)
(MALASample{Vector{Float64}}([0.0, 0.0, 0.0]), MALAState{Vector{Float64}}([0.0, 0.0, 0.0]))

Great, it works!

And I promised we would get quite some functionality for free if we implemented AbstractMCMC.step; indeed, we can now simply call sample to perform standard MCMC sampling:

# Perform 10_000 iterations with our `MALA` sampler.
samples = sample(model_with_grad, sampler, 10_000; initial_state=state, progress=false)
# Concatenate into a matrix.
samples_matrix = stack(sample -> sample.x, samples)
3×10000 Matrix{Float64}:
 -2.78278  -4.13367  -5.68879   …  -3.47171   -5.16638   -5.12906
 -1.10107  -1.84306   0.188861      0.638975   0.450465  -0.0722107
  1.26035   2.91182   3.92924       6.78755    4.97132    2.82473
# Compute the marginal means and standard deviations.
hcat(mean(samples_matrix; dims=2), std(samples_matrix; dims=2))
3×2 Matrix{Float64}:
 -5.0188     0.971507
 -0.0161385  0.991921
  4.98162    1.00555

Let’s visualize the samples

using StatsPlots
plot(transpose(samples_matrix[:, 1:10:end]), alpha=0.5, legend=false)

Look at that! Things are working; amazin’.

We can also exploit AbstractMCMC.jl’s parallel sampling capabilities:

# Run 4 separate chains for 10_000 iterations, using threads to parallelize.
num_chains = 4
samples = sample(
    model_with_grad,
    sampler,
    MCMCThreads(),
    10_000,
    num_chains;
    # Note we need to provide an initial state for every chain.
    initial_state=fill(state, num_chains),
    progress=false
)
samples_array = stack(map(Base.Fix1(stack, sample -> sample.x), samples))
3×10000×4 Array{Float64, 3}:
[:, :, 1] =
 -0.66818   -3.32735   -6.00457  -4.66665   …  -5.0119   -6.1992   -6.21395
 -0.241267  -0.503414  -0.9682   -0.100187      1.31836   1.36903  -0.313069
  1.60503    2.28402    4.3039    6.80373       4.17528   4.57181   5.16708

[:, :, 2] =
 -3.17241   -4.82932   -5.75602    …  -5.72496   -5.72496   -6.37789
  0.103616  -0.213427   0.0309451     -0.692598  -0.692598  -0.470064
  1.95452    3.62377    3.16469        3.88005    3.88005    3.26456

[:, :, 3] =
 -3.27622    -6.11942  -6.11942  -6.11942  …  -4.90242  -2.80257  -5.92178
  0.0407916  -1.10107  -1.10107  -1.10107      1.35053   4.11327   0.959433
  2.9382      4.64376   4.64376   4.64376      4.87952   5.23964   3.97176

[:, :, 4] =
 -1.93132  -2.48145  -3.15684   -6.34088  …  -5.86344  -5.86344  -5.07002
  1.95368   1.08849  -0.515951  -1.78653     -1.21146  -1.21146  -2.09424
  3.94861   3.66815   3.39297    3.93798      5.52444   5.52444   5.39758

But the fact that we have to provide the AbstractMCMC.sample call with an initial_state to get started is a bit annoying. We can avoid this by also defining an AbstractMCMC.step method without the state argument:

function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model_wrapper::AbstractMCMC.LogDensityModel,
    ::MALA;
    # NOTE: No state provided!
    kwargs...
)
    model = model_wrapper.logdensity
    # Let's just create the initial state by sampling from a standard Gaussian.
    x = randn(rng, LogDensityProblems.dimension(model))

    return MALASample(x), MALAState(x)
end

Equipped with this, we no longer need to provide the initial_state everywhere:

samples = sample(model_with_grad, sampler, 10_000; progress=false)
samples_matrix = stack(sample -> sample.x, samples)
hcat(mean(samples_matrix; dims=2), std(samples_matrix; dims=2))
3×2 Matrix{Float64}:
 -5.00611   1.01307
 -0.031821  1.01648
  4.99231   0.994078

Using our sampler with Turing.jl

As we promised, all of this hassle of implementing our MALA sampler in a way that uses LogDensityProblems.jl and AbstractMCMC.jl gets us something more than just an “automatic” implementation of AbstractMCMC.sample.

It also enables the use of our sampler with Turing.jl through the externalsampler function, but we need to do one final thing first: we need to tell Turing.jl how to extract a vector of parameters from the “sample” returned in our implementation of AbstractMCMC.step. In our case, the “sample” is a MALASample, so we just need the following line:

# Load Turing.jl.
using Turing

# Overload the `getparams` method for our "sample" type, which simply extracts the parameter vector.
Turing.Inference.getparams(::Turing.Model, sample::MALASample) = sample.x

And with that, we’re good to go!

# Our previous model defined as a Turing.jl model.
@model mvnormal_model() = x ~ MvNormal([-5., 0., 5.], I)
# Instantiate our model.
turing_model = mvnormal_model()
# Call `sample` but now we're passing in a Turing.jl `model` and wrapping
# our `MALA` sampler in the `externalsampler` to tell Turing.jl that the sampler
# expects something that implements LogDensityProblems.jl.
chain = sample(turing_model, externalsampler(sampler), 10_000; progress=false)
Chains MCMC chain (10000×4×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 2.55 seconds
Compute duration  = 2.55 seconds
parameters        = x[1], x[2], x[3]
internals         = lp

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

        x[1]   -5.0197    0.9924    0.0167   3527.3273   5615.0537    1.0001   ⋯
        x[2]   -0.0096    1.0072    0.0176   3269.9207   5023.6311    1.0003   ⋯
        x[3]    5.0025    1.0030    0.0176   3267.1832   5872.3409    1.0004   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

        x[1]   -6.9861   -5.6827   -5.0031   -4.3370   -3.1142
        x[2]   -1.9885   -0.6730   -0.0034    0.6726    1.9583
        x[3]    3.0575    4.3226    4.9918    5.6778    6.9625

Pretty neat, eh?

Models with constrained parameters

One thing we’ve sort of glossed over in all of the above is that MALA, at least how we’ve implemented it, requires \(x\) to live in \(\mathbb{R}^d\) for some \(d > 0\). If some of the parameters were in fact constrained, e.g. we were working with a Beta distribution which has support on the interval \((0, 1)\), not on \(\mathbb{R}^d\), we could easily end up outside of the valid range \((0, 1)\).

@model beta_model() = x ~ Beta(3, 3)
turing_model = beta_model()
chain = sample(turing_model, externalsampler(sampler), 10_000; progress=false)
Chains MCMC chain (10000×2×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 1.67 seconds
Compute duration  = 1.67 seconds
parameters        = x
internals         = lp

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           x    0.4961    0.1873    0.0027   4673.6242   6036.3048    1.0000   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           x    0.1466    0.3557    0.4946    0.6379    0.8481

Yep, that still works, but only because Turing.jl transforms the turing_model from constrained to unconstrained space behind the scenes, so that the sampler provided to externalsampler is always working in unconstrained space! This is not always desirable, so we can turn it off:

chain = sample(turing_model, externalsampler(sampler; unconstrained=false), 10_000; progress=false)
Chains MCMC chain (10000×2×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 0.25 seconds
Compute duration  = 0.25 seconds
parameters        = x
internals         = lp

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

           x    0.5068    0.1374    0.0114   122.8752    68.7605    1.0175     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           x    0.2429    0.4061    0.5085    0.6198    0.7424

The fun thing is that this still sort of works because

logpdf(Beta(3, 3), 10.0)
-Inf

and so the samples that fall outside of the range are always rejected. But do notice how much worse all the diagnostics are, e.g. ess_tail is very poor compared to when we use unconstrained=true. Moreover, in more complex cases this won’t just result in a “nice” -Inf log-density value, but instead will error:

@model function demo()
    σ² ~ truncated(Normal(), lower=0)
    # If we end up with negative values for `σ²`, the `Normal` will error.
    x ~ Normal(0, σ²)
end
sample(demo(), externalsampler(sampler; unconstrained=false), 10_000; progress=false)
DomainError with Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}}(-1.1918306148166926,1.0,0.0):
Normal: the condition σ >= zero(σ) is not satisfied.
Stacktrace:
  [1] #371
    @ ~/.julia/packages/Distributions/ji8PW/src/univariate/continuous/normal.jl:37 [inlined]
  [2] check_args
    @ ~/.julia/packages/Distributions/ji8PW/src/utils.jl:89 [inlined]
  [3] #Normal#370
    @ ~/.julia/packages/Distributions/ji8PW/src/univariate/continuous/normal.jl:37 [inlined]
  [4] Normal
    @ ~/.julia/packages/Distributions/ji8PW/src/univariate/continuous/normal.jl:36 [inlined]
  [5] Normal
    @ ~/.julia/packages/Distributions/ji8PW/src/univariate/continuous/normal.jl:42 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/DynamicPPL/E4kDs/src/compiler.jl:579 [inlined]
  [7] demo(__model__::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, __varinfo__::DynamicPPL.ThreadSafeVarInfo{DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}, Vector{Set{DynamicPPL.Selector}}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}, Vector{Set{DynamicPPL.Selector}}}}, ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}, Vector{Base.RefValue{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}}}, __context__::DynamicPPL.DefaultContext)
    @ Main.Notebook ~/work/docs/docs/tutorials/docs-17-implementing-samplers/index.qmd:454
  [8] _evaluate!!
    @ ~/.julia/packages/DynamicPPL/E4kDs/src/model.jl:963 [inlined]
  [9] evaluate_threadsafe!!(model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, varinfo::DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}, Vector{Set{DynamicPPL.Selector}}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}, Vector{Set{DynamicPPL.Selector}}}}, ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}, context::DynamicPPL.DefaultContext)
    @ DynamicPPL ~/.julia/packages/DynamicPPL/E4kDs/src/model.jl:952
 [10] evaluate!!
    @ ~/.julia/packages/DynamicPPL/E4kDs/src/model.jl:887 [inlined]
 [11] logdensity
    @ ~/.julia/packages/DynamicPPL/E4kDs/src/logdensityfunction.jl:94 [inlined]
 [12] Fix1
    @ ./operators.jl:1118 [inlined]
 [13] vector_mode_dual_eval!(f::Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityFunction{DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}, DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.DefaultContext}}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}}, x::Vector{Float64})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/apiutils.jl:24
 [14] vector_mode_gradient!(result::DiffResults.MutableDiffResult{1, Float64, Tuple{Vector{Float64}}}, f::Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityFunction{DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}, DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.DefaultContext}}, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:96
 [15] gradient!
    @ ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:37 [inlined]
 [16] gradient!
    @ ~/.julia/packages/ForwardDiff/PcZ48/src/gradient.jl:35 [inlined]
 [17] logdensity_and_gradient
    @ ~/.julia/packages/LogDensityProblemsAD/rBlLq/ext/LogDensityProblemsADForwardDiffExt.jl:118 [inlined]
 [18] leapfrog_step(model::LogDensityProblemsADForwardDiffExt.ForwardDiffLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}, DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.DefaultContext}, ForwardDiff.Chunk{2}, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}}}, x::Vector{Float64}, p::Vector{Float64}, ϵ::Int64, M::UniformScaling{Bool})
    @ Main.Notebook ~/work/docs/docs/tutorials/docs-17-implementing-samplers/index.qmd:303
 [19] step(rng::TaskLocalRNG, model_wrapper::AbstractMCMC.LogDensityModel{LogDensityProblemsADForwardDiffExt.ForwardDiffLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}, DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.DefaultContext}, ForwardDiff.Chunk{2}, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}}}}, sampler::MALA{Int64, UniformScaling{Bool}}, state::MALAState{Vector{Float64}}; kwargs::@Kwargs{})
    @ Main.Notebook ~/work/docs/docs/tutorials/docs-17-implementing-samplers/index.qmd:274
 [20] step
    @ ~/work/docs/docs/tutorials/docs-17-implementing-samplers/index.qmd:256 [inlined]
 [21] #step#108
    @ ~/.julia/packages/Turing/aAIq7/src/mcmc/abstractmcmc.jl:90 [inlined]
 [22] step(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, sampler_wrapper::DynamicPPL.Sampler{Turing.Inference.ExternalSampler{MALA{Int64, UniformScaling{Bool}}, AutoForwardDiff{nothing, Nothing}, false}}, state::Turing.Inference.TuringState{MALAState{Vector{Float64}}, LogDensityProblemsADForwardDiffExt.ForwardDiffLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{@NamedTuple{σ²::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:σ², typeof(identity)}, Int64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Nothing}}, Vector{AbstractPPL.VarName{:σ², typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, x::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}, DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.DefaultContext}, ForwardDiff.Chunk{2}, ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, ForwardDiff.GradientConfig{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 2}}}}})
    @ Turing.Inference ~/.julia/packages/Turing/aAIq7/src/mcmc/abstractmcmc.jl:79
 [23] macro expansion
    @ ~/.julia/packages/AbstractMCMC/YrmkI/src/sample.jl:176 [inlined]
 [24] macro expansion
    @ ~/.julia/packages/AbstractMCMC/YrmkI/src/logging.jl:16 [inlined]
 [25] mcmcsample(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, sampler::DynamicPPL.Sampler{Turing.Inference.ExternalSampler{MALA{Int64, UniformScaling{Bool}}, AutoForwardDiff{nothing, Nothing}, false}}, N::Int64; progress::Bool, progressname::String, callback::Nothing, discard_initial::Int64, thinning::Int64, chain_type::Type, initial_state::Nothing, kwargs::@Kwargs{})
    @ AbstractMCMC ~/.julia/packages/AbstractMCMC/YrmkI/src/sample.jl:120
 [26] sample(rng::TaskLocalRNG, model::DynamicPPL.Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, sampler::DynamicPPL.Sampler{Turing.Inference.ExternalSampler{MALA{Int64, UniformScaling{Bool}}, AutoForwardDiff{nothing, Nothing}, false}}, N::Int64; chain_type::Type, resume_from::Nothing, initial_state::Nothing, kwargs::@Kwargs{progress::Bool})
    @ DynamicPPL ~/.julia/packages/DynamicPPL/E4kDs/src/sampler.jl:93
 [27] sample
    @ ~/.julia/packages/DynamicPPL/E4kDs/src/sampler.jl:83 [inlined]
 [28] #sample#4
    @ ~/.julia/packages/Turing/aAIq7/src/mcmc/Inference.jl:263 [inlined]
 [29] sample
    @ ~/.julia/packages/Turing/aAIq7/src/mcmc/Inference.jl:256 [inlined]
 [30] #sample#3
    @ ~/.julia/packages/Turing/aAIq7/src/mcmc/Inference.jl:253 [inlined]
 [31] top-level scope
    @ ~/work/docs/docs/tutorials/docs-17-implementing-samplers/index.qmd:456

As expected, we run into a DomainError at some point, while if we set unconstrained=true, letting Turing.jl transform the model to an unconstrained form behind the scenes, everything works as expected:

sample(demo(), externalsampler(sampler; unconstrained=true), 10_000; progress=false)
Chains MCMC chain (10000×3×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 0.55 seconds
Compute duration  = 0.55 seconds
parameters        = σ², x
internals         = lp

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          σ²    0.7810    0.5625    0.0468    98.2899   106.2617    1.0232     ⋯
           x   -0.1341    1.0288    0.1068   135.5003    35.7944    1.0240     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          σ²    0.0658    0.2780    0.7228    1.1230    2.1449
           x   -2.2802   -0.4395   -0.0294    0.3548    1.9635

Neat!

Similarly, the automatic differentiation backend to use can be specified through the adtype keyword argument. For example, if we want to use ReverseDiff.jl instead of the default ForwardDiff.jl:

using ReverseDiff: ReverseDiff
# Specify that we want to use `AutoReverseDiff`.
sample(
    demo(),
    externalsampler(sampler; unconstrained=true, adtype=AutoReverseDiff()),
    10_000;
    progress=false
)
Chains MCMC chain (10000×3×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 2.92 seconds
Compute duration  = 2.92 seconds
parameters        = σ², x
internals         = lp

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          σ²    0.3613    0.0000    0.0000        NaN        NaN       NaN     ⋯
           x    1.8822    0.0000    0.0000        NaN        NaN       NaN     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          σ²    0.3613    0.3613    0.3613    0.3613    0.3613
           x    1.8822    1.8822    1.8822    1.8822    1.8822

Double-neat.

Summary

At this point it’s worth reminding ourselves of what we did and why we did it:

  1. We define our models using the LogDensityProblems.jl interface because it makes the sampler agnostic to how the underlying model is implemented.
  2. We implement our sampler in the AbstractMCMC.jl interface, which just means that our sampler is a subtype of AbstractMCMC.AbstractSampler and we implement the MCMC transition in AbstractMCMC.step.
  3. Points 1 and 2 mean that our sampler can be used with a wide range of model implementations, among them models implemented in both Turing.jl and Stan. This gives you, the inference implementer, a large collection of models to test your inference method on, and it allows users of Turing.jl and Stan to try out your inference method with minimal effort.

Footnotes

  1. We’re going with the leapfrog formulation because in a future version of this tutorial we’ll add a section extending this simple “baseline” MALA sampler to more complex versions. See issue #479 for progress on this.↩︎

  2. There is no such thing as a formal interface in Julia (at least not officially), so we use the word “interface” here to mean a few minimal methods that need to be implemented by any type that we treat as a target model.↩︎