Using External Samplers

Using External Samplers on Turing Models

Turing provides wrappers for several samplers from external sampling libraries, e.g., HMC samplers from AdvancedHMC. These wrappers allow new users to sample statistical models seamlessly without leaving Turing. However, the wrappers are not always complete and may lack some functionality of the wrapped sampling library. Moreover, users might want to use samplers that are not currently wrapped within Turing.

For these reasons, Turing also makes it easy to run external samplers on Turing models without any modification or wrapping! Throughout this tutorial, we will use a 10-dimensional Neal's funnel as a running example:

# Import libraries.
using Turing, Random, LinearAlgebra

d = 10
@model function funnel()
    θ ~ Truncated(Normal(0, 3), -3, 3)
    z ~ MvNormal(zeros(d - 1), exp(θ) * I)
    return x ~ MvNormal(z, I)
end
funnel (generic function with 2 methods)

Now we sample the model to generate some observations, which we can then condition on.

(; x) = rand(funnel() | (θ=0,))
model = funnel() | (; x);

Users can use any sampling algorithm to sample this model as long as it follows the AbstractMCMC API. Before discussing how this is done in practice, it is worth giving a high-level description of the process. Imagine that we created an instance of an external sampler that we will call spl such that typeof(spl)<:AbstractMCMC.AbstractSampler. In order to avoid type ambiguity within Turing, at the moment it is necessary to declare spl as an external sampler to Turing with espl = externalsampler(spl), where externalsampler(s::AbstractMCMC.AbstractSampler) is a Turing function that wraps our external sampler in the appropriate type.
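
In code, the overall pattern looks roughly like the minimal sketch below, where spl is a placeholder for any sampler object satisfying the interface:

# Minimal sketch: `spl` stands for any sampler with typeof(spl) <: AbstractMCMC.AbstractSampler.
espl = externalsampler(spl)          # declare it as an external sampler to Turing
chain = sample(model, espl, 1_000)   # then sample the Turing model as usual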

An excellent starting point for showing how this is done in practice is the sampling library AdvancedMH ([AdvancedMH's GitHub](https://github.com/TuringLang/AdvancedMH.jl)), which provides Metropolis-Hastings (MH) methods. Let's say we want to use a random walk Metropolis-Hastings sampler without specifying the proposal distribution ourselves. The code below constructs an MH sampler with a multivariate Gaussian distribution of zero mean and unit variance in d dimensions as the random-walk proposal.

# Importing the sampling library
using AdvancedMH
rwmh = AdvancedMH.RWMH(d)
MetropolisHastings{RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}}(RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}(ZeroMeanIsoNormal(
dim: 10
μ: Zeros(10)
Σ: [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]
)
))
setprogress!(false)

Sampling is then as easy as:

chain = sample(model, externalsampler(rwmh), 10_000)
Chains MCMC chain (10000×11×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 3.88 seconds
Compute duration  = 3.88 seconds
parameters        = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]
internals         = lp

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

           θ   -0.1832    0.7262    0.1013    50.4107    82.6611    1.0015     ⋯
        z[1]    0.4362    0.6635    0.0819    62.9988   110.2627    1.0524     ⋯
        z[2]    0.1563    0.6261    0.0691    83.3282   144.1523    1.0080     ⋯
        z[3]    1.4439    0.8958    0.1309    47.8547    42.0160    1.0073     ⋯
        z[4]    0.0244    0.7051    0.0768    80.4296   128.9979    1.0048     ⋯
        z[5]   -0.4065    0.7177    0.0898    65.1409   170.8391    1.0123     ⋯
        z[6]   -0.3280    0.6908    0.0827    65.0407   125.0655    1.0540     ⋯
        z[7]   -0.1040    0.7313    0.0884    69.7145    84.9138    1.0062     ⋯
        z[8]    0.7738    0.5638    0.0662    68.9657   126.8716    1.0087     ⋯
        z[9]   -0.2450    0.6389    0.0740    72.6826   188.9812    1.0169     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           θ   -1.6953   -0.6483   -0.2681    0.3156    1.2798
        z[1]   -0.8733   -0.0164    0.4011    0.9036    1.6399
        z[2]   -0.9529   -0.2889    0.0588    0.4976    1.4132
        z[3]    0.1062    0.8056    1.2617    2.1108    3.6173
        z[4]   -1.1098   -0.4276   -0.0185    0.5125    1.5865
        z[5]   -1.6493   -0.9210   -0.4721    0.1161    0.8521
        z[6]   -1.8516   -0.7287   -0.2885    0.0660    1.1104
        z[7]   -1.6838   -0.6206   -0.0394    0.3321    1.2631
        z[8]   -0.2681    0.4358    0.8167    1.1285    1.9536
        z[9]   -1.5707   -0.6545   -0.2368    0.2486    0.8987

Going beyond the Turing API

As previously mentioned, the Turing wrappers can often limit the capabilities of the sampling libraries they wrap. AdvancedHMC[1] ([AdvancedHMC's GitHub](https://github.com/TuringLang/AdvancedHMC.jl)) is a clear example of this. A common practice when performing HMC is to provide an initial guess for the mass matrix. However, the native HMC sampler within Turing only allows the user to specify the type of the mass matrix, even though AdvancedHMC supports both options. Thankfully, we can use Turing's support for external samplers to define an HMC sampler with a custom mass matrix in AdvancedHMC and then use it to sample our Turing model.
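
To make the contrast concrete, here is a rough sketch of the two approaches. The metricT keyword of Turing's native NUTS is an assumption on our part and may differ between Turing versions; the AdvancedHMC constructor is the same one used later in this tutorial.

# Sketch only; the `metricT` keyword is an assumption and may vary across Turing versions.
using AdvancedHMC

# Native wrapper: we can only choose the *type* of the mass matrix.
nuts_native = Turing.NUTS(1_000, 0.9; metricT=AdvancedHMC.DenseEuclideanMetric)

# AdvancedHMC directly: we can pass a concrete mass matrix, e.g. the identity.
nuts_custom = AdvancedHMC.NUTS(0.9; metric=DenseEuclideanMetric(Matrix{Float64}(I, d, d)))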

We will use the library Pathfinder[2] ([Pathfinder's GitHub](https://github.com/mlcolab/Pathfinder.jl)) to construct our estimate of the mass matrix. Pathfinder is a variational inference algorithm that first finds the maximum a posteriori (MAP) estimate of a target posterior distribution and then uses the trace of the optimization to construct a sequence of multivariate normal approximations to the target distribution. In this process, Pathfinder computes an estimate of the mass matrix that the user can access.

The code below shows how this can be done in practice.

using AdvancedHMC, Pathfinder
# Running pathfinder
draws = 1_000
result_multi = multipathfinder(model, draws; nruns=8)

# Estimating the metric
inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ
metric = DenseEuclideanMetric(Matrix(inv_metric))

# Creating an AdvancedHMC NUTS sampler with the custom metric.
n_adapts = 1000 # Number of adaptation steps
tap = 0.9 # Large target acceptance probability to deal with the funnel structure of the posterior
nuts = AdvancedHMC.NUTS(tap; metric=metric)

# Sample
chain = sample(model, externalsampler(nuts), 10_000; n_adapts=1_000)
┌ Warning: Pareto shape k = 1.2 > 1. Corresponding importance sampling estimates are likely to be unstable and are unlikely to converge with additional samples.
└ @ PSIS ~/.julia/packages/PSIS/fU76x/src/core.jl:362
[ Info: Found initial step size 3.2
Chains MCMC chain (10000×23×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 6.08 seconds
Compute duration  = 6.08 seconds
parameters        = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, is_adapt

Summary Statistics
  parameters      mean       std      mcse     ess_bulk    ess_tail      rhat  ⋯
      Symbol   Float64   Float64   Float64      Float64     Float64   Float64  ⋯

           θ   -0.8773    1.0734    0.0266    1540.4691   1273.1077    1.0006  ⋯
        z[1]    0.2944    0.6102    0.0076    6944.2030   5327.6766    1.0006  ⋯
        z[2]    0.1690    0.5818    0.0057   10907.4261   6650.7295    1.0003  ⋯
        z[3]    0.9490    0.8135    0.0143    3322.8517   6323.4639    1.0003  ⋯
        z[4]    0.0623    0.5701    0.0054   11384.0730   6129.6453    0.9999  ⋯
        z[5]   -0.1475    0.5892    0.0060   10457.6815   5206.1937    1.0009  ⋯
        z[6]   -0.2819    0.5953    0.0064    9495.3772   5887.4577    1.0000  ⋯
        z[7]   -0.1698    0.5851    0.0059   10187.9188   6336.8450    1.0000  ⋯
        z[8]    0.5510    0.6755    0.0096    5531.6237   6360.2701    0.9999  ⋯
        z[9]   -0.2708    0.5943    0.0063    9478.8867   5920.2450    0.9999  ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           θ   -2.8169   -1.7036   -0.8341   -0.0707    1.0832
        z[1]   -0.8131   -0.0948    0.2305    0.6284    1.6728
        z[2]   -0.9515   -0.1851    0.1354    0.4999    1.4452
        z[3]   -0.2384    0.3349    0.7953    1.4443    2.8340
        z[4]   -1.1046   -0.2706    0.0533    0.3804    1.2879
        z[5]   -1.4410   -0.4825   -0.1100    0.2087    0.9868
        z[6]   -1.6157   -0.6194   -0.2290    0.1022    0.7842
        z[7]   -1.4700   -0.4946   -0.1272    0.1909    0.9335
        z[8]   -0.5369    0.0843    0.4392    0.9235    2.1173
        z[9]   -1.6063   -0.5973   -0.2232    0.1162    0.7999

Using new inference methods

So far, we have used Turing's support for external samplers to go beyond the capabilities of the wrappers. Now we want to use this support to employ a sampler that is not yet supported within Turing's ecosystem. We will showcase this with the recently developed Micro-Canonical Hamiltonian Monte Carlo (MCHMC) sampler. MCHMC[3,4] ([MCHMC's GitHub](https://github.com/JaimeRZP/MicroCanonicalHMC.jl)) is an HMC sampler that uses a single Hamiltonian energy level to explore the whole parameter space. This is achieved by simulating the dynamics of a microcanonical Hamiltonian with an additional noise term to ensure ergodicity.

Using this as well as other inference methods outside the Turing ecosystem is as simple as executing the code shown below:

using MicroCanonicalHMC
# Create MCHMC sampler
n_adapts = 1_000 # adaptation steps
tev = 0.01 # target energy variance
mchmc = MCHMC(n_adapts, tev; adaptive=true)

# Sample
chain = sample(model, externalsampler(mchmc), 10_000)
[ Info: Tuning eps ⏳
[ Info: Tuning L ⏳
[ Info: Tuning sigma ⏳
Tuning:   0%|▏                                          |  ETA: 0:05:56
  ϵ:     0.5031087911132046
  L:     3.1622776601683795
  dE/d:  -0.001473597870779031


Tuning:   1%|▍                                          |  ETA: 0:04:01
  ϵ:     0.6091743899079449
  L:     2.885533838713511
  dE/d:  -0.018721259480051343


Tuning: 100%|███████████████████████████████████████████| Time: 0:00:02
  ϵ:     0.8596767093247397
  L:     379.13607708283774
  dE/d:  0.006815566060573985
Chains MCMC chain (10000×11×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 4.94 seconds
Compute duration  = 4.94 seconds
parameters        = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]
internals         = lp

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           θ   -0.9143    1.1474    0.0374    894.4458    958.3049    1.0003   ⋯
        z[1]    0.2846    0.5495    0.0145   1540.0582   1482.6766    1.0048   ⋯
        z[2]    0.1777    0.5559    0.0153   1384.2012   1557.9999    1.0063   ⋯
        z[3]    0.9137    0.8241    0.0232   1421.5047   1677.5444    1.0006   ⋯
        z[4]    0.0415    0.5841    0.0169   1282.3919   1343.2095    1.0091   ⋯
        z[5]   -0.1159    0.6134    0.0185   1138.9152   1300.4095    0.9999   ⋯
        z[6]   -0.2811    0.5699    0.0154   1452.3150   1432.9169    1.0001   ⋯
        z[7]   -0.1883    0.5374    0.0148   1382.2526   1410.3457    1.0065   ⋯
        z[8]    0.5294    0.6984    0.0185   1599.5889   1628.8640    1.0022   ⋯
        z[9]   -0.2628    0.5508    0.0144   1574.5016   1491.6321    1.0001   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           θ   -2.8917   -1.8148   -0.9223   -0.0600    1.2682
        z[1]   -0.6482   -0.0744    0.2290    0.6006    1.4872
        z[2]   -0.8714   -0.1610    0.1297    0.4859    1.4089
        z[3]   -0.2823    0.2862    0.7596    1.4143    2.8570
        z[4]   -1.2064   -0.2790    0.0364    0.3659    1.2684
        z[5]   -1.3844   -0.4772   -0.1010    0.2706    1.0652
        z[6]   -1.5907   -0.6171   -0.2099    0.1006    0.7185
        z[7]   -1.4167   -0.4824   -0.1430    0.1312    0.8210
        z[8]   -0.5894    0.0530    0.4123    0.9061    2.1824
        z[9]   -1.5338   -0.5634   -0.2080    0.0917    0.6893

The only requirement to work with externalsampler is that the provided sampler must implement the AbstractMCMC.jl interface for a model of type AbstractMCMC.LogDensityModel.

As previously stated, in order to use external sampling libraries within Turing, they must follow the AbstractMCMC API. In this section, we will briefly describe what this entails. First and foremost, the sampler should be a subtype of AbstractMCMC.AbstractSampler. Second, the stepping function of the MCMC algorithm must be defined using AbstractMCMC.step and follow the structure below:

# First step
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.LogDensityModel,
    spl::T;
    kwargs...,
) where {T<:AbstractMCMC.AbstractSampler}
    [...]
    return transition, state
end

# N+1 step
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.LogDensityModel,
    spl::T,
    state;
    kwargs...,
) where {T<:AbstractMCMC.AbstractSampler}
    [...]
    return transition, state
end

There are several characteristics to note in these functions:

  • There must be two step functions:

    • A function that performs the first step and initializes the sampler.
    • A function that performs the following steps and takes an extra input, state, which carries the initialization information.
  • The functions must follow the displayed signatures.

  • The functions return a tuple whose first element is the transition (the sample that is saved to the MCMC chain) and whose second element is the state of the sampler, which is passed on to the next step.

The last requirement is that the transition must be structured with a field θ, which contains the values of the parameters of the model for said transition. This allows Turing to seamlessly extract the parameter values at each step of the chain when bundling the chains. Note that if the external sampler produces transitions that Turing cannot parse, the bundling of the samples will be different or fail.
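
As an illustration, a transition type that Turing could parse might look like the hypothetical sketch below; the struct name and the extra lp field are illustrative and not part of any particular library.

# Hypothetical sketch of a transition type with the required θ field.
struct MyTransition{V<:AbstractVector}
    θ::V          # parameter values of the model at this step
    lp::Float64   # log density at θ (illustrative extra field)
end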

For practical examples of how to adapt a sampling library to the AbstractMCMC interface, readers can consult the source code of the libraries used in this tutorial:

  • AdvancedMH (https://github.com/TuringLang/AdvancedMH.jl)
  • AdvancedHMC (https://github.com/TuringLang/AdvancedHMC.jl)
  • MicroCanonicalHMC (https://github.com/JaimeRZP/MicroCanonicalHMC.jl)
