Turing provides several wrapped samplers from external sampling libraries, e.g., HMC samplers from AdvancedHMC. These wrappers allow new users to seamlessly sample statistical models without leaving Turing. However, these wrappers may not always be complete, missing some functionality from the wrapped sampling library. Moreover, users might want to use samplers currently not wrapped within Turing.

For these reasons, Turing also makes it easy to run external samplers on Turing models without any modifications or wrapping! Throughout, we will use a 10-dimensional Neal's funnel as a running example:
```julia
# Import libraries.
using Turing, Random, LinearAlgebra

d = 10
@model function funnel()
    θ ~ Truncated(Normal(0, 3), -3, 3)
    z ~ MvNormal(zeros(d - 1), exp(θ) * I)
    return x ~ MvNormal(z, I)
end
```

```
funnel (generic function with 2 methods)
```
Now we sample the model to generate some observations, which we can then condition on.
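A minimal sketch of this step (the bindings `x` and `model` are our choice of names, and they are assumed by the sampling calls later in this tutorial):

```julia
# Draw synthetic observations x from the funnel model with θ fixed at 0,
# then condition the model on those observations.
(; x) = rand(funnel() | (θ=0,))
model = funnel() | (; x);
```

Conditioning on `x` turns the generative model into a posterior over `θ` and `z`, which is what the external samplers below will target.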
Users can use any sampler algorithm to sample this model if it follows the AbstractMCMC API. Before discussing how this is done in practice, it is worth giving a high-level description of the process. Imagine that we created an instance of an external sampler that we will call `spl`, such that `typeof(spl) <: AbstractMCMC.AbstractSampler`. In order to avoid type ambiguity within Turing, it is currently necessary to declare `spl` as an external sampler to Turing via `espl = externalsampler(spl)`, where `externalsampler(s::AbstractMCMC.AbstractSampler)` is a Turing function that types our external sampler adequately.
An excellent place to start showing how this is done in practice is the sampling library AdvancedMH ([AdvancedMH's GitHub](https://github.com/TuringLang/AdvancedMH.jl)) for Metropolis-Hastings (MH) methods. Let's say we want to use a random-walk Metropolis-Hastings sampler without specifying the proposal distributions. The code below constructs an MH sampler using a multivariate Gaussian distribution with zero mean and unit variance in `d` dimensions as a random-walk proposal.
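A sketch of this construction (the binding name `rwmh` is our choice; it is reused in the sampling call below):

```julia
using AdvancedMH, Distributions, LinearAlgebra

# Random-walk MH sampler with an isotropic Gaussian proposal
# of zero mean and unit variance in d dimensions.
rwmh = AdvancedMH.RWMH(MvNormal(zeros(d), I))
```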
```
MetropolisHastings{RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}}(RandomWalkProposal{false, ZeroMeanIsoNormal{Tuple{Base.OneTo{Int64}}}}(ZeroMeanIsoNormal(
dim: 10
μ: Zeros(10)
Σ: [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0]
)
))
```
Sampling is then as easy as:
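Assuming the conditioned model is bound to `model` and the AdvancedMH sampler to `rwmh` (the names used throughout this tutorial), the call is:

```julia
# Wrap the external sampler and hand it to Turing's sample function.
chain = sample(model, externalsampler(rwmh), 10_000)
```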
```
Chains MCMC chain (10000×11×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 3.91 seconds
Compute duration  = 3.91 seconds
parameters        = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]
internals         = lp

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

           θ   -0.4176    0.7182    0.1100    40.2920    55.8183    1.0116     ⋯
        z[1]   -0.6081    0.5861    0.0655    77.9584   103.7683    1.0045     ⋯
        z[2]    0.0008    0.5578    0.0633    74.5083    82.2725    1.0334     ⋯
        z[3]   -0.4018    0.6897    0.0999    50.9254    93.0228    1.0340     ⋯
        z[4]   -1.0382    0.7541    0.1011    53.8957    86.8961    1.0418     ⋯
        z[5]   -0.1245    0.5481    0.0623    75.4438   164.0067    1.0528     ⋯
        z[6]   -0.2217    0.5652    0.0539   106.1609   125.8279    1.0235     ⋯
        z[7]   -0.2563    0.6514    0.0716    82.6933   115.2301    1.0106     ⋯
        z[8]    0.3367    0.7103    0.0995    51.3573    66.7357    1.0231     ⋯
        z[9]   -0.8584    0.5992    0.0691    74.8243   101.1256    1.0119     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           θ   -1.6422   -1.0084   -0.5659   -0.0089    1.1758
        z[1]   -1.7148   -0.9367   -0.7006   -0.2277    0.4835
        z[2]   -1.0076   -0.3579    0.0312    0.2740    1.0056
        z[3]   -1.5568   -0.7949   -0.5111    0.1487    0.8288
        z[4]   -2.7566   -1.5246   -0.9537   -0.5993    0.4874
        z[5]   -1.2258   -0.4706   -0.1471    0.2113    0.8501
        z[6]   -1.3661   -0.6166   -0.2917    0.1713    0.8643
        z[7]   -1.5185   -0.5944   -0.2379    0.1250    1.0352
        z[8]   -0.9293   -0.1104    0.1937    0.8960    1.6462
        z[9]   -2.0383   -1.3314   -0.8347   -0.4830    0.3263
```
As previously mentioned, the Turing wrappers can often limit the capabilities of the sampling libraries they wrap. AdvancedHMC[1] ([AdvancedHMC's GitHub](https://github.com/TuringLang/AdvancedHMC.jl)) is a clear example of this. A common practice when performing HMC is to provide an initial guess for the mass matrix. However, the native HMC sampler within Turing only allows the user to specify the type of the mass matrix, despite both options being possible within AdvancedHMC. Thankfully, we can use Turing's support for external samplers to define an HMC sampler with a custom mass matrix in AdvancedHMC and then use it to sample our Turing model.
We will use the library Pathfinder[2] ([Pathfinder's GitHub](https://github.com/mlcolab/Pathfinder.jl)) to construct our estimate of the mass matrix. Pathfinder is a variational inference algorithm that first finds the maximum a posteriori (MAP) estimate of a target posterior distribution and then uses the trace of the optimization to construct a sequence of multivariate normal approximations to the target distribution. In this process, Pathfinder computes an estimate of the mass matrix that the user can access.
The code below shows how this can be done in practice.
```julia
using AdvancedHMC, Pathfinder

# Running pathfinder
draws = 1_000
result_multi = multipathfinder(model, draws; nruns=8)

# Estimating the metric
inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ
metric = DenseEuclideanMetric(Matrix(inv_metric))

# Creating an AdvancedHMC NUTS sampler with the custom metric.
n_adapts = 1_000  # Number of adaptation steps
tap = 0.9  # Large target acceptance probability to deal with the funnel structure of the posterior
nuts = AdvancedHMC.NUTS(tap; metric=metric)

# Sample
chain = sample(model, externalsampler(nuts), 10_000; n_adapts=1_000)
```
```
┌ Warning: Pareto shape k = 1.2 > 1. Corresponding importance sampling estimates are likely to be unstable and are unlikely to converge with additional samples.
└ @ PSIS ~/.julia/packages/PSIS/jwznT/src/core.jl:312
[ Info: Found initial step size 2.384185791015625e-8
```

```
Chains MCMC chain (10000×23×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 7.56 seconds
Compute duration  = 7.56 seconds
parameters        = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, is_adapt

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           θ   -0.8409    1.0433    0.0269   1441.8300   1479.1493    1.0004   ⋯
        z[1]   -0.4402    0.6308    0.0088   5622.1418   6273.3885    1.0001   ⋯
        z[2]   -0.0454    0.5633    0.0073   5445.9145   5944.6813    1.0002   ⋯
        z[3]   -0.3002    0.6005    0.0096   4007.2789   6096.9554    1.0004   ⋯
        z[4]   -0.8207    0.7680    0.0183   1237.4908    478.7786    1.0004   ⋯
        z[5]   -0.1544    0.6313    0.0283    779.6861    221.6092    1.0022   ⋯
        z[6]   -0.0913    0.5649    0.0059   9340.9125   6464.1877    1.0015   ⋯
        z[7]   -0.1709    0.6035    0.0131   1891.6268   6498.1208    1.0000   ⋯
        z[8]    0.2805    0.6015    0.0082   5123.7372   6047.6347    1.0003   ⋯
        z[9]   -0.5942    0.7234    0.0315    641.9286    207.8211    1.0040   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           θ   -2.8030   -1.6179   -0.7915   -0.0608    1.0973
        z[1]   -1.8605   -0.8087   -0.3616   -0.0120    0.6335
        z[2]   -1.1918   -0.3943   -0.0341    0.2996    1.1148
        z[3]   -1.6240   -0.6559   -0.2462    0.0843    0.7878
        z[4]   -2.6249   -1.2741   -0.6887   -0.2604    0.2927
        z[5]   -1.4725   -0.5102   -0.1369    0.2029    1.1881
        z[6]   -1.2632   -0.4190   -0.0828    0.2477    1.0423
        z[7]   -1.4907   -0.5087   -0.1412    0.2055    0.9576
        z[8]   -0.8231   -0.0999    0.2294    0.6266    1.6269
        z[9]   -2.1839   -1.0157   -0.5040   -0.1196    0.6641
```
So far, we have used Turing's support for external samplers to go beyond the capabilities of the wrappers. We can also use this support to employ a sampler not yet supported within Turing's ecosystem. We will showcase this with the recently developed Microcanonical Hamiltonian Monte Carlo (MCHMC) sampler. MCHMC[3,4] ([MCHMC's GitHub](https://github.com/JaimeRZP/MicroCanonicalHMC.jl)) is an HMC sampler that uses a single Hamiltonian energy level to explore the whole parameter space. This is achieved by simulating the dynamics of a microcanonical Hamiltonian with an additional noise term to ensure ergodicity.
Using this as well as other inference methods outside the Turing ecosystem is as simple as executing the code shown below:
```julia
using MicroCanonicalHMC

# Create MCHMC sampler
n_adapts = 1_000  # adaptation steps
tev = 0.01  # target energy variance
mchmc = MCHMC(n_adapts, tev; adaptive=true)

# Sample
chain = sample(model, externalsampler(mchmc), 10_000)
```
```
[ Info: Tuning eps ⏳
[ Info: Tuning L ⏳
[ Info: Tuning sigma ⏳
Tuning: 100%|███████████████████████████████████████████| Time: 0:00:02
  ϵ: 1.3039025765094499
  L: 0.8149131163355419
  dE/d: -0.01051374839882122
```

```
Chains MCMC chain (10000×11×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
Wall duration    = 5.36 seconds
Compute duration  = 5.36 seconds
parameters        = θ, z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8], z[9]
internals         = lp

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

           θ   -0.7067    1.0279    0.0869   144.9822   383.0765    1.0148     ⋯
        z[1]   -0.5201    0.6525    0.0424   249.4998   374.7171    1.0240     ⋯
        z[2]   -0.0172    0.5771    0.0324   326.7777   462.3836    1.0110     ⋯
        z[3]   -0.2942    0.6133    0.0363   301.9668   394.4349    1.0013     ⋯
        z[4]   -0.8708    0.7285    0.0612   153.3847   295.1226    1.0171     ⋯
        z[5]   -0.1968    0.5882    0.0332   320.2821   480.0108    1.0031     ⋯
        z[6]   -0.1019    0.5898    0.0333   319.1260   545.6729    1.0041     ⋯
        z[7]   -0.1915    0.6326    0.0421   234.1260   378.9084    1.0070     ⋯
        z[8]    0.2917    0.6042    0.0463   172.8050   373.1418    1.0077     ⋯
        z[9]   -0.7069    0.7259    0.0550   194.4545   413.4434    1.0155     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           θ   -2.6944   -1.4615   -0.6539    0.0448    1.1947
        z[1]   -2.0330   -0.9009   -0.4491   -0.0797    0.6025
        z[2]   -1.2561   -0.3552   -0.0039    0.3246    1.1575
        z[3]   -1.6158   -0.6758   -0.2515    0.1149    0.8399
        z[4]   -2.5123   -1.3494   -0.7835   -0.3163    0.3028
        z[5]   -1.3814   -0.5705   -0.1743    0.1708    0.9538
        z[6]   -1.3138   -0.4665   -0.0967    0.2682    1.0774
        z[7]   -1.5556   -0.5547   -0.1537    0.2050    1.0114
        z[8]   -0.8700   -0.0954    0.2595    0.6767    1.5396
        z[9]   -2.4024   -1.1294   -0.5800   -0.1897    0.4207
```
The only requirement to work with `externalsampler` is that the provided sampler must implement the AbstractMCMC.jl interface [INSERT LINK] for a model of type `AbstractMCMC.LogDensityModel` [INSERT LINK].
As previously stated, in order to use external sampling libraries within Turing, they must follow the AbstractMCMC API. In this section, we will briefly dwell on what this entails. First and foremost, the sampler should be a subtype of `AbstractMCMC.AbstractSampler`. Second, the stepping function of the MCMC algorithm must be defined using `AbstractMCMC.step` and follow the structure below:
```julia
# First step
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.LogDensityModel,
    spl::T;
    kwargs...,
) where {T<:AbstractMCMC.AbstractSampler}
    [...]
    return transition, sample
end

# N + 1 step
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.LogDensityModel,
    sampler::T,
    state;
    kwargs...,
) where {T<:AbstractMCMC.AbstractSampler}
    [...]
    return transition, sample
end
```
There are several characteristics to note in these functions:

- There must be two `step` functions:
  - one that performs the first step and initializes the sampler, and
  - one that performs the following steps and takes an extra input, `state`, which carries the initialization information.
- The functions must follow the displayed signatures.
- The output of the functions must be a transition, the current state of the sampler, and a sample, which is what is saved to the MCMC chain.
The last requirement is that the transition must be structured with a field `θ`, which contains the values of the parameters of the model for said transition. This allows Turing to seamlessly extract the parameter values at each step of the chain when bundling the chains. Note that if the external sampler produces transitions that Turing cannot parse, the bundling of the samples will be different or fail.
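As an illustration, a custom transition type might look like the following sketch (the struct name and the `lp` field are hypothetical choices of ours; only the `θ` field is required by Turing):

```julia
# Hypothetical transition type for a custom external sampler.
# Turing reads the θ field when bundling samples into a chain.
struct MyTransition{V<:AbstractVector{<:Real}}
    θ::V          # parameter values at this step (required by Turing)
    lp::Float64   # log density at θ (optional extra bookkeeping)
end
```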
For practical examples of how to adapt a sampling library to the AbstractMCMC interface, readers can consult the following libraries:
[1] Xu et al., "AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms", 2019.
[2] Zhang et al., "Pathfinder: Parallel quasi-Newton variational inference", 2021.
[3] Robnik et al., "Microcanonical Hamiltonian Monte Carlo", 2022.
[4] Robnik and Seljak, "Langevin Hamiltonian Monte Carlo", 2023.