ENV["GKS_ENCODING"] = "utf-8" # Allows the use of unicode characters in Plots.jl
using Plots
using StatsPlots
using Turing
using Random
using Bijectors
# Set a seed.
Random.seed!(0)
# Define a strange model.
@model function gdemo(x)
~ InverseGamma(2, 3)
s² ~ Normal(0, sqrt(s²))
m = sin(m) + cos(m)
bumps = m + 5 * bumps
m for i in eachindex(x)
~ Normal(m, sqrt(s²))
x[i] end
return s², m
end
# Define our data points.
= [1.5, 2.0, 13.0, 2.1, 0.0]
x
# Set up the model call, sample from the prior.
= gdemo(x)
model
# Evaluate surface at coordinates.
evaluate(m1, m2) = logjoint(model, (m=m2, s²=invlink.(Ref(InverseGamma(2, 3)), m1)))
function plot_sampler(chain; label="")
# Extract values from chain.
= get(chain, [:s², :m, :lp])
val = link.(Ref(InverseGamma(2, 3)), val.s²)
ss = val.m
ms = val.lp
lps
# How many surface points to sample.
= 100
granularity
# Range start/stop points.
= 0.5
spread = minimum(ss) - spread * std(ss)
σ_start = maximum(ss) + spread * std(ss)
σ_stop = minimum(ms) - spread * std(ms)
μ_start = maximum(ms) + spread * std(ms)
μ_stop = collect(range(σ_start; stop=σ_stop, length=granularity))
σ_rng = collect(range(μ_start; stop=μ_stop, length=granularity))
μ_rng
# Make surface plot.
= surface(
p
σ_rng,
μ_rng,
evaluate;=(30, 65),
camera# ticks=nothing,
=false,
colorbar=:inferno,
color=label,
title
)
= 1:length(ms)
line_range
scatter3d!(
ss[line_range],
ms[line_range],
lps[line_range];=:viridis,
mc=collect(line_range),
marker_z=0,
msw=false,
legend=false,
colorbar=0.5,
alpha="σ",
xlabel="μ",
ylabel="Log probability",
zlabel=label,
title
)
return p
end;
Sampler Visualization
Introduction
The Code
For each sampler, we will use the same code to plot sampler paths. The block below loads the relevant libraries and defines a function for plotting the sampler’s trajectory across the posterior.
The Turing model definition used here is not especially practical, but it is designed in such a way as to produce visually interesting posterior surfaces to show how different samplers move along the distribution.
setprogress!(false)
Samplers
Gibbs
Gibbs sampling tends to exhibit a “jittery” trajectory. The example below combines HMC
and PG
sampling to traverse the posterior.
= sample(model, Gibbs(HMC(0.01, 5, :s²), PG(20, :m)), 1000)
c plot_sampler(c)
HMC
Hamiltonian Monte Carlo (HMC) sampling is a typical sampler to use, as it tends to be fairly good at converging in a efficient manner. It can often be tricky to set the correct parameters for this sampler however, and the NUTS
sampler is often easier to run if you don’t want to spend too much time fiddling with step size and and the number of steps to take. Note however that HMC
does not explore the positive values μ very well, likely due to the leapfrog and step size parameter settings.
= sample(model, HMC(0.01, 10), 1000)
c plot_sampler(c)
HMCDA
The HMCDA sampler is an implementation of the Hamiltonian Monte Carlo with Dual Averaging algorithm found in the paper “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” by Hoffman and Gelman (2011). The paper can be found on arXiv for the interested reader.
= sample(model, HMCDA(200, 0.65, 0.3), 1000)
c plot_sampler(c)
┌ Info: Found initial step size
└ ϵ = 0.4
MH
Metropolis-Hastings (MH) sampling is one of the earliest Markov Chain Monte Carlo methods. MH sampling does not “move” a lot, unlike many of the other samplers implemented in Turing. Typically a much longer chain is required to converge to an appropriate parameter estimate.
The plot below only uses 1,000 iterations of Metropolis-Hastings.
= sample(model, MH(), 1000)
c plot_sampler(c)
As you can see, the MH sampler doesn’t move parameter estimates very often.
NUTS
The No U-Turn Sampler (NUTS) is an implementation of the algorithm found in the paper “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” by Hoffman and Gelman (2011). The paper can be found on arXiv for the interested reader.
NUTS tends to be very good at traversing complex posteriors quickly.
= sample(model, NUTS(0.65), 1000)
c plot_sampler(c)
┌ Info: Found initial step size
└ ϵ = 0.2015625
The only parameter that needs to be set other than the number of iterations to run is the target acceptance rate. In the Hoffman and Gelman paper, they note that a target acceptance rate of 0.65 is typical.
Here is a plot showing a very high acceptance rate. Note that it appears to “stick” to a mode and is not particularly good at exploring the posterior as compared to the 0.65 target acceptance ratio case.
= sample(model, NUTS(0.95), 1000)
c plot_sampler(c)
┌ Info: Found initial step size
└ ϵ = 0.8
An exceptionally low acceptance rate will show very few moves on the posterior:
= sample(model, NUTS(0.2), 1000)
c plot_sampler(c)
┌ Info: Found initial step size
└ ϵ = 0.2
PG
The Particle Gibbs (PG) sampler is an implementation of an algorithm from the paper “Particle Markov chain Monte Carlo methods” by Andrieu, Doucet, and Holenstein (2010). The interested reader can learn more here.
The two parameters are the number of particles, and the number of iterations. The plot below shows the use of 20 particles.
= sample(model, PG(20), 1000)
c plot_sampler(c)
Next, we plot using 50 particles.
= sample(model, PG(50), 1000)
c plot_sampler(c)