Guide
Basics
Introduction
A probabilistic program is Julia code wrapped in a @model
macro. It can use arbitrary Julia code, but to ensure correctness of inference it should not have external effects or modify global state. Stack-allocated variables are safe, but mutable heap-allocated objects may lead to subtle bugs when using task copying. By default Libtask deepcopies Array
and Dict
objects when copying task to avoid bugs with data stored in mutable structure in Turing models.
To specify distributions of random variables, Turing programs should use the ~
notation:
x ~ distr
where x
is a symbol and distr
is a distribution. If x
is undefined in the model function, inside the probabilistic program, this puts a random variable named x
, distributed according to distr
, in the current scope. distr
can be a value of any type that implements rand(distr)
, which samples a value from the distribution distr
. If x
is defined, this is used for conditioning in a style similar to Anglican (another PPL). In this case, x
is an observed value, assumed to have been drawn from the distribution distr
. The likelihood is computed using logpdf(distr,y)
. The observe statements should be arranged so that every possible run traverses all of them in exactly the same order. This is equivalent to demanding that they are not placed inside stochastic control flow.
Available inference methods include Importance Sampling (IS), Sequential Monte Carlo (SMC), Particle Gibbs (PG), Hamiltonian Monte Carlo (HMC), Hamiltonian Monte Carlo with Dual Averaging (HMCDA) and The No-U-Turn Sampler (NUTS).
Simple Gaussian Demo
Below is a simple Gaussian demo illustrate the basic usage of Turing.jl.
# Import packages.
using Turing
using StatsPlots
# Define a simple Normal model with unknown mean and variance.
@model function gdemo(x, y)
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
x ~ Normal(m, sqrt(s²))
return y ~ Normal(m, sqrt(s²))
end
gdemo (generic function with 2 methods)
Note: As a sanity check, the prior expectation of s²
is mean(InverseGamma(2, 3)) = 3/(2 - 1) = 3
and the prior expectation of m
is 0. This can be easily checked using Prior
:
p1 = sample(gdemo(missing, missing), Prior(), 100000)
Chains MCMC chain (100000×5×1 Array{Float64, 3}):
Iterations = 1:1:100000
Number of chains = 1
Samples per chain = 100000
Wall duration = 2.1 seconds
Compute duration = 2.1 seconds
parameters = s², m, x, y
internals = lp
Summary Statistics
parameters mean std mcse ess_bulk ess_tail
rha ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Flo
at6 ⋯
s² 2.9424 5.6793 0.0180 100347.7019 99440.6204 1.
000 ⋯
m 0.0003 1.7101 0.0054 99890.2582 98065.6160 1.
000 ⋯
x 0.0139 2.4361 0.0077 99755.2179 98064.8845 1.
000 ⋯
y -0.0063 2.4221 0.0077 99001.9384 98925.3975 1.
000 ⋯
2 columns om
itted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
s² 0.5381 1.1138 1.7825 3.1037 12.1347
m -3.3473 -0.9125 -0.0049 0.9035 3.3792
x -4.7745 -1.2786 0.0120 1.2816 4.8070
y -4.7919 -1.2892 -0.0003 1.2713 4.7882
We can perform inference by using the sample
function, the first argument of which is our probabilistic program and the second of which is a sampler. More information on each sampler is located in the API.
# Run sampler, collect results.
c1 = sample(gdemo(1.5, 2), SMC(), 1000)
c2 = sample(gdemo(1.5, 2), PG(10), 1000)
c3 = sample(gdemo(1.5, 2), HMC(0.1, 5), 1000)
c4 = sample(gdemo(1.5, 2), Gibbs(PG(10, :m), HMC(0.1, 5, :s²)), 1000)
c5 = sample(gdemo(1.5, 2), HMCDA(0.15, 0.65), 1000)
c6 = sample(gdemo(1.5, 2), NUTS(0.65), 1000)
Chains MCMC chain (1000×14×1 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 1
Samples per chain = 1000
Wall duration = 2.06 seconds
Compute duration = 2.06 seconds
parameters = s², m
internals = lp, n_steps, is_accept, acceptance_rate, log_density, h
amiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error,
tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat
e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64
⋯
s² 1.9289 1.4467 0.0604 565.2070 633.2511 1.0015
⋯
m 1.1824 0.7217 0.0298 585.7603 655.4807 1.0000
⋯
1 column om
itted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
s² 0.5766 1.0328 1.5063 2.3328 5.8806
m -0.2834 0.6983 1.1991 1.6599 2.5946
The MCMCChains
module (which is re-exported by Turing) provides plotting tools for the Chain
objects returned by a sample
function. See the MCMCChains repository for more information on the suite of tools available for diagnosing MCMC chains.
# Summarise results
describe(c3)
# Plot results
plot(c3)
savefig("gdemo-plot.png")
The arguments for each sampler are:
- SMC: number of particles.
- PG: number of particles, number of iterations.
- HMC: leapfrog step size, leapfrog step numbers.
- Gibbs: component sampler 1, component sampler 2, ...
- HMCDA: total leapfrog length, target accept ratio.
- NUTS: number of adaptation steps (optional), target accept ratio.
For detailed information on the samplers, please review Turing.jl's API documentation.
Modelling Syntax Explained
Using this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model onto data. Subsequently, the sample function can be used to generate samples from the posterior distribution.
In the following example, the defined model is conditioned to the data (arg1 = 1, arg2 = 2) by passing (1, 2) to the model function.
@model function model_name(arg_1, arg_2)
return ...
end
The conditioned model can then be passed onto the sample function to run posterior inference.
model_func = model_name(1, 2)
chn = sample(model_func, HMC(..)) # Perform inference by sampling using HMC.
The returned chain contains samples of the variables in the model.
var_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.
The key (:var_1
) can be a Symbol
or a String
. For example, to fetch x[1]
, one can use chn[Symbol("x[1]")]
or chn["x[1]"]
.
If you want to retrieve all parameters associated with a specific symbol, you can use group
. As an example, if you have the
parameters "x[1]"
, "x[2]"
, and "x[3]"
, calling group(chn, :x)
or group(chn, "x")
will return a new chain with only "x[1]"
, "x[2]"
, and "x[3]"
.
Turing does not have a declarative form. More generally, the order in which you place the lines of a @model
macro matters. For example, the following example works:
# Define a simple Normal model with unknown mean and variance.
@model function model_function(y)
s ~ Poisson(1)
y ~ Normal(s, 1)
return y
end
sample(model_function(10), SMC(), 100)
Chains MCMC chain (100×3×1 Array{Float64, 3}):
Log evidence = -22.4209907336435
Iterations = 1:1:100
Number of chains = 1
Samples per chain = 100
Wall duration = 1.84 seconds
Compute duration = 1.84 seconds
parameters = s
internals = lp, weight
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat
e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64
⋯
s 4.0000 0.0000 NaN NaN NaN NaN
⋯
1 column om
itted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
s 4.0000 4.0000 4.0000 4.0000 4.0000
But if we switch the s ~ Poisson(1)
and y ~ Normal(s, 1)
lines, the model will no longer sample correctly:
# Define a simple Normal model with unknown mean and variance.
@model function model_function(y)
y ~ Normal(s, 1)
s ~ Poisson(1)
return y
end
sample(model_function(10), SMC(), 100)
Sampling Multiple Chains
Turing supports distributed and threaded parallel sampling. To do so, call sample(model, sampler, parallel_type, n, n_chains)
, where parallel_type
can be either MCMCThreads()
or MCMCDistributed()
for thread and parallel sampling, respectively.
Having multiple chains in the same object is valuable for evaluating convergence. Some diagnostic functions like gelmandiag
require multiple chains.
If you do not want parallelism or are on an older version Julia, you can sample multiple chains with the mapreduce
function:
# Replace num_chains below with however many chains you wish to sample.
chains = mapreduce(c -> sample(model_fun, sampler, 1000), chainscat, 1:num_chains)
The chains
variable now contains a Chains
object which can be indexed by chain. To pull out the first chain from the chains
object, use chains[:,:,1]
. The method is the same if you use either of the below parallel sampling methods.
Multithreaded sampling
If you wish to perform multithreaded sampling and are running Julia 1.3 or greater, you can call sample
with the following signature:
using Turing
@model function gdemo(x)
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
for i in eachindex(x)
x[i] ~ Normal(m, sqrt(s²))
end
end
model = gdemo([1.5, 2.0])
# Sample four chains using multiple threads, each with 1000 samples.
sample(model, NUTS(), MCMCThreads(), 1000, 4)
Be aware that Turing cannot add threads for you -- you must have started your Julia instance with multiple threads to experience any kind of parallelism. See the Julia documentation for details on how to achieve this.
Distributed sampling
To perform distributed sampling (using multiple processes), you must first import Distributed
.
Process parallel sampling can be done like so:
# Load Distributed to add processes and the @everywhere macro.
using Distributed
# Load Turing.
using Turing
# Add four processes to use for sampling.
addprocs(4; exeflags="--project=$(Base.active_project())")
# Initialize everything on all the processes.
# Note: Make sure to do this after you've already loaded Turing,
# so each process does not have to precompile.
# Parallel sampling may fail silently if you do not do this.
@everywhere using Turing
# Define a model on all processes.
@everywhere @model function gdemo(x)
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
for i in eachindex(x)
x[i] ~ Normal(m, sqrt(s²))
end
end
# Declare the model instance everywhere.
@everywhere model = gdemo([1.5, 2.0])
# Sample four chains using multiple processes, each with 1000 samples.
sample(model, NUTS(), MCMCDistributed(), 1000, 4)
Sampling from an Unconditional Distribution (The Prior)
Turing allows you to sample from a declared model's prior. If you wish to draw a chain from the prior to inspect your prior distributions, you can simply run
chain = sample(model, Prior(), n_samples)
You can also run your model (as if it were a function) from the prior distribution, by calling the model without specifying inputs or a sampler. In the below example, we specify a gdemo
model which returns two variables, x
and y
. The model includes x
and y
as arguments, but calling the function without passing in x
or y
means that Turing's compiler will assume they are missing values to draw from the relevant distribution. The return
statement is necessary to retrieve the sampled x
and y
values.
Assign the function with missing
inputs to a variable, and Turing will produce a sample from the prior distribution.
@model function gdemo(x, y)
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
x ~ Normal(m, sqrt(s²))
y ~ Normal(m, sqrt(s²))
return x, y
end
gdemo (generic function with 2 methods)
Assign the function with missing
inputs to a variable, and Turing will produce a sample from the prior distribution.
# Samples from p(x,y)
g_prior_sample = gdemo(missing, missing)
g_prior_sample()
(0.1741213188090317, -0.5141306911530534)
Sampling from a Conditional Distribution (The Posterior)
Treating observations as random variables
Inputs to the model that have a value missing
are treated as parameters, aka random variables, to be estimated/sampled. This can be useful if you want to simulate draws for that parameter, or if you are sampling from a conditional distribution. Turing supports the following syntax:
@model function gdemo(x, ::Type{T}=Float64) where {T}
if x === missing
# Initialize `x` if missing
x = Vector{T}(undef, 2)
end
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
for i in eachindex(x)
x[i] ~ Normal(m, sqrt(s²))
end
end
# Construct a model with x = missing
model = gdemo(missing)
c = sample(model, HMC(0.01, 5), 500)
Chains MCMC chain (500×14×1 Array{Float64, 3}):
Iterations = 1:1:500
Number of chains = 1
Samples per chain = 500
Wall duration = 1.86 seconds
Compute duration = 1.86 seconds
parameters = s², m, x[1], x[2]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, h
amiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, no
m_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat
e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64
⋯
s² 0.7271 0.2249 0.0842 7.6115 10.8946 1.0866
⋯
m -0.2311 0.2300 0.0543 18.7223 38.4238 1.0040
⋯
x[1] -0.0177 0.3176 0.2204 2.1975 27.9147 1.3983
⋯
x[2] -0.6931 0.2532 0.0768 11.8370 28.3523 1.0758
⋯
1 column om
itted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
s² 0.2867 0.5873 0.7466 0.8790 1.1814
m -0.6826 -0.4042 -0.2247 -0.0424 0.1559
x[1] -0.5122 -0.2953 -0.0706 0.2731 0.4980
x[2] -1.1639 -0.8739 -0.6643 -0.4794 -0.2824
Note the need to initialize x
when missing since we are iterating over its elements later in the model. The generated values for x
can be extracted from the Chains
object using c[:x]
.
Turing also supports mixed missing
and non-missing
values in x
, where the missing ones will be treated as random variables to be sampled while the others get treated as observations. For example:
@model function gdemo(x)
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
for i in eachindex(x)
x[i] ~ Normal(m, sqrt(s²))
end
end
# x[1] is a parameter, but x[2] is an observation
model = gdemo([missing, 2.4])
c = sample(model, HMC(0.01, 5), 500)
Chains MCMC chain (500×13×1 Array{Float64, 3}):
Iterations = 1:1:500
Number of chains = 1
Samples per chain = 500
Wall duration = 1.87 seconds
Compute duration = 1.87 seconds
parameters = s², m, x[1]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, h
amiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, no
m_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat
e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64
⋯
s² 3.1704 1.7342 1.4214 1.5735 11.1270 1.7116
⋯
m -0.5925 0.4070 0.3487 1.4041 17.9775 1.9386
⋯
x[1] -1.4030 0.6071 0.5276 1.4381 22.5542 1.9377
⋯
1 column om
itted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
s² 0.7774 1.4557 3.1170 4.4699 6.6135
m -1.2511 -0.9028 -0.6544 -0.3092 0.2459
x[1] -2.4749 -1.8908 -1.2775 -0.9157 -0.5026
Default Values
Arguments to Turing models can have default values much like how default values work in normal Julia functions. For instance, the following will assign missing
to x
and treat it as a random variable. If the default value is not missing
, x
will be assigned that value and will be treated as an observation instead.
using Turing
@model function generative(x=missing, ::Type{T}=Float64) where {T<:Real}
if x === missing
# Initialize x when missing
x = Vector{T}(undef, 10)
end
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
for i in 1:length(x)
x[i] ~ Normal(m, sqrt(s²))
end
return s², m
end
m = generative()
chain = sample(m, HMC(0.01, 5), 1000)
Chains MCMC chain (1000×22×1 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 1
Samples per chain = 1000
Wall duration = 3.48 seconds
Compute duration = 3.48 seconds
parameters = s², m, x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
x[9], x[10]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, h
amiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, no
m_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat
e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64
⋯
s² 1.1119 0.5207 0.2842 3.1801 21.1090 1.4925
⋯
m 0.7230 0.3327 0.1126 9.5545 20.4148 1.0748
⋯
x[1] 1.1839 0.5516 0.3449 2.8437 21.5142 1.8318
⋯
x[2] -0.0426 1.1914 0.7555 2.5746 14.5364 2.0000
⋯
x[3] 2.2109 0.4282 0.2460 3.3195 21.2188 1.4879
⋯
x[4] 0.7597 0.2948 0.1332 5.3609 38.9687 1.1695
⋯
x[5] 0.1832 0.4150 0.1246 11.7747 18.1758 1.0769
⋯
x[6] 0.1954 0.4584 0.2712 2.9072 20.2899 1.8015
⋯
x[7] 1.1014 0.2734 0.0862 10.2761 33.9279 1.0363
⋯
x[8] 0.8650 0.7467 0.4947 2.6025 24.3852 2.1000
⋯
x[9] 0.7267 0.3944 0.1444 7.4158 15.8452 1.0546
⋯
x[10] 0.9286 0.3653 0.1516 5.9118 21.5974 1.2243
⋯
1 column om
itted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
s² 0.4383 0.7739 1.0078 1.3546 2.6791
m 0.0607 0.4999 0.7738 0.9723 1.2585
x[1] 0.3118 0.7075 1.1020 1.7223 2.0737
x[2] -1.5459 -0.8948 -0.6965 1.3969 2.2193
x[3] 1.4374 1.8335 2.2466 2.6105 2.8625
x[4] 0.2627 0.5428 0.7215 0.9632 1.3350
x[5] -0.4876 -0.1422 0.1458 0.4749 1.0366
x[6] -0.4925 -0.1268 0.0906 0.4979 1.1674
x[7] 0.5612 0.9157 1.0847 1.2993 1.6003
x[8] -0.1609 0.1762 0.6902 1.5329 2.0586
x[9] 0.0375 0.3880 0.7529 1.0318 1.5096
x[10] 0.1622 0.6578 0.9234 1.2332 1.5525
Access Values inside Chain
You can access the values inside a chain several ways:
- Turn them into a
DataFrame
object - Use their raw
AxisArray
form - Create a three-dimensional
Array
object
For example, let c
be a Chain
:
DataFrame(c)
convertsc
to aDataFrame
,c.value
retrieves the values insidec
as anAxisArray
, andc.value.data
retrieves the values insidec
as a 3DArray
.
Variable Types and Type Parameters
The element type of a vector (or matrix) of random variables should match the eltype
of the its prior distribution, <: Integer
for discrete distributions and <: AbstractFloat
for continuous distributions. Moreover, if the continuous random variable is to be sampled using a Hamiltonian sampler, the vector's element type needs to either be:
Real
to enable auto-differentiation through the model which uses special number types that are sub-types ofReal
, or- Some type parameter
T
defined in the model header using the type parameter syntax, e.g.function gdemo(x, ::Type{T} = Float64) where {T}
. Similarly, when using a particle sampler, the Julia variable used should either be: - An
Array
, or - An instance of some type parameter
T
defined in the model header using the type parameter syntax, e.g.function gdemo(x, ::Type{T} = Vector{Float64}) where {T}
.
Querying Probabilities from Model or Chain
Turing offers three functions: loglikelihood
, logprior
, and logjoint
to query the log-likelihood, log-prior, and log-joint probabilities of a model, respectively.
Let's look at a simple model called gdemo
:
@model function gdemo0()
s ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s))
return x ~ Normal(m, sqrt(s))
end
gdemo0 (generic function with 2 methods)
If we observe x to be 1.0, we can condition the model on this datum using the condition
syntax:
model = gdemo0() | (x=1.0,)
DynamicPPL.Model{typeof(Main.var"##WeaveSandBox#413".gdemo0), (), (), (), T
uple{}, Tuple{}, DynamicPPL.ConditionContext{@NamedTuple{x::Float64}, Dynam
icPPL.DefaultContext}}(Main.var"##WeaveSandBox#413".gdemo0, NamedTuple(), N
amedTuple(), ConditionContext((x = 1.0,), DynamicPPL.DefaultContext()))
Now, let's compute the log-likelihood of the observation given specific values of the model parameters, s
and m
:
loglikelihood(model, (s=1.0, m=1.0))
-0.9189385332046728
We can easily verify that value in this case:
logpdf(Normal(1.0, 1.0), 1.0)
-0.9189385332046728
We can also compute the log-prior probability of the model for the same values of s and m:
logprior(model, (s=1.0, m=1.0))
-2.221713955868453
logpdf(InverseGamma(2, 3), 1.0) + logpdf(Normal(0, sqrt(1.0)), 1.0)
-2.221713955868453
Finally, we can compute the log-joint probability of the model parameters and data:
logjoint(model, (s=1.0, m=1.0))
-3.1406524890731258
logpdf(Normal(1.0, 1.0), 1.0) +
logpdf(InverseGamma(2, 3), 1.0) +
logpdf(Normal(0, sqrt(1.0)), 1.0)
-3.1406524890731258
Querying with Chains
object is easy as well:
chn = sample(model, Prior(), 10)
Chains MCMC chain (10×3×1 Array{Float64, 3}):
Iterations = 1:1:10
Number of chains = 1
Samples per chain = 10
Wall duration = 0.39 seconds
Compute duration = 0.39 seconds
parameters = s, m
internals = lp
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat
e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64
⋯
s 2.3521 1.0661 0.4174 4.9687 10.0000 1.4156
⋯
m -0.2634 1.9127 0.6048 10.0000 10.0000 0.9681
⋯
1 column om
itted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
s 1.1649 1.6899 2.0003 3.2842 4.0077
m -2.1049 -1.9263 -0.6289 0.9718 3.0625
loglikelihood(model, chn)
10×1 Matrix{Float64}:
-1.152512581810826
-3.209564317808394
-1.2822047083624553
-1.1774957331222868
-2.784582448464323
-3.5390362387698215
-3.015852527660817
-3.8036178159082445
-1.744721992326117
-2.431462231454251
Maximum likelihood and maximum a posterior estimates
Turing provides support for two mode estimation techniques, maximum likelihood estimation (MLE) and maximum a posterior (MAP) estimation. Optimization is performed by the Optim.jl package. Mode estimation is currently a optional tool, and will not be available to you unless you have manually installed Optim and loaded the package with a using
statement. To install Optim, run import Pkg; Pkg.add("Optim")
.
Mode estimation only works when all model parameters are continuous -- discrete parameters cannot be estimated with MLE/MAP as of yet.
To understand how mode estimation works, let us first load Turing and Optim to enable mode estimation, and then declare a model:
# Note that loading Optim explicitly is required for mode estimation to function,
# as Turing does not load the opimization suite unless Optim is loaded as well.
using Turing
using Optim
@model function gdemo(x)
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
for i in eachindex(x)
x[i] ~ Normal(m, sqrt(s²))
end
end
gdemo (generic function with 6 methods)
Once the model is defined, we can construct a model instance as we normally would:
# Create some data to pass to the model.
data = [1.5, 2.0]
# Instantiate the gdemo model with our data.
model = gdemo(data)
DynamicPPL.Model{typeof(Main.var"##WeaveSandBox#413".gdemo), (:x,), (), (),
Tuple{Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext}(Main.var"##Wea
veSandBox#413".gdemo, (x = [1.5, 2.0],), NamedTuple(), DynamicPPL.DefaultCo
ntext())
Mode estimation is typically quick and easy at this point. Turing extends the function Optim.optimize
and accepts the structs MLE()
or MAP()
, which inform Turing whether to provide an MLE or MAP estimate, respectively. By default, the LBFGS optimizer is used, though this can be changed. Basic usage is:
# Generate a MLE estimate.
mle_estimate = optimize(model, MLE())
# Generate a MAP estimate.
map_estimate = optimize(model, MAP())
ModeResult with maximized lp of -4.62
[0.9074074074073607, 1.1666666666663343]
If you wish to change to a different optimizer, such as NelderMead
, simply place your optimizer in the third argument slot:
# Use NelderMead
mle_estimate = optimize(model, MLE(), NelderMead())
# Use SimulatedAnnealing
mle_estimate = optimize(model, MLE(), SimulatedAnnealing())
# Use ParticleSwarm
mle_estimate = optimize(model, MLE(), ParticleSwarm())
# Use Newton
mle_estimate = optimize(model, MLE(), Newton())
# Use AcceleratedGradientDescent
mle_estimate = optimize(model, MLE(), AcceleratedGradientDescent())
Some methods may have trouble calculating the mode because not enough iterations were allowed, or the target function moved upwards between function calls. Turing will warn you if Optim fails to converge by running Optim.converge
. A typical solution to this might be to add more iterations, or allow the optimizer to increase between function iterations:
# Increase the iterations and allow function eval to increase between calls.
mle_estimate = optimize(
model, MLE(), Newton(), Optim.Options(; iterations=10_000, allow_f_increases=true)
)
More options for Optim are available here.
Analyzing your mode estimate
Turing extends several methods from StatsBase
that can be used to analyze your mode estimation results. Methods implemented include vcov
, informationmatrix
, coeftable
, params
, and coef
, among others.
For example, let's examine our ML estimate from above using coeftable
:
# Import StatsBase to use it's statistical methods.
using StatsBase
# Print out the coefficient table.
coeftable(mle_estimate)
─────────────────────────────
estimate stderror tstat
─────────────────────────────
s 0.0625 0.0625 1.0
m 1.75 0.176777 9.8995
─────────────────────────────
Standard errors are calculated from the Fisher information matrix (inverse Hessian of the log likelihood or log joint). t-statistics will be familiar to frequentist statisticians. Warning -- standard errors calculated in this way may not always be appropriate for MAP estimates, so please be cautious in interpreting them.
Sampling with the MAP/MLE as initial states
You can begin sampling your chain from an MLE/MAP estimate by extracting the vector of parameter values and providing it to the sample
function with the keyword initial_params
. For example, here is how to sample from the full posterior using the MAP estimate as the starting point:
# Generate an MAP estimate.
map_estimate = optimize(model, MAP())
# Sample with the MAP estimate as the starting point.
chain = sample(model, NUTS(), 1_000; initial_params=map_estimate.values.array)
Beyond the Basics
Compositional Sampling Using Gibbs
Turing.jl provides a Gibbs interface to combine different samplers. For example, one can combine an HMC
sampler with a PG
sampler to run inference for different parameters in a single model as below.
@model function simple_choice(xs)
p ~ Beta(2, 2)
z ~ Bernoulli(p)
for i in 1:length(xs)
if z == 1
xs[i] ~ Normal(0, 1)
else
xs[i] ~ Normal(2, 1)
end
end
end
simple_choice_f = simple_choice([1.5, 2.0, 0.3])
chn = sample(simple_choice_f, Gibbs(HMC(0.2, 3, :p), PG(20, :z)), 1000)
Chains MCMC chain (1000×3×1 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 1
Samples per chain = 1000
Wall duration = 15.52 seconds
Compute duration = 15.52 seconds
parameters = p, z
internals = lp
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat
e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64
⋯
p 0.4408 0.2047 0.0206 98.8899 136.7330 1.0129
⋯
z 0.1920 0.3941 0.0221 317.1798 NaN 0.9999
⋯
1 column om
itted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
p 0.1122 0.2779 0.4240 0.6007 0.8589
z 0.0000 0.0000 0.0000 0.0000 1.0000
The Gibbs
sampler can be used to specify unique automatic differentiation backends for different variable spaces. Please see the Automatic Differentiation article for more.
For more details of compositional sampling in Turing.jl, please check the corresponding paper.
Working with filldist and arraydist
Turing provides filldist(dist::Distribution, n::Int)
and arraydist(dists::AbstractVector{<:Distribution})
as a simplified interface to construct product distributions, e.g., to model a set of variables that share the same structure but vary by group.
Constructing product distributions with filldist
The function filldist
provides a general interface to construct product distributions over distributions of the same type and parameterisation.
Note that, in contrast to the product distribution interface provided by Distributions.jl (Product
), filldist
supports product distributions over univariate or multivariate distributions.
Example usage:
@model function demo(x, g)
k = length(unique(g))
a ~ filldist(Exponential(), k) # = Product(fill(Exponential(), k))
mu = a[g]
return x .~ Normal.(mu)
end
demo (generic function with 2 methods)
Constructing product distributions with arraydist
The function arraydist
provides a general interface to construct product distributions over distributions of varying type and parameterisation.
Note that in contrast to the product distribution interface provided by Distributions.jl (Product
), arraydist
supports product distributions over univariate or multivariate distributions.
Example usage:
@model function demo(x, g)
k = length(unique(g))
a ~ arraydist([Exponential(i) for i in 1:k])
mu = a[g]
return x .~ Normal.(mu)
end
demo (generic function with 2 methods)
Working with MCMCChains.jl
Turing.jl wraps its samples using MCMCChains.Chain
so that all the functions working for MCMCChains.Chain
can be re-used in Turing.jl. Two typical functions are MCMCChains.describe
and MCMCChains.plot
, which can be used as follows for an obtained chain chn
. For more information on MCMCChains
, please see the GitHub repository.
describe(chn) # Lists statistics of the samples.
plot(chn) # Plots statistics of the samples.
There are numerous functions in addition to describe
and plot
in the MCMCChains
package, such as those used in convergence diagnostics. For more information on the package, please see the GitHub repository.
Changing Default Settings
Some of Turing.jl's default settings can be changed for better usage.
AD Chunk Size
ForwardDiff (Turing's default AD backend) uses forward-mode chunk-wise AD. The chunk size can be set manually by setchunksize(new_chunk_size)
.
AD Backend
Turing supports four packages of automatic differentiation (AD) in the back end during sampling. The default AD backend is ForwardDiff for forward-mode AD. Three reverse-mode AD backends are also supported, namely Tracker, Zygote and ReverseDiff. Zygote
and ReverseDiff
are supported optionally if explicitly loaded by the user with using Zygote
or using ReverseDiff
next to using Turing
.
For more information on Turing's automatic differentiation backend, please see the Automatic Differentiation article.
Progress Logging
Turing.jl
uses ProgressLogging.jl to log the progress of sampling. Progress
logging is enabled as default but might slow down inference. It can be turned on
or off by setting the keyword argument progress
of sample
to true
or false
, respectively. Moreover, you can enable or disable progress logging globally by calling setprogress!(true)
or setprogress!(false)
, respectively.
Turing uses heuristics to select an appropriate visualization backend. If you use Juno, the progress is displayed with a progress bar in the Atom window. For Jupyter notebooks the default backend is ConsoleProgressMonitor.jl. In all other cases, progress logs are displayed with TerminalLoggers.jl. Alternatively, if you provide a custom visualization backend, Turing uses it instead of the default backend.