Core Functionality

This article provides an overview of the core functionality in Turing.jl, which are likely to be used across a wide range of models.

Basics

Introduction

A probabilistic program is a Julia function wrapped in a @model macro. In this function, arbitrary Julia code can be used, but to ensure correctness of inference it should not have external effects or modify global state.

To specify distributions of random variables, Turing models use ~ notation: x ~ distr where x is an identifier. This resembles the notation used in statistical models. For example, the model:

\[\begin{align} a &\sim \text{Normal}(0, 1) \\ x &\sim \text{Normal}(a, 1) \end{align}\]

is written in Turing as:

using Turing

@model function mymodel()
    a ~ Normal(0, 1)
    x ~ Normal(a, 1)
end

Tilde-statements

Indexing and field access is supported, so that x[i] ~ distr and x.field ~ distr are valid statements. However, in these cases, x must be defined in the scope of the model function. distr is typically either a distribution from Distributions.jl (see this page for implementing custom distributions), or another Turing model wrapped in to_submodel() (see this page for submodels).

There are two classes of tilde-statements: observe statements, where the left-hand side contains an observed value, and assume statements, where the left-hand side is not observed. These respectively correspond to likelihood and prior terms.

It is easier to start by explaining when a variable is treated as an observed value. This can happen in one of two ways: - The variable is passed as one of the arguments to the model function; or - The value of the variable in the model is explicitly conditioned or fixed.

Caution

Note that it is not enough for the variable to be defined in the current scope. For example, in

@model function mymodel(x)
    y = x + 1
    y ~ Normal(0, 1)
end

y is not treated as an observed value.

In such a case, x is considered to be an observed value, assumed to have been drawn from the distribution distr. The likelihood (if needed) is computed using loglikelihood(distr, x).

On the other hand, if neither of the above are true, then this is treated as an assume-statement: inside the probabilistic program, this samples a new variable called x, distributed according to distr, and places it in the current scope.

Simple Gaussian Demo

Below is a simple Gaussian demo illustrating the basic usage of Turing.jl.

# Import packages.
using Turing
using StatsPlots

# Define a simple Normal model with unknown mean and variance.
@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    return y ~ Normal(m, sqrt(s²))
end

gdemo (generic function with 2 methods)

In Turing.jl, MCMC sampling is performed using the sample() function, which (at its most basic) takes a model, a sampler, and the number of samples to draw.

For this model, the prior expectation of s² is mean(InverseGamma(2, 3)) = 3/(2 - 1) = 3, and the prior expectation of m is 0. We can check this using the Prior sampler:

setprogress!(false)

p1 = sample(gdemo(missing, missing), Prior(), 100000)

Chains MCMC chain (100000×5×1 Array{Float64, 3}):

Iterations        = 1:1:100000
Number of chains  = 1
Samples per chain = 100000
Wall duration     = 1.32 seconds
Compute duration  = 1.32 seconds
parameters        = s², m, x, y
internals         = lp

Use `describe(chains)` for summary statistics and quantiles.

To perform inference, we simply need to specify the sampling algorithm we want to use.

#  Run sampler, collect results.
c1 = sample(gdemo(1.5, 2), SMC(), 1000)
c2 = sample(gdemo(1.5, 2), PG(10), 1000)
c3 = sample(gdemo(1.5, 2), HMC(0.1, 5), 1000)
c4 = sample(gdemo(1.5, 2), Gibbs(:m => PG(10), :s² => HMC(0.1, 5)), 1000)
c5 = sample(gdemo(1.5, 2), HMCDA(0.15, 0.65), 1000)
c6 = sample(gdemo(1.5, 2), NUTS(0.65), 1000)

┌ Info: Found initial step size
└   ϵ = 2.0
┌ Info: Found initial step size
└   ϵ = 0.8500000000000001

Chains MCMC chain (1000×14×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 1.73 seconds
Compute duration  = 1.73 seconds
parameters        = s², m
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Use `describe(chains)` for summary statistics and quantiles.

The arguments for each sampler are:

SMC: number of particles.
PG: number of particles, number of iterations.
HMC: leapfrog step size, leapfrog step numbers.
Gibbs: component sampler 1, component sampler 2, …
HMCDA: total leapfrog length, target accept ratio.
NUTS: number of adaptation steps (optional), target accept ratio.

More information about each sampler can be found in Turing.jl’s API docs.

The MCMCChains module (which is re-exported by Turing) provides plotting tools for the Chain objects returned by a sample function. See the MCMCChains repository for more information on the suite of tools available for diagnosing MCMC chains.

# Summarise results
describe(c3)

# Plot results
plot(c3)
savefig("gdemo-plot.png")

Conditioning on data

Using this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model on data. Subsequently, the sample function can be used to generate samples from the posterior distribution.

In the following example, the defined model is conditioned to the data (arg_1 = 1, arg_2 = 2) by passing the arguments 1 and 2 to the model function.

@model function model_name(arg_1, arg_2)
    arg_1 ~ ...
    arg_2 ~ ...
end

The conditioned model can then be passed onto the sample function to run posterior inference.

model = model_name(1, 2)
chn = sample(model, HMC(0.5, 20), 1000) # Sample with HMC.

Alternatively, one can also use the conditioning operator | to condition the model on data. In this case, the model does not need to be defined with arg_1 and arg_2 as parameters.

@model function model_name()
    arg_1 ~ ...
    arg_2 ~ ...
end

# Condition the model on data.
model = model_name() | (arg_1 = 1, arg_2 = 2)

Analysing MCMC chains

The returned chain contains samples of the variables in the model.

var_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.

The key (:var_1) can either be a Symbol or a String. For example, to fetch x[1], one can use chn[Symbol("x[1]")] or chn["x[1]"]. If you want to retrieve all parameters associated with a specific symbol, you can use group. As an example, if you have the parameters "x[1]", "x[2]", and "x[3]", calling group(chn, :x) or group(chn, "x") will return a new chain with only "x[1]", "x[2]", and "x[3]".

Tilde-statement ordering

Turing does not have a declarative form. Thus, the ordering of tilde-statements in a Turing model is important: random variables cannot be used until they have been first declared in a tilde-statement. For example, the following example works:

# Define a simple Normal model with unknown mean and variance.
@model function model_function(y)
    s ~ Poisson(1)
    y ~ Normal(s, 1)
    return y
end

sample(model_function(10), SMC(), 100)

But if we switch the s ~ Poisson(1) and y ~ Normal(s, 1) lines, the model will no longer sample correctly:

# Define a simple Normal model with unknown mean and variance.
@model function model_function(y)
    y ~ Normal(s, 1)
    s ~ Poisson(1)
    return y
end

sample(model_function(10), SMC(), 100)

Sampling Multiple Chains

Turing supports distributed and threaded parallel sampling. To do so, call sample(model, sampler, parallel_type, n, n_chains), where parallel_type can be either MCMCThreads() or MCMCDistributed() for thread and parallel sampling, respectively.

Having multiple chains in the same object is valuable for evaluating convergence. Some diagnostic functions like gelmandiag require multiple chains.

If you want to sample multiple chains without using parallelism, you can use MCMCSerial():

# Sample 3 chains in a serial fashion.
chains = sample(model, sampler, MCMCSerial(), 1000, 3)

The chains variable now contains a Chains object which can be indexed by chain. To pull out the first chain from the chains object, use chains[:,:,1]. The method is the same if you use either of the below parallel sampling methods.

Multithreaded sampling

If you wish to perform multithreaded sampling, you can call sample with the following signature:

# Sample four chains using multiple threads, each with 1000 samples.
sample(model, sampler, MCMCThreads(), 1000, 4)

Be aware that Turing cannot add threads for you – you must have started your Julia instance with multiple threads to experience any kind of parallelism. See the Julia documentation for details on how to achieve this.

Distributed sampling

To perform distributed sampling (using multiple processes), you must first import Distributed.

Process parallel sampling can be done like so:

# Load Distributed to add processes and the @everywhere macro.
using Distributed

# Load Turing.
using Turing

# Add four processes to use for sampling.
addprocs(4; exeflags="--project=$(Base.active_project())")

# Initialize everything on all the processes.
# Note: Make sure to do this after you've already loaded Turing,
#       so each process does not have to precompile.
#       Parallel sampling may fail silently if you do not do this.
@everywhere using Turing

# Define a model on all processes.
@everywhere @model function gdemo(x)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))

    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
end

# Declare the model instance everywhere.
@everywhere model = gdemo([1.5, 2.0])

# Sample four chains using multiple processes, each with 1000 samples.
sample(model, NUTS(), MCMCDistributed(), 1000, 4)

Sampling from an Unconditional Distribution (The Prior)

Turing allows you to sample from a declared model’s prior. If you wish to draw a chain from the prior to inspect your prior distributions, you can run

chain = sample(model, Prior(), n_samples)

You can also run your model (as if it were a function) from the prior distribution, by calling the model without specifying inputs or a sampler. In the below example, we specify a gdemo model which returns two variables, x and y. Here, including the return statement is necessary to retrieve the sampled x and y values.

@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    y ~ Normal(m, sqrt(s²))
    return x, y
end

gdemo (generic function with 2 methods)

To produce a sample from the prior distribution, we instantiate the model with missing inputs:

# Samples from p(x,y)
g_prior_sample = gdemo(missing, missing)
g_prior_sample()

(4.47410664794463, -2.6604858569544088)

Sampling from a Conditional Distribution (The Posterior)

Treating observations as random variables

Inputs to the model that have a value missing are treated as parameters, aka random variables, to be estimated/sampled. This can be useful if you want to simulate draws for that parameter, or if you are sampling from a conditional distribution. Turing supports the following syntax:

@model function gdemo(x, ::Type{T}=Float64) where {T}
    if x === missing
        # Initialize `x` if missing
        x = Vector{T}(undef, 2)
    end
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
end

# Construct a model with x = missing
model = gdemo(missing)
c = sample(model, HMC(0.05, 20), 500)

Chains MCMC chain (500×14×1 Array{Float64, 3}):

Iterations        = 1:1:500
Number of chains  = 1
Samples per chain = 500
Wall duration     = 3.6 seconds
Compute duration  = 3.6 seconds
parameters        = s², m, x[1], x[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size

Use `describe(chains)` for summary statistics and quantiles.

Note the need to initialize x when missing since we are iterating over its elements later in the model. The generated values for x can be extracted from the Chains object using c[:x].

Turing also supports mixed missing and non-missing values in x, where the missing ones will be treated as random variables to be sampled while the others get treated as observations. For example:

@model function gdemo(x)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
end

# x[1] is a parameter, but x[2] is an observation
model = gdemo([missing, 2.4])
c = sample(model, HMC(0.01, 5), 500)

Chains MCMC chain (500×13×1 Array{Float64, 3}):

Iterations        = 1:1:500
Number of chains  = 1
Samples per chain = 500
Wall duration     = 2.31 seconds
Compute duration  = 2.31 seconds
parameters        = s², m, x[1]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size

Use `describe(chains)` for summary statistics and quantiles.

Default Values

Arguments to Turing models can have default values much like how default values work in normal Julia functions. For instance, the following will assign missing to x and treat it as a random variable. If the default value is not missing, x will be assigned that value and will be treated as an observation instead.

using Turing

@model function generative(x=missing, ::Type{T}=Float64) where {T<:Real}
    if x === missing
        # Initialize x when missing
        x = Vector{T}(undef, 10)
    end
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in 1:length(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
    return s², m
end

m = generative()
chain = sample(m, HMC(0.01, 5), 1000)

Chains MCMC chain (1000×22×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 2.14 seconds
Compute duration  = 2.14 seconds
parameters        = s², m, x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], x[9], x[10]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, numerical_error, step_size, nom_step_size

Use `describe(chains)` for summary statistics and quantiles.

Access Values inside Chain

You can access the values inside a chain in several ways:

Turn them into a DataFrame object
Use their raw AxisArray form
Create a three-dimensional Array object

For example, let c be a Chain:

DataFrame(c) converts c to a DataFrame,
c.value retrieves the values inside c as an AxisArray, and
c.value.data retrieves the values inside c as a 3D Array.

Variable Types and Type Parameters

The element type of a vector (or matrix) of random variables should match the eltype of its prior distribution, i.e., <: Integer for discrete distributions and <: AbstractFloat for continuous distributions.

Some automatic differentiation backends (used in conjunction with Hamiltonian samplers such as HMC or NUTS) further require that the vector’s element type needs to either be:

Real to enable auto-differentiation through the model which uses special number types that are sub-types of Real, or
Some type parameter T defined in the model header using the type parameter syntax, e.g. function gdemo(x, ::Type{T} = Float64) where {T}.

Similarly, when using a particle sampler, the Julia variable used should either be:

An Array, or
An instance of some type parameter T defined in the model header using the type parameter syntax, e.g. function gdemo(x, ::Type{T} = Vector{Float64}) where {T}.

Querying Probabilities from Model or Chain

Turing offers three functions: loglikelihood, logprior, and logjoint to query the log-likelihood, log-prior, and log-joint probabilities of a model, respectively.

Let’s look at a simple model called gdemo:

@model function gdemo0()
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    return x ~ Normal(m, sqrt(s))
end

gdemo0 (generic function with 2 methods)

If we observe x to be 1.0, we can condition the model on this datum using the condition syntax:

model = gdemo0() | (x=1.0,)

DynamicPPL.Model{typeof(gdemo0), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{@NamedTuple{x::Float64}, DynamicPPL.DefaultContext}}(gdemo0, NamedTuple(), NamedTuple(), ConditionContext((x = 1.0,), DynamicPPL.DefaultContext()))

Now, let’s compute the log-likelihood of the observation given specific values of the model parameters, s and m:

loglikelihood(model, (s=1.0, m=1.0))

-0.9189385332046728

We can easily verify that value in this case:

logpdf(Normal(1.0, 1.0), 1.0)

-0.9189385332046728

We can also compute the log-prior probability of the model for the same values of s and m:

logprior(model, (s=1.0, m=1.0))

-2.221713955868453

logpdf(InverseGamma(2, 3), 1.0) + logpdf(Normal(0, sqrt(1.0)), 1.0)

-2.221713955868453

Finally, we can compute the log-joint probability of the model parameters and data:

logjoint(model, (s=1.0, m=1.0))

-3.1406524890731258

logpdf(Normal(1.0, 1.0), 1.0) +
logpdf(InverseGamma(2, 3), 1.0) +
logpdf(Normal(0, sqrt(1.0)), 1.0)

-3.1406524890731258

Querying with Chains object is easy as well:

chn = sample(model, Prior(), 10)

Chains MCMC chain (10×3×1 Array{Float64, 3}):

Iterations        = 1:1:10
Number of chains  = 1
Samples per chain = 10
Wall duration     = 0.1 seconds
Compute duration  = 0.1 seconds
parameters        = s, m
internals         = lp

Use `describe(chains)` for summary statistics and quantiles.

loglikelihood(model, chn)

10×1 Matrix{Float64}:
 -2.097388345769989
 -3.13388317611571
 -1.164017476169211
 -4.061240891620444
 -2.7756917393944387
 -1.8024841144703487
 -2.0877255676257707
 -1.6266068321382305
 -2.525583347694964
 -1.4190934690173505

Maximum likelihood and maximum a posterior estimates

Turing also has functions for estimating the maximum aposteriori and maximum likelihood parameters of a model. This can be done with

mle_estimate = maximum_likelihood(model)
map_estimate = maximum_a_posteriori(model)

ModeResult with maximized lp of -2.81
[0.8125000000756749, 0.5000000001494654]

For more details see the mode estimation page.

Beyond the Basics

Compositional Sampling Using Gibbs

Turing.jl provides a Gibbs interface to combine different samplers. For example, one can combine an HMC sampler with a PG sampler to run inference for different parameters in a single model as below.

@model function simple_choice(xs)
    p ~ Beta(2, 2)
    z ~ Bernoulli(p)
    for i in 1:length(xs)
        if z == 1
            xs[i] ~ Normal(0, 1)
        else
            xs[i] ~ Normal(2, 1)
        end
    end
end

simple_choice_f = simple_choice([1.5, 2.0, 0.3])

chn = sample(simple_choice_f, Gibbs(:p => HMC(0.2, 3), :z => PG(20)), 1000)

Chains MCMC chain (1000×3×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 17.39 seconds
Compute duration  = 17.39 seconds
parameters        = p, z
internals         = lp

Use `describe(chains)` for summary statistics and quantiles.

The Gibbs sampler can be used to specify unique automatic differentiation backends for different variable spaces. Please see the Automatic Differentiation article for more.

For more details of compositional sampling in Turing.jl, please check the corresponding paper.

Working with filldist and arraydist

Turing provides filldist(dist::Distribution, n::Int) and arraydist(dists::AbstractVector{<:Distribution}) as a simplified interface to construct product distributions, e.g., to model a set of variables that share the same structure but vary by group.

Constructing product distributions with filldist

The function filldist provides a general interface to construct product distributions over distributions of the same type and parameterisation. Note that, in contrast to the product distribution interface provided by Distributions.jl (Product), filldist supports product distributions over univariate or multivariate distributions.

Example usage:

@model function demo(x, g)
    k = length(unique(g))
    a ~ filldist(Exponential(), k) # = Product(fill(Exponential(), k))
    mu = a[g]
    for i in eachindex(x)
        x[i] ~ Normal(mu[i])
    end
    return mu
end

demo (generic function with 2 methods)

Constructing product distributions with `arraydist`

The function arraydist provides a general interface to construct product distributions over distributions of varying type and parameterisation. Note that in contrast to the product distribution interface provided by Distributions.jl (Product), arraydist supports product distributions over univariate or multivariate distributions.

Example usage:

@model function demo(x, g)
    k = length(unique(g))
    a ~ arraydist([Exponential(i) for i in 1:k])
    mu = a[g]
    for i in eachindex(x)
        x[i] ~ Normal(mu[i])
    end
    return mu
end

demo (generic function with 2 methods)

Working with MCMCChains.jl

Turing.jl wraps its samples using MCMCChains.Chain so that all the functions working for MCMCChains.Chain can be re-used in Turing.jl. Two typical functions are MCMCChains.describe and MCMCChains.plot, which can be used as follows for an obtained chain chn. For more information on MCMCChains, please see the GitHub repository.

describe(chn) # Lists statistics of the samples.
plot(chn) # Plots statistics of the samples.

Chains MCMC chain (1000×3×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 17.39 seconds
Compute duration  = 17.39 seconds
parameters        = p, z
internals         = lp

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

           p    0.4151    0.2042    0.0224    79.0195   120.9835    1.0094     ⋯
           z    0.1530    0.3602    0.0147   602.4128        NaN    0.9992     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           p    0.0559    0.2565    0.4169    0.5634    0.8098
           z    0.0000    0.0000    0.0000    0.0000    1.0000

There are numerous functions in addition to describe and plot in the MCMCChains package, such as those used in convergence diagnostics. For more information on the package, please see the GitHub repository.

Changing Default Settings

Some of Turing.jl’s default settings can be changed for better usage.

AD Backend

Turing is thoroughly tested with three automatic differentiation (AD) backend packages. The default AD backend is ForwardDiff, which uses forward-mode AD. Two reverse-mode AD backends are also supported, namely Mooncake and ReverseDiff. Mooncake and ReverseDiff also require the user to explicitly load them using import Mooncake or import ReverseDiff next to using Turing.

For more information on Turing’s automatic differentiation backend, please see the Automatic Differentiation article as well as the ADTests website, where a number of AD backends (not just those above) are tested against Turing.jl.

Progress Logging

Turing.jl uses ProgressLogging.jl to log the sampling progress. Progress logging is enabled as default but might slow down inference. It can be turned on or off by setting the keyword argument progress of sample to true or false. Moreover, you can enable or disable progress logging globally by calling setprogress!(true) or setprogress!(false), respectively.

Turing uses heuristics to select an appropriate visualization backend. If you use Jupyter notebooks, the default backend is ConsoleProgressMonitor.jl. In all other cases, progress logs are displayed with TerminalLoggers.jl. Alternatively, if you provide a custom visualization backend, Turing uses it instead of the default backend.

Basics

Introduction

Tilde-statements

Simple Gaussian Demo

Conditioning on data

Analysing MCMC chains

Tilde-statement ordering

Sampling Multiple Chains

Multithreaded sampling

Distributed sampling

Sampling from an Unconditional Distribution (The Prior)

Sampling from a Conditional Distribution (The Posterior)

Treating observations as random variables

Default Values

Access Values inside Chain

Variable Types and Type Parameters

Querying Probabilities from Model or Chain

Maximum likelihood and maximum a posterior estimates

Beyond the Basics

Compositional Sampling Using Gibbs

Working with filldist and arraydist

Constructing product distributions with filldist

Constructing product distributions with arraydist

Working with MCMCChains.jl

Changing Default Settings

AD Backend

Progress Logging

Constructing product distributions with `arraydist`