Submodels

using Turing
using Random: Xoshiro, seed!
seed!(468)

Random.TaskLocalRNG()

In Turing.jl, you can define models and use them as components of larger models (i.e., submodels), using the to_submodel function. In this way, you can (for example) define a model once and use it in multiple places:

@model function inner()
    a ~ Normal()
    return a + 100
end

@model function outer()
    # This line adds the variable `x.a` to the chain.
    # The inner variable `a` is prefixed with the
    # left-hand side of the `~` operator, i.e. `x`.
    x ~ to_submodel(inner())
    # Here, the value of x will be `a + 100` because
    # that is the return value of the submodel.
    b ~ Normal(x)
end

outer (generic function with 2 methods)

If we sample from this model, we would expect that x.a should be close to zero, and b close to 100:

rand(outer())

(var"x.a" = 0.07200886749732076, b = 99.9979651109378)

Manipulating submodels

Conditioning

In general, everything that can be done to a model ‘carries over’ to when it is used as a submodel. For example, you can condition a variable in a submodel in two ways:

# From the outside; the prefix `x` must be applied because
# from the perspective of `outer`, the variable is called
# `x.a`.
outer_conditioned1 = outer() | (@varname(x.a) => 1);
rand(Xoshiro(468), outer_conditioned1)

(b = 101.07200886749732,)

Or equivalently, from the inside:

@model function outer_conditioned2()
    # The prefix doesn't need to be applied here because
    # `inner` itself has no knowledge of the prefix.
    x ~ to_submodel(inner() | (@varname(a) => 1))
    b ~ Normal(x)
end
rand(Xoshiro(468), outer_conditioned2())

(b = 101.07200886749732,)

In both cases the variable x.a does not appear.

Note, however, that you cannot condition on the return value of a submodel. Thus, for example, if we had:

@model function inner_sensible()
    a ~ Normal()
    return a
end

@model function outer()
    x ~ to_submodel(inner())
    b ~ Normal(x)
end

outer (generic function with 2 methods)

and we tried to condition on x, it would be silently ignored, even though x is equal to a.

The reason for this is because it is entirely coincidental that the return value of the submodel is equal to a. In general, a return value can be anything, and conditioning on it is in general not a meaningful operation.

Prefixing

Prefixing is the only place where submodel behaviour is ‘special’ compared to that of ordinary models.

By default, all variables in a submodel are prefixed with the left-hand side of the tilde-statement. This is done to avoid clashes if the same submodel is used multiple times in a model.

You can disable this by passing false as the second argument to to_submodel:

@model function outer_no_prefix()
    x ~ to_submodel(inner(), false)
    b ~ Normal(x)
end
rand(outer_no_prefix())

(a = 0.6327762377562545, b = 99.65279863588333)

Accessing submodel variables

In all of the examples above, x is equal to a + 100 because that is the return value of the submodel. To access the actual latent variables in the submodel itself, the simplest option is to include the variable in the return value of the submodel:

@model function inner_with_retval()
    a ~ Normal()
    # You can return anything you like from the model,
    # but if you want to access the latent variables, they
    # should be included in the return value.
    return (; a=a, a_plus_100=a + 100)
end
@model function outer_with_retval()
    # The variable `x` will now contain the return value of the submodel,
    # which is a named tuple with `a` and `a_plus_100`.
    x ~ to_submodel(inner_with_retval())
    # You can access the value of x.a directly, because
    # x is a NamedTuple which contains `a`. Since `b` is
    # centred on `x.a`, it should be close to 0, not 100.
    b ~ Normal(x.a)
end
rand(Xoshiro(468), outer_with_retval())

(var"x.a" = 0.07200886749732076, b = -0.0020348890621966487)

You can also manually access the value by looking inside the special __varinfo__ object.

Warning

This relies on DynamicPPL internals and we do not recommend doing this unless you have no other option, e.g., if the submodel is defined in a different package which you do not control.

@model function outer_with_varinfo()
    x ~ to_submodel(inner())
    # Access the value of x.a
    a_value = __varinfo__[@varname(x.a)]
    b ~ Normal(a_value)
end
rand(Xoshiro(468), outer_with_varinfo())

(var"x.a" = 0.07200886749732076, b = -0.0020348890621966487)

Example: linear models

Here is a motivating example for the use of submodels. Suppose we want to fit a (very simplified) regression model to some data \(x\) and \(y\), where

\[\begin{align} c_0 &\sim \text{Normal}(0, 5) \\ c_1 &\sim \text{Normal}(0, 5) \\ \mu &= c_0 + c_1x \\ y &\sim d \end{align}\]

where \(d\) is some distribution parameterised by the value of \(\mu\), which we don’t know the exact form of.

In practice, what we would do is to write several different models, one for each function \(f\):

@model function normal(x, y)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    # Assume that y = mu, and that the noise in `y` is
    # normally distributed with standard deviation sigma
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sigma)
    end
end

@model function logpoisson(x, y)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    # exponentiate mu because the rate parameter of
    # a Poisson distribution must be positive
    for i in eachindex(y)
        y[i] ~ Poisson(exp(mu[i]))
    end
end

# and so on...

logpoisson (generic function with 2 methods)

Note

You could use arraydist to avoid the loops: for example, in logpoisson, one could write y ~ arraydist(Poisson.(exp.(mu))), but for simplicity in this tutorial we spell it out fully.

We would then fit all of our models and use some criterion to test which model is most suitable (see e.g. Wikipedia, or section 3.4 of Bishop’s Pattern Recognition and Machine Learning).

However, the code above is quite repetitive. For example, if we wanted to adjust the priors on c0 and c1, we would have to do it in each model separately. If this was any other kind of code, we would naturally think of extracting the common parts into a separate function. In this case we can do exactly that with a submodel:

@model function priors(x)
    c0 ~ Normal(0, 5)
    c1 ~ Normal(0, 5)
    mu = c0 .+ c1 .* x
    return (; c0=c0, c1=c1, mu=mu)
end

@model function normal(x, y)
    ps = to_submodel(priors(x))
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(ps.mu[i], sigma)
    end
end

@model function logpoisson(x, y)
    ps = to_submodel(priors(x))
    for i in eachindex(y)
        y[i] ~ Poisson(exp(ps.mu[i]))
    end
end

logpoisson (generic function with 2 methods)

One could go even further and extract the y section into its own submodel as well, which would bring us to a generalised linear modelling interface that does not actually require the user to define their own Turing models at all:

@model function normal_family(mu, y)
    sigma ~ truncated(Cauchy(0, 3); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(mu[i], sigma)
    end
    return nothing
end

@model function logpoisson_family(mu, y)
    for i in eachindex(y)
        y[i] ~ Poisson(exp(mu[i]))
    end
    return nothing
end

# An end-user could just use this function. Of course,
# a more thorough interface would also allow the user to
# specify priors, etc.
function make_model(x, y, family::Symbol)
    if family == :normal
        family_model = normal_family
    elseif family == :logpoisson
        family_model = logpoisson_family
    else
        error("unknown family: `$family`")
    end

    @model function general(x, y)
        ps ~ to_submodel(priors(x), false)
        _n ~ to_submodel(family_model(ps.mu, y), false)
    end
    return general(x, y)
end

sample(make_model([1, 2, 3], [1, 2, 3], :normal), NUTS(), 1000; progress=false)

┌ Info: Found initial step size
└   ϵ = 0.05

Chains MCMC chain (1000×15×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 8.33 seconds
Compute duration  = 8.33 seconds
parameters        = c0, c1, sigma
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          c0    0.0592    1.8737    0.1392   150.5689   185.2424    1.0121     ⋯
          c1    0.9260    0.8712    0.0912   104.1980   151.5053    1.0086     ⋯
       sigma    1.5504    2.1716    0.2147    27.3720    34.5818    1.0505     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

          c0   -3.8473   -0.8876    0.0502    0.7641    4.7513
          c1   -1.2424    0.5818    0.9589    1.3678    2.6758
       sigma    0.3370    0.5640    0.9615    1.7027    5.5471

While this final example really showcases the composability of submodels, it also illustrates a minor syntactic drawback. In the case of the family_model, we do not care about its return value because it is not used anywhere else in the model. Ideally, we should therefore not need to place anything on the left-hand side of to_submodel. However, because the special behaviour of to_submodel relies on the tilde operator, and the tilde operator requires a left-hand side, we have to use a dummy variable (here _n).

Furthermore, because the left-hand side of a tilde-statement must be a valid variable name, we cannot use destructuring syntax on the left-hand side of to_submodel, even if the return value is a NamedTuple. Thus, for example, the following is not allowed:

(; c0, c1, mu) ~ to_submodel(priors(x))

To use destructuring syntax, you would have to add a separate line:

ps = to_submodel(priors(x))
(; c0, c1, mu) = ps

Submodels versus distributions

Finally, we end with a discussion of why some of the behaviour for submodels above has come about. This is slightly more behind-the-scenes and therefore will likely be of most interest to Turing developers.

Fundamentally, submodels are to be compared against distributions: both of them can appear on the right-hand side of a tilde statement. However, distributions only have one ‘output’, i.e., the value that is sampled from them:

dist = Normal()
rand(dist)

1.4000191307576382

Another point to bear in mind is that, given a sample from dist, asking for its log-probability is a meaningful calculation.

logpdf(dist, rand(dist))

-3.1959192599647603

In contrast, models (and hence submodels) have two different outputs: the latent variables, and the return value. These are accessed respectively using rand(model) and model():

@model function f()
    a ~ Normal()
    return "hello, world."
end

model = f()

DynamicPPL.Model{typeof(f), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}(f, NamedTuple(), NamedTuple(), DynamicPPL.DefaultContext())

# Latent variables
rand(model)

(a = -0.8394458622445402,)

# Return value
model()

"hello, world."

Just like for distributions, one can indeed ask for the log-probability of the latent variables (although we have to specify whether we want the joint, likelihood, or prior):

logjoint(model, rand(model))

-2.1692571085470234

But it does not make sense to ask for the log-probability of the return value (which in this case is a string, and in general, could be literally any object).

The fact that we have what looks like a unified notation for these is a bit of a lie, since it hides this distinction. In particular, for x ~ distr, x is assigned the value of rand(distr); but for y ~ submodel, y is assigned the value of submodel(). This is why, for example, it is impossible to condition on y in y ~ ...; we can only condition on x in x ~ dist.

Eventually we would like to make this more logically consistent. In particular, it is clear that y ~ submodel should return not one but two objects: the latent variables and the return value. Furthermore, it should be possible to condition on the latent variables, but not on the return value. See this issue for an ongoing discussion of the best way to accomplish this.

It should be mentioned that extracting the latent variables from a submodel is not entirely trivial since the submodel is run using the same VarInfo as the parent model (i.e., we would have to do a before-and-after comparison to see which new variables were added by the submodel).

Also, we are still working out the exact data structure that should be used to represent the latent variables. In the examples above rand(model) returns a NamedTuple, but this actually causes loss of information because the keys of a NamedTuple are Symbols, whereas we really want to use VarNames. See this issue for a current proposal.