Transforming variables

Motivation

In a probabilistic programming language (PPL) such as DynamicPPL.jl, one crucial piece of functionality, needed to implement a large number of inference algorithms (in particular gradient-based ones), is the ability to work with "unconstrained" variables.

For example, consider the following model:

@model function demo()
    s ~ InverseGamma(2, 3)
    return m ~ Normal(0, √s)
end

Here we have two variables s and m, where s is constrained to be positive, while m can be any real number.

For certain inference methods, it's necessary / much more convenient to work with an equivalent model to demo but where all the variables can take any real values (they're "unconstrained").

Note

We write "unconstrained" with quotes because there are many ways to transform a constrained variable to an unconstrained one, and DynamicPPL can work with a much broader class of bijective transformations of variables, not just ones that go to the entire real line. But for MCMC, unconstraining is the most common transformation so we'll stick with that terminology.

For a large family of constraints encountered in practice, it is indeed possible to transform a (partially) constrained model to a completely unconstrained one in such a way that sampling in the unconstrained space is equivalent to sampling in the constrained space.

In DynamicPPL.jl, this is often referred to as linking (a term originating in the statistics literature) and is done using transformations from Bijectors.jl.

For example, the above model could be transformed into something like the following (pseudo-code, not working code):

@model function demo()
    log_s ~ log(InverseGamma(2, 3))
    s = exp(log_s)
    return m ~ Normal(0, √s)
end

Here log_s is an unconstrained variable, and s is a constrained variable that is a deterministic function of log_s.
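
This is exactly the kind of transformation that Bijectors.jl provides. As a rough sketch of the idea (the concrete bijector type returned can differ between Bijectors.jl versions, but for a positively-constrained distribution like InverseGamma it behaves like log):

using Bijectors, Distributions

b = Bijectors.bijector(InverseGamma(2, 3))  # maps the support (0, ∞) to ℝ
b_inv = inverse(b)                          # maps ℝ back to (0, ∞)

s = 1.5
log_s = b(s)       # the unconstrained value, ≈ log(1.5)
b_inv(log_s)       # ≈ 1.5, recovering the original value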

But to ensure that we stay consistent with what the user expects, DynamicPPL.jl does not actually transform the model as above, but instead makes use of transformed variables internally to achieve the same effect, when desired.

In the end, we'll end up with something that looks like this:

(Diagram: the assume path for x ~ Normal())

x ~ Normal()
  --@model-->  vn = @varname(x); dist = Normal(); x, vi = ...
  --tilde-pipeline-->  assume(vn, dist, vi):
      if istrans(vi, vn)
          f = from_linked_internal_transform(vi, vn, dist)
      else
          f = from_internal_transform(vi, vn, dist)
      end
      x, logjac = with_logabsdet_jacobian(f, getindex_internal(vi, vn, dist))
      return x, logpdf(dist, x) - logjac, vi

Below we'll see how this is done.

What do we need?

There are two aspects to transforming from the internal representation of a variable in a varinfo to the representation wanted in the model:

  1. Different implementations of AbstractVarInfo represent realizations of a model in different ways internally, so we need to transform from this internal representation to the desired representation in the model (a small sketch contrasting two implementations follows this list). For example,

    • VarInfo represents a realization of a model as a "flattened" / vector representation, regardless of the form of the variable in the model.
    • SimpleVarInfo represents a realization of a model exactly as in the model (unless it has been transformed; we'll get to that later).
  2. We need the ability to transform from "constrained space" to "unconstrained space", as we saw in the previous section.
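
To make the first point concrete, here is a small sketch contrasting the internal representations used by VarInfo and SimpleVarInfo for the same realization (the exact storage types are an implementation detail and may vary):

using DynamicPPL, Distributions

@model simple_demo() = y ~ Normal()
vi = VarInfo(simple_demo())
DynamicPPL.getindex_internal(vi, @varname(y))   # a 1-element Vector{Float64}

svi = SimpleVarInfo(vi)
DynamicPPL.getindex_internal(svi, @varname(y))  # the value itself, a plain Float64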

Working example

A good and non-trivial example to keep in mind throughout is the following model:

using DynamicPPL, Distributions
@model demo_lkj() = x ~ LKJCholesky(2, 1.0)
demo_lkj (generic function with 2 methods)

LKJCholesky(2, 1.0) corresponds to the LKJ(2, 1.0) distribution, a distribution over correlation matrices (covariance matrices with unit diagonal), but it works directly with the Cholesky factorization of the correlation matrix rather than the correlation matrix itself (which is more numerically stable and computationally efficient).

Note

This is a particularly "annoying" case because the return value is not a simple Real or AbstractArray{<:Real}, but rather a LinearAlgebra.Cholesky object, which wraps a triangular matrix (whether it's upper- or lower-triangular depends on the instance).

As mentioned, some implementations of AbstractVarInfo, e.g. VarInfo, work with a "flattened" / vector representation of a variable, so in this case we need two transformations:

  1. From the Cholesky object to a vector representation.
  2. From the Cholesky object to an "unconstrained" / linked vector representation.

And similarly, we'll need the inverses of these transformations.

From internal representation to model representation

To go from the internal variable representation of an AbstractVarInfo to the variable representation wanted in the model, e.g. from a Vector{Float64} to Cholesky in the case of VarInfo in demo_lkj, we have the following methods:

DynamicPPL.to_internal_transform (Function)
to_internal_transform(varinfo::AbstractVarInfo, vn::VarName[, dist])

Return a transformation that transforms from a representation compatible with dist to the internal representation of vn with dist in varinfo.

If dist is not present, then it is assumed that varinfo knows the correct output for vn.

DynamicPPL.from_internal_transform (Function)
from_internal_transform(varinfo::AbstractVarInfo, vn::VarName[, dist])

Return a transformation that transforms from the internal representation of vn with dist in varinfo to a representation compatible with dist.

If dist is not present, then it is assumed that varinfo knows the correct output for vn.


These methods allow us to extract the internal-to-model transformation function depending on the varinfo, the variable, and the distribution of the variable:

  • varinfo + vn defines the internal representation of the variable.
  • dist defines the representation expected within the model scope.
Note

If vn is not present in varinfo, then the internal representation is fully determined by varinfo alone. This is used when we're about to add a new variable to the varinfo and need to know how to represent it internally.

Continuing from the example above, we can inspect the internal representation of x in demo_lkj with VarInfo using DynamicPPL.getindex_internal:

model = demo_lkj()
varinfo = VarInfo(model)
x_internal = DynamicPPL.getindex_internal(varinfo, @varname(x))
4-element Vector{Float64}:
 1.0
 0.8276171444428266
 0.0
 0.5612930270583284
f_from_internal = DynamicPPL.from_internal_transform(
    varinfo, @varname(x), LKJCholesky(2, 1.0)
)
f_from_internal(x_internal)
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
2×2 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0        ⋅ 
 0.827617  0.561293

Let's confirm that this is the same as varinfo[@varname(x)]:

x_model = varinfo[@varname(x)]
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
2×2 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0        ⋅ 
 0.827617  0.561293
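
We can also check the equality programmatically, for example by comparing the Cholesky factors:

f_from_internal(x_internal).L == x_model.L   # true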

Similarly, we can go from the model representation to the internal representation:

f_to_internal = DynamicPPL.to_internal_transform(varinfo, @varname(x), LKJCholesky(2, 1.0))

f_to_internal(x_model)
4-element reshape(::LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}, 4) with eltype Float64:
 1.0
 0.8276171444428266
 0.0
 0.5612930270583284
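
Composing the two transformations should take us back to where we started:

f_from_internal(f_to_internal(x_model)).L == x_model.L   # true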

It's also useful to see how this is done in SimpleVarInfo:

simple_varinfo = SimpleVarInfo(varinfo)
DynamicPPL.getindex_internal(simple_varinfo, @varname(x))
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
2×2 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0        ⋅ 
 0.827617  0.561293

Here we see that the internal representation is exactly the same as the model representation, so we'd expect from_internal_transform to be the identity function:

DynamicPPL.from_internal_transform(simple_varinfo, @varname(x), LKJCholesky(2, 1.0))
identity (generic function with 1 method)
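
The transformation in the other direction should likewise be the identity:

DynamicPPL.to_internal_transform(simple_varinfo, @varname(x), LKJCholesky(2, 1.0))   # also identity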

Great!

From unconstrained internal representation to model representation

In addition to going from internal representation to model representation of a variable, we also need to be able to go from the unconstrained internal representation to the model representation.

For this, we have the following methods:

DynamicPPL.to_linked_internal_transform (Function)
to_linked_internal_transform(varinfo::AbstractVarInfo, vn::VarName[, dist])

Return a transformation that transforms from a representation compatible with dist to the linked internal representation of vn with dist in varinfo.

If dist is not present, then it is assumed that varinfo knows the correct output for vn.

DynamicPPL.from_linked_internal_transform (Function)
from_linked_internal_transform(varinfo::AbstractVarInfo, vn::VarName[, dist])

Return a transformation that transforms from the linked internal representation of vn with dist in varinfo to a representation compatible with dist.

If dist is not present, then it is assumed that varinfo knows the correct output for vn.


These are very similar to DynamicPPL.to_internal_transform and DynamicPPL.from_internal_transform, but here the internal representation is also linked / "unconstrained".

Continuing from the example above:

f_to_linked_internal = DynamicPPL.to_linked_internal_transform(
    varinfo, @varname(x), LKJCholesky(2, 1.0)
)

x_linked_internal = f_to_linked_internal(x_model)
1-element Vector{Float64}:
 1.1805251914676889
f_from_linked_internal = DynamicPPL.from_linked_internal_transform(
    varinfo, @varname(x), LKJCholesky(2, 1.0)
)

f_from_linked_internal(x_linked_internal)
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
2×2 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0        ⋅ 
 0.827617  0.561293

Here we see a significant difference between the linked representation and the non-linked representation: the linked representation has length 1, whereas the non-linked representation has length 4. This is because we only need a single element to represent a 2x2 correlation matrix, as the diagonal elements are always 1 and the matrix is symmetric.
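
In general, a d×d correlation matrix has only d(d - 1)/2 free parameters. A quick check for our case:

d = 2
d * (d - 1) ÷ 2   # 1, the length of x_linked_internal above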

We can also inspect the transforms themselves:

f_from_internal
DynamicPPL.ToChol('L') ∘ DynamicPPL.ReshapeTransform{Tuple{Int64}, Tuple{Int64, Int64}}((4,), (2, 2))

vs.

f_from_linked_internal
Bijectors.Inverse{Bijectors.VecCholeskyBijector}(Bijectors.VecCholeskyBijector(:L))

Here we see that f_from_linked_internal is a single function taking us directly from the linked representation to the model representation, whereas f_from_internal is a composition of a few functions: one reshaping the underlying length-4 vector into a 2x2 matrix, and the other converting this matrix into a Cholesky, as required to be compatible with LKJCholesky(2, 1.0).

Why do we need both to_internal_transform and to_linked_internal_transform?

One might wonder why we need both to_internal_transform and to_linked_internal_transform instead of just a single to_internal_transform which returns the "standard" internal representation if the variable is not linked / "unconstrained" and the linked / "unconstrained" internal representation if it is.

That is, why can't we just do

(Diagram: a hypothetical assume with a single from_internal_transform)

assume(varinfo, @varname(x), Normal()):
    f = from_internal_transform(varinfo, varname, dist)
    x, logjac = with_logabsdet_jacobian(f, assume_internal(varinfo, varname, dist))
    return x, logpdf(dist, x) - logjac, varinfo

Unfortunately, this is not possible in general. Consider for example the following model:

@model function demo_dynamic_constraint()
    m ~ Normal()
    x ~ truncated(Normal(); lower=m)

    return (m=m, x=x)
end
demo_dynamic_constraint (generic function with 2 methods)

Here the variable x is constrained to be in the domain (m, Inf), where m is sampled according to a Normal.
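
To get a feel for why this is problematic, note that the bijector Bijectors.jl associates with x depends on the lower bound, and hence on the current value of m (a small illustration; the concrete bijector types may differ between Bijectors.jl versions):

using Bijectors

Bijectors.bijector(truncated(Normal(); lower=0.0))   # maps (0, ∞) to ℝ
Bijectors.bijector(truncated(Normal(); lower=1.0))   # maps (1, ∞) to ℝ, a different transformation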

model = demo_dynamic_constraint()
varinfo = VarInfo(model)
varinfo[@varname(m)], varinfo[@varname(x)]
(1.2667632645918812, 1.4952479034952928)

We see that the realization of x is indeed greater than m, as expected.

But what if we link this varinfo so that we end up working on an "unconstrained" space, i.e. both m and x can take on any values in (-Inf, Inf):

varinfo_linked = link(varinfo, model)
varinfo_linked[@varname(m)], varinfo_linked[@varname(x)]
(1.2667632645918812, 1.4952479034952928)

We still get the same values, as expected, since internally varinfo transforms from the linked internal representation to the model representation.

But what if we change the value of m to, say, something a bit larger than x?

# Update realization for `m` in `varinfo_linked`.
varinfo_linked[@varname(m)] = varinfo_linked[@varname(x)] + 1
varinfo_linked[@varname(m)], varinfo_linked[@varname(x)]
(2.495247903495293, 1.4952479034952928)

Now we see that the constraint m < x is no longer satisfied!

Hence one might expect that if we try to compute, say, the logjoint using varinfo_linked with this "invalid" realization, we'll get an error:

logjoint(model, varinfo_linked)
-5.068409286919092

But we don't! In fact, if we look at the actual value used within the model

first(DynamicPPL.evaluate!!(model, varinfo_linked, DefaultContext()))
(m = 2.495247903495293, x = 2.7237325423987047)

we see that we indeed satisfy the constraint m < x, as desired.

Warning

One shouldn't set variables in a linked varinfo directly like this unless one knows that the value will be compatible with the constraints of the model.

The reason for this is that, internally in a model evaluation, we construct the transformation from the internal to the model representation based on the current realizations in the model! That is, we take the dist in an x ~ dist expression at model evaluation time and use that to construct the transformation, thus allowing it to change between model evaluations without invalidating the transformation.

But to be able to do this, we need to know whether the variable is linked / "unconstrained" or not, since the transformation is different in the two cases, and so we need to be able to determine this at model evaluation time. The internals therefore end up looking something like this:

if istrans(varinfo, varname)
    from_linked_internal_transform(varinfo, varname, dist)
else
    from_internal_transform(varinfo, varname, dist)
end

That is, if the variable is linked / "unconstrained", we use the DynamicPPL.from_linked_internal_transform, otherwise we use DynamicPPL.from_internal_transform.

And so the earlier diagram becomes:

(Diagram: assume with a branch on istrans)

assume(varinfo, @varname(x), Normal()):
    if istrans(varinfo, varname)
        f = from_linked_internal_transform(varinfo, varname, dist)
    else
        f = from_internal_transform(varinfo, varname, dist)
    end
    x, logjac = with_logabsdet_jacobian(f, assume_internal(varinfo, varname, dist))
    return x, logpdf(dist, x) - logjac, varinfo

Note

If the support of dist were constant, this would not be necessary since we could just determine the transformation at the time of varinfo_linked = link(varinfo, model) and define this as the from_internal_transform for all subsequent evaluations. However, since the support of dist is not constant in general, we need to be able to determine the transformation at the time of the evaluation, and thus whether we should construct the transformation from the linked internal representation or the non-linked internal representation. This is annoying, but necessary.

This is also the reason why we have two definitions of getindex: one that takes an explicit dist and one that does not.

For getindex with dist provided, we have the following diagram:

(Diagram: getindex with dist provided)

x = getindex(varinfo, @varname(x), Normal()):
    if istrans(varinfo, varname)
        f = from_linked_internal_transform(varinfo, varname, dist)
    else
        f = from_internal_transform(varinfo, varname, dist)
    end
    return f(getindex_internal(varinfo, varname))

While if dist is not provided, we have:

(Diagram: getindex without dist)

x = getindex(varinfo, @varname(x)):
    if istrans(varinfo, varname)
        f = from_linked_internal_transform(varinfo, varname)
    else
        f = from_internal_transform(varinfo, varname)
    end
    return f(getindex_internal(varinfo, varname))

Notice that dist is not present here, but otherwise the diagrams are the same.

Warning

This does mean that getindex(varinfo, varname) might not be the same as the getindex(varinfo, varname, dist) that occurs within a model evaluation! This can be confusing, but as outlined above, we do want to allow the dist in an x ~ dist expression to "override" whatever transformation varinfo might have.
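
For instance, continuing the demo_dynamic_constraint example from above, the two variants of getindex disagree for x after we updated m (the second call passes the dist explicitly, which is effectively what happens inside a model evaluation):

# Uses the transformation stored in varinfo_linked, constructed before m was changed:
varinfo_linked[@varname(x)]                                                           # ≈ 1.495, as before
# Uses the provided dist, and hence the current value of m, to construct the transformation:
varinfo_linked[@varname(x), truncated(Normal(); lower=varinfo_linked[@varname(m)])]   # ≈ 2.724, as in the model evaluation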

Other functionalities

There are also some additional methods for transforming between representations that are all automatically implemented from DynamicPPL.from_internal_transform, DynamicPPL.from_linked_internal_transform and their siblings, and thus don't need to be implemented manually.

Convenience methods for constructing transformations:

DynamicPPL.from_maybe_linked_internal_transform (Function)
from_maybe_linked_internal_transform(varinfo::AbstractVarInfo, vn::VarName[, dist])

Return a transformation that transforms from the possibly linked internal representation of vn with dist in varinfo to a representation compatible with dist.

If dist is not present, then it is assumed that varinfo knows the correct output for vn.

DynamicPPL.to_maybe_linked_internal_transform (Function)
to_maybe_linked_internal_transform(varinfo::AbstractVarInfo, vn::VarName[, dist])

Return a transformation that transforms from a representation compatible with dist to a possibly linked internal representation of vn with dist in varinfo.

If dist is not present, then it is assumed that varinfo knows the correct output for vn.

DynamicPPL.internal_to_linked_internal_transform (Function)
internal_to_linked_internal_transform(varinfo::AbstractVarInfo, vn::VarName[, dist])

Return a transformation that transforms from the internal representation of vn with dist in varinfo to a linked internal representation of vn with dist in varinfo.

If dist is not present, then it is assumed that varinfo knows the correct output for vn.

DynamicPPL.linked_internal_to_internal_transform (Function)
linked_internal_to_internal_transform(varinfo::AbstractVarInfo, vn::VarName[, dist])

Return a transformation that transforms from a linked internal representation of vn with dist in varinfo to the internal representation of vn with dist in varinfo.

If dist is not present, then it is assumed that varinfo knows the correct output for vn.


There are also convenience methods for transforming between representations without having to explicitly construct the transformation.

Supporting a new distribution

To support a new distribution, one needs to implement the following methods for the desired AbstractVarInfo:

At the time of writing, VarInfo is the most commonly used implementation, and its internal representation is always a Vector. In this scenario, one can just implement the following methods instead:

DynamicPPL.from_vec_transform (Method)
from_vec_transform(dist::Distribution)

Return the transformation from the vector representation of a realization from distribution dist to the original representation compatible with dist.


These are used internally by VarInfo.

Optionally, if the inverse of the above is expensive to compute, one can also implement:

And similarly, there are corresponding to-methods for the from_*_vec_transform variants too.

Warning

Whatever the resulting transformation is, it should be invertible, i.e. implement InverseFunctions.inverse, and have a well-defined log-abs-det Jacobian, i.e. implement ChangesOfVariables.with_logabsdet_jacobian.
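
To make the requirements in the warning above concrete, here is a hypothetical sketch of what supporting a matrix-valued distribution could look like for VarInfo. MyMatrixDist and the transform types below are purely illustrative and not part of DynamicPPL's API; the only real entry point is DynamicPPL.from_vec_transform.

import InverseFunctions
import ChangesOfVariables

# Suppose realizations of MyMatrixDist are 2×2 matrices. The vector representation is
# then just the flattened matrix, so from_vec_transform only needs to reshape.
struct VecToMatrix end   # length-4 vector -> 2×2 matrix
struct MatrixToVec end   # 2×2 matrix -> length-4 vector

(::VecToMatrix)(x::AbstractVector) = reshape(x, 2, 2)
(::MatrixToVec)(x::AbstractMatrix) = vec(x)

# The transformation must be invertible...
InverseFunctions.inverse(::VecToMatrix) = MatrixToVec()
InverseFunctions.inverse(::MatrixToVec) = VecToMatrix()

# ...and must have a well-defined log-abs-det Jacobian. Reshaping doesn't change
# volume, so the Jacobian term is zero here.
ChangesOfVariables.with_logabsdet_jacobian(f::VecToMatrix, x) = (f(x), zero(eltype(x)))
ChangesOfVariables.with_logabsdet_jacobian(f::MatrixToVec, x) = (f(x), zero(eltype(x)))

# The method one would then add for the hypothetical distribution:
# DynamicPPL.from_vec_transform(::MyMatrixDist) = VecToMatrix()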

TL;DR

  • DynamicPPL.jl has three representations of a variable: the model representation, the internal representation, and the linked internal representation.

    • The model representation is the representation of the variable as it appears in the model code / is expected by the dist on the right-hand-side of the ~ in the model code.
    • The internal representation is the representation of the variable as it appears in the varinfo, which varies between implementations of AbstractVarInfo, e.g. a Vector in VarInfo. This can be converted to the model representation by DynamicPPL.from_internal_transform.
    • The linked internal representation is the representation of the variable as it appears in the varinfo after linking. This can be converted to the model representation by DynamicPPL.from_linked_internal_transform.
  • Having separation between internal and linked internal is necessary because transformations might be constructed at the time of model evaluation, and thus we need to know whether to construct the transformation from the internal representation or the linked internal representation.
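
As a recap of the working example, the three representations of x in demo_lkj can be inspected side by side:

model = demo_lkj()
varinfo = VarInfo(model)

varinfo[@varname(x)]                                        # model representation: a Cholesky
DynamicPPL.getindex_internal(varinfo, @varname(x))          # internal representation: a length-4 Vector

varinfo_linked = link(varinfo, model)
DynamicPPL.getindex_internal(varinfo_linked, @varname(x))   # linked internal representation: a length-1 Vector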