The VectorBijectors module
The Bijectors.VectorBijectors module contains functionality that is very similar to that in the core Bijectors modules, but is specifically focused on converting random samples from distributions to and from vectors.
It assumes that there are three forms of samples from a distribution d that we are interested in:
The original form, which is what
rand(d)returns.A vectorised form, which is a vector that contains a flattened version of the original form.
A linked vectorised form, which is a vector in which:
- each element is independent; and
- each element is unconstrained (can take any value in ℝ).
Note that because of the independence requirement, the linked vectorised form may have a different dimension to the vectorised form. For example, when sampling from a Dirichlet distribution, the original form is a vector that always sums to 1. The linked vectorised form will have one element less than the original form, because this constraint is eliminated.
The Bijectors.VectorBijectors module provides functionality to convert between these three forms, via the following functions. Assuming that x = rand(d) for some distribution d:
to_vec(d)is a function which convertsxto the vectorised formfrom_vec(d)is the inverse ofto_vec(d)vec_length(d)returns the length ofto_vec(d)(x)optic_vec(d)returns a vector of optics that describes how each element ofto_vec(d)(x)is accessed fromxto_linked_vec(d)is a function which convertsxto the linked vectorised formfrom_linked_vec(d)is the inverse ofto_linked_vec(d)linked_vec_length(d)returns the length ofto_linked_vec(d)(x)linked_optic_vec(d)returns a vector of optics that describes how each element ofto_linked_vec(d)(x)is accessed fromx(if possible)
For example:
julia> using Bijectors.VectorBijectors, Distributions
julia> d = Beta(2, 2);
x = rand(d); # x is between 0 and 1
0.5602086057097567
julia> to_vec(d)(x)
1-element Vector{Float64}:
0.5602086057097567
julia> to_linked_vec(d)(x)
1-element Vector{Float64}:
0.24200871395677753The bijectors here will also implement ChangesOfVariables.with_logabsdet_jacobian as well as InverseFunctions.inverse. See the main Bijectors documentation for more details of this interface.
Why does this module exist?
This module is intended primarily for use with probabilistic programming, e.g. DynamicPPL.jl, where vectorised samples are required to satisfy the LogDensityProblems.jl interface.
The core Bijectors.jl interface does indeed contain very similar functionality, but it does not guarantee that Bijectors.bijector(d)(x) will always return a vector (in general it can be a scalar or an array of any dimension). Thus, there is often extra overhead introduced when converting to and from vectorised forms. It also makes it difficult to correctly handle edge cases, especially when dealing with recursive calls to bijector for nested distributions (such as product_distribution). See e.g. https://github.com/TuringLang/DynamicPPL.jl/issues/1142.
Implementing your own vector bijector
The full VectorBijectors interface consists of the following functions:
Bijectors.VectorBijectors.from_vec — Function
VectorBijectors.from_vec(d::Distribution)Returns a function that can be used to convert a vectorised sample from d back to its original form.
from_vec(d) is the inverse of to_vec(d).
Examples
julia> using Bijectors.VectorBijectors: from_vec; using Distributions
julia> d = Beta(2, 2); from_vec(d)([0.5])
0.5
julia> d = product_distribution((a = Normal(), b = Beta(2, 2))); from_vec(d)([0.2, 0.5])
(a = 0.2, b = 0.5)Bijectors.VectorBijectors.to_vec — Function
VectorBijectors.to_vec(d::Distribution)Returns a function that can be used to vectorise a sample from d.
to_vec(d) is the inverse of from_vec(d).
Examples
julia> using Bijectors.VectorBijectors: to_vec; using Distributions
julia> d = Beta(2, 2); to_vec(d)(0.5)
1-element Vector{Float64}:
0.5
julia> d = product_distribution((a = Normal(), b = Beta(2, 2))); to_vec(d)((a = 0.2, b = 0.5))
2-element Vector{Float64}:
0.2
0.5Bijectors.VectorBijectors.from_linked_vec — Function
VectorBijectors.from_linked_vec(d::Distribution)Returns a function that can be used to convert an unconstrained vector back to a sample from d.
from_linked_vec(d) is the inverse of to_linked_vec(d).
Examples
julia> using Bijectors.VectorBijectors: from_linked_vec; using Distributions, Bijectors
julia> d = Beta(2, 2); from_linked_vec(d)([1.0])
0.7310585786300049
julia> f = from_linked_vec(d); Bijectors.with_logabsdet_jacobian(f, [1.0])
(0.7310585786300049, -1.6265233750364456)
julia> d = product_distribution((a = Normal(), b = Beta(2, 2))); from_linked_vec(d)([0.2, 1.0])
(a = 0.2, b = 0.7310585786300049)
julia> f = from_linked_vec(d); Bijectors.with_logabsdet_jacobian(f, [0.2, 1.0])
((a = 0.2, b = 0.7310585786300049), -1.6265233750364456)Bijectors.VectorBijectors.to_linked_vec — Function
VectorBijectors.to_linked_vec(d::Distribution)Returns a function that can be used to convert a sample from d to an unconstrained vector.
to_linked_vec(d) is the inverse of from_linked_vec(d).
Examples
julia> using Bijectors.VectorBijectors: to_linked_vec; using Distributions, Bijectors
julia> d = Beta(2, 2); to_linked_vec(d)(0.5)
1-element Vector{Float64}:
0.0
julia> f = to_linked_vec(d); Bijectors.with_logabsdet_jacobian(f, 0.5)
([0.0], 1.3862943611198906)
julia> d = product_distribution((a = Normal(), b = Beta(2, 2))); to_linked_vec(d)((a = 0.2, b = 0.5))
2-element Vector{Float64}:
0.2
0.0
julia> f = to_linked_vec(d); Bijectors.with_logabsdet_jacobian(f, (a = 0.2, b = 0.5))
([0.2, 0.0], 1.3862943611198906)Bijectors.VectorBijectors.vec_length — Function
VectorBijectors.vec_length(d::Distribution)Returns the length of the vector representation of a sample from d, i.e., length(to_vec(d)(rand(d))). However, it does this without actually drawing a sample.
Examples
julia> using Bijectors.VectorBijectors: vec_length; using Distributions
julia> d = Beta(2, 2); vec_length(d)
1
julia> d = product_distribution((a = Normal(), b = Beta(2, 2))); vec_length(d)
2Bijectors.VectorBijectors.linked_vec_length — Function
VectorBijectors.linked_vec_length(d::Distribution)Returns the length of the unconstrained vector representation of a sample from d, i.e., length(to_linked_vec(d)(rand(d))). However, it does this without actually drawing a sample.
Examples
julia> using Bijectors.VectorBijectors: linked_vec_length; using Distributions
julia> d = Beta(2, 2); linked_vec_length(d)
1
julia> d = product_distribution((a = Normal(), b = Beta(2, 2))); linked_vec_length(d)
2Bijectors.VectorBijectors.optic_vec — Function
VectorBijectors.optic_vec(d::Distribution)Returns a vector of optics (from AbstractPPL.jl), which describe how each element in the vectorised sample from d can be accessed from the original sample.
For example, if d = MvNormal(zeros(3), I), then optic_vec(d) would return a vector of [@opticof(_[1]), @opticof(_[2]), @opticof(_[3])]. The length of this vector would be equal to vec_length(d).
If optics = optic_vec(d), then for any sample x ~ d and its vectorised form v = to_vec(d)(x), it should hold that v[i] == optics[i](x) for all i.
Examples
For the Beta case, the return value of an empty optic indicates that 'the first element of the vector is the sample itself'.
For the product_distribution case, the return value of Optic(.a) indicates that 'the first element of the vector is the .a component of the sample', and similarly for Optic(.b).
julia> using Bijectors.VectorBijectors: optic_vec; using Distributions
julia> d = Beta(2, 2); optic_vec(d)
1-element Vector{AbstractPPL.Iden}:
Optic()
julia> d = product_distribution((a = Normal(), b = Beta(2, 2))); optic_vec(d)
2-element Vector{AbstractPPL.Property{_A, AbstractPPL.Iden} where _A}:
Optic(.a)
Optic(.b)Bijectors.VectorBijectors.linked_optic_vec — Function
VectorBijectors.linked_optic_vec(d::Distribution)Returns a vector of optics (from AbstractPPL.jl), which describe how each element in the unconstrained vector representation of a sample from d is related to the original sample.
This is not always well-defined. For example, consider a Dirichlet distribution, with three components. The unconstrained vector representation would have two elements. However, these two elements do not correspond to specific components of the original sample, since all three components are interdependent (they must sum to one). In such cases, this function should return a vector of two nothings.
However, for a distribution like MvNormal, this function would return a vector of [@opticof(_[1]), @opticof(_[2]), @opticof(_[3])], similar to optic_vec. That is because the first element of the unconstrained vector is solely determined by the first component of the original sample, the second element by the second component, and so on.
Note that, unlike optic_vec, the first element of the linked vector does not necessarily have to be exactly equal to the first component of the original sample, as there may have been a transformation applied. It merely needs to be determined by the first component (and only the first component).
Examples
In this case since linking does not affect the 'provenance' of the sample (i.e., for the product distribution the first element of the linked vector is still determined by .a and the second by .b), the return value of linked_optic_vec is the same as that of optic_vec.
julia> using Bijectors.VectorBijectors: linked_optic_vec; using Distributions
julia> d = Beta(2, 2); linked_optic_vec(d)
1-element Vector{AbstractPPL.Iden}:
Optic()
julia> d = product_distribution((a = Normal(), b = Beta(2, 2))); linked_optic_vec(d)
2-element Vector{AbstractPPL.Property{_A, AbstractPPL.Iden} where _A}:
Optic(.a)
Optic(.b)In practice, if your distribution is a univariate distribution, you will probably only need to implement scalar_to_scalar_bijector (see below).
For multivariate and matrix distributions, there are default implementations of the non-linked versions (i.e., from_vec, to_vec, vec_length, and optic_vec) which should already be optimal. However you will have to define the linked versions. The process of implementing from_linked_vec and to_linked_vec for a distribution is very similar to the process of implementing Bijectors.bijector, so you can consult the existing documentation for a guide on this.
If you have a very customised distribution, you will likely have to implement all the functions yourself.
Known constant bijectors
Additionally, if your distribution is likely to be used in part of a product distribution, it can lead to substantial performance improvements to overload has_constant_vec_bijector to return true (but make sure to only do this if the bijector is genuinely constant!):
Bijectors.VectorBijectors.has_constant_vec_bijector — Function
Bijectors.VectorBijectors.has_constant_vec_bijector(::Type{T}) where {T}Return true if the vector bijector for each element of a collection of distributions is determined solely by the type of the distribution, and not by any runtime parameter values.
This is slightly confusing, so is best explained by example. Consider
d = product_distribution(array_of_dists)If it can be inferred from typeof(array_of_dists) that each distribution inside array_of_dists has the same vector bijector, then has_constant_vec_bijector(typeof(array_of_dists)) should return true.
For example, if array_of_dists is a FillArrays.Fill of some distribution type, then we know that each distribution inside is necessarily the same, and so they all have the same vector bijector. Thus, we have that
has_constant_vec_bijector(::Type{<:FillArrays.Fill}) == trueFor generic AbstractArrays or Tuples, this function will recursively call itself on the element type of the array. That means that if a dist::D (where D<:Distribution) has a constant vector bijector, we can simply mark has_constant_vec_bijector(::Type{D}) == true, and that will automatically propagate to arrays of that distribution type.
For example, Beta has a constant vector bijector, because its support is always between 0 and 1, regardless of its parameters, and thus we can wite
has_constant_vec_bijector(::Type{<:Distributions.Beta}) == trueOn the other hand, Uniform does not have a constant vector bijector, because its support depends on its parameters. So, it would be inappropriate (and can lead to wrong results) if we were to write has_constant_vec_bijector(::Type{<:Distributions.Uniform}) == true.
distribution, and not the actual value. This is necessary in order to guarantee type stability.
Univariate distributions
For univariate distributions the default definition is to generate a bijector that inspects the minimum and maximum of the distribution. While this will work correctly, it might not be the most performant. You can manually define the VectorBijectors API for univariate distributions, but it is probably faster to just overload the single function scalar_to_scalar_bijector: everything else will be automatically dervied.
Bijectors.VectorBijectors.scalar_to_scalar_bijector — Function
Bijectors.VectorBijectors.scalar_to_scalar_bijector(d::D.UnivariateDistribution)The VectorBijectors interface is intended to map samples to vectors. However, for univariate distributions, the 'vectorisation' part of this is trivial (we only need to convert a scalar to a vector of length one, and vice versa). Therefore, this function is provided to allow users to specify the 'interesting' part of the transformation, which is the function that maps values to unconstrained space.
Overloading this function for a univariate distribution is sufficient to implement the entire VectorBijectors interface for that distribution.
There are three scalar-to-scalar bijectors that are exported, which should be enough for any univariate distribution:
Bijectors.VectorBijectors.TypedIdentityBijectors.VectorBijectors.LogBijectors.VectorBijectors.Untruncate
If you need a different scalar-to-scalar bijector, please open an issue.
Univariate distributions tend to fall into one of the following categories:
Bijectors.VectorBijectors.TypedIdentity — Type
TypedIdentity <: ScalarToScalarBijectorThe same as identity.
This is the appropriate scalar-to-scalar bijector for univariate distributions which have support over the entire real line. However, note that this can also be used in other contexts apart from univariate distributions. For example, TypedIdentity() is the correct implementation of to_vec and from_vec for MvNormal distributions.
The problem with using identity as a bijector is that ChangesOfVariables.jl defines with_logabsdet_jacobian(identity, x) = (x, zero(eltype(x))), which can fail if eltype(x) is not a number type! Implementing this allows us to shortcircuit that definition and return a sensible result (i.e. a Float64) even if x is not a numeric vector.
Bijectors.VectorBijectors.Log — Type
Log(bound, sign) <: ScalarToScalarBijector
Callable struct, defined such that (l::Log)(x) = log(l.sign * (x - l.bound)). The sign is determined by the sign field.
This is the appropriate scalar-to-scalar bijector for univariate distributions which have a support of [l.bound, ∞) (if l.sign == 1), or (-∞, l.bound] (if l.sign == -1).
Bijectors.VectorBijectors.Untruncate — Type
Untruncate(a, b) <: ScalarToScalarBijector
Callable struct, defined such that (::Untruncate(a, b))(x) maps a scalar x from (a, b) to (-Inf, Inf).
This is the appropriate scalar-to-scalar bijector for distributions which have support over (a, b).
Testing
Because the scope of a vector bijector is very well-defined, there is a well-established testing framework to verify correctness of an implementation (Bijectors.VectorBijectors.test_all()), which you can use in the test suite. This function contains additional keyword arguments to control the exact testing procedure. For example, you can test that the transformations do not cause extra allocations, should you know this to be the case for your bijector (note that this is not always possible).
For more information about generally testing bijectors (and in particular how to test Jacobians for transformations that modify the number of dimensions), see the documentation on examples of defining bijectors.
One of the most tricky parts of testing Bijectors is ensuring that the transforms are compatible with automatic differentiation. This is important for DynamicPPL: we need to be able to compute the gradient of the log-density with respect to (possibly transformed) parameters, which may include the log-abs-det-Jacobian of the transformation. The default AD backends tested are ForwardDiff, ReverseDiff, Mooncake, and Enzyme. It is acceptable to skip tests for a particular backend if there are genuine upstream bugs, especially with ReverseDiff, which is not actively maintained. However where possible it is best to ensure that all backends are supported, and to use @test_broken to mark any known issues with specific backends.