## Univariate ADVI example
But the real utility of `TransformedDistribution` becomes more apparent when using `transformed(dist, b)` for any bijector `b`. To get the transformed distribution corresponding to the `Beta(2, 2)`, we called `transformed(dist)` before. This is simply an alias for `transformed(dist, bijector(dist))`. Remember that `bijector(dist)` returns the constrained-to-unconstrained bijector for that particular `Distribution`. But we can of course construct a `TransformedDistribution` using different bijectors with the same `dist`. This is particularly useful in something called Automatic Differentiation Variational Inference (ADVI).[2] An important part of ADVI is to approximate a constrained distribution, e.g. `Beta`, as follows:
- Sample `x` from a `Normal` with parameters `μ` and `σ`, i.e. `x ~ Normal(μ, σ)`.
- Transform `x` to `y` s.t. `y ∈ support(Beta)`, with the transform being a differentiable bijection with a differentiable inverse (a "bijector").
This then defines a probability density with the same support as `Beta`! Of course, it's unlikely to be the same density, but it's an approximation. Creating such a distribution becomes trivial with `Bijector` and `TransformedDistribution`:
```julia
julia> using StableRNGs: StableRNG

julia> rng = StableRNG(42);

julia> dist = Beta(2, 2)
Beta{Float64}(α=2.0, β=2.0)

julia> b = bijector(dist)  # (0, 1) → ℝ
Bijectors.Logit{Float64, Float64}(0.0, 1.0)

julia> b⁻¹ = inverse(b)  # ℝ → (0, 1)
Inverse{Bijectors.Logit{Float64, Float64}}(Bijectors.Logit{Float64, Float64}(0.0, 1.0))

julia> td = transformed(Normal(), b⁻¹)  # x ∼ 𝓝(0, 1) then b⁻¹(x) ∈ (0, 1)
UnivariateTransformed{Normal{Float64}, Inverse{Bijectors.Logit{Float64, Float64}}}(
dist: Normal{Float64}(μ=0.0, σ=1.0)
transform: Inverse{Bijectors.Logit{Float64, Float64}}(Bijectors.Logit{Float64, Float64}(0.0, 1.0))
)

julia> x = rand(rng, td)  # ∈ (0, 1)
0.3384404850130036
```
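Under the hood, the density of `td` is given by the change of variables formula: if `y = b⁻¹(x)` with `x ∼ Normal()`, then `q(y) = p(b(y)) · |det J_b(y)|`. As a quick sanity check, reusing `b`, `td`, and `x` from above (`logabsdetjac` is Bijectors' log-absolute-determinant of the Jacobian):

```julia
# The log-density of the transformed distribution agrees with the
# change of variables formula: log q(y) = log p(b(y)) + log |det J_b(y)|.
logpdf(td, x) ≈ logpdf(Normal(), b(x)) + logabsdetjac(b, x)  # true
```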
It's worth noting that `support(Beta)` is the closed interval [0, 1], while the constrained-to-unconstrained bijection, `Logit` in this case, is only well-defined as a map (0, 1) → ℝ on the open interval (0, 1). This is not merely an implementation detail: ℝ is itself open, so no continuous bijection exists from a closed interval to ℝ. But since the boundaries of a closed interval have measure zero, this doesn't end up affecting the resulting density with support on the entire real line. In practice, this means that
```julia
julia> td = transformed(Beta())
UnivariateTransformed{Beta{Float64}, Bijectors.Logit{Float64, Float64}}(
dist: Beta{Float64}(α=1.0, β=1.0)
transform: Bijectors.Logit{Float64, Float64}(0.0, 1.0)
)

julia> inverse(td.transform)(rand(rng, td))
0.8130302707446476
```
will never result in exactly 0 or 1, though samples arbitrarily close to either boundary are possible. Disclaimer: numerical accuracy is limited, so you might still see 0 or 1 if you're lucky.
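To make the boundary behaviour concrete, here is a small sketch (the exact values assume `Float64`):

```julia
b = bijector(Beta())  # Logit: (0, 1) → ℝ

# The boundary points have no finite image under Logit:
b(0.0), b(1.0)     # (-Inf, Inf)

# And the inverse saturates in floating point for large inputs, which is
# why you might still see exactly 0 or 1 among samples:
inverse(b)(40.0)   # 1.0, since 1 + exp(-40) rounds to 1 in Float64
inverse(b)(-40.0)  # ≈ 4.2e-18, tiny but not exactly 0
```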
## Multivariate ADVI example
We can also do multivariate ADVI using the `Stacked` bijector. `Stacked` gives us a way to combine univariate and/or multivariate bijectors into a single multivariate bijector. Say you have a vector `x` of length 2 and you want to transform the first entry using `exp` and the second entry using `log`. `Stacked` gives you an easy and efficient way of representing such a bijector, as sketched below.
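Here is a minimal sketch of that case. (`elementwise` is the Bijectors helper that lifts a scalar bijection to act elementwise; it is what produces the `Base.Fix1(broadcast, exp)` objects you'll see in the output below.)

```julia
using Bijectors: Stacked, elementwise

# exp on the first entry, log on the second.
b = Stacked((elementwise(exp), elementwise(log)), (1:1, 2:2))

b([0.0, ℯ])  # expected: [1.0, 1.0]
```

In the ADVI setting, we instead stack the unconstrained-to-constrained bijectors of several distributions: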
```julia
julia> using Bijectors: SimplexBijector

julia> # Original distributions
       dists = (Beta(), InverseGamma(), Dirichlet(2, 3));

julia> # Construct the corresponding ranges
       ranges = [];

julia> idx = 1;

julia> for i in 1:length(dists)
           d = dists[i]
           push!(ranges, idx:(idx + length(d) - 1))
           global idx
           idx += length(d)
       end;

julia> ranges
3-element Vector{Any}:
 1:1
 2:2
 3:4

julia> # Base distribution; mean-field normal
       num_params = ranges[end][end]
4

julia> d = MvNormal(zeros(num_params), ones(num_params));

julia> # Construct the transform
       bs = bijector.(dists);  # constrained-to-unconstrained bijectors for dists

julia> ibs = inverse.(bs);  # invert, so we get unconstrained-to-constrained

julia> sb = Stacked(ibs, ranges)  # => Stacked <: Bijector
Stacked(Any[Inverse{Bijectors.Logit{Float64, Float64}}(Bijectors.Logit{Float64, Float64}(0.0, 1.0)), Base.Fix1{typeof(broadcast), typeof(exp)}(broadcast, exp), Inverse{Bijectors.SimplexBijector}(Bijectors.SimplexBijector())], Any[1:1, 2:2, 3:4], Any[1:1, 2:2, 3:5])

julia> # Mean-field normal with unconstrained-to-constrained stacked bijector
       td = transformed(d, sb);

julia> y = rand(td)
5-element Vector{Float64}:
 0.6399741440761801
 0.8059415714116372
 0.5608685805851804
 0.09789539680401774
 0.3412360226108019

julia> 0.0 ≤ y[1] ≤ 1.0
true

julia> 0.0 < y[2]
true

julia> sum(y[3:5]) ≈ 1.0  # per the output ranges above, the simplex block is y[3:5]
true
```
## Normalizing flows
A very interesting application is that of normalizing flows.[1] Usually this is done by sampling from a multivariate normal distribution, and then transforming this to a target distribution using invertible neural networks. Currently there are two such transforms available in Bijectors.jl: `PlanarLayer` and `RadialLayer`. Let's create a flow with a single `PlanarLayer`:
```julia
julia> d = MvNormal(zeros(2), ones(2));

julia> b = PlanarLayer(2)
PlanarLayer(w = [-0.9233231958252519, 0.028508813856836106], u = [-0.17225650494742376, -0.8106515054965104], b = [0.15929467168264638])

julia> flow = transformed(d, b)
MultivariateTransformed{DiagNormal, PlanarLayer{Vector{Float64}, Vector{Float64}}}(
dist: DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
transform: PlanarLayer(w = [-0.9233231958252519, 0.028508813856836106], u = [-0.17225650494742376, -0.8106515054965104], b = [0.15929467168264638])
)

julia> flow isa MultivariateDistribution
true
```
That's it. Now we can sample from it using `rand` and compute the `logpdf`, like any other `Distribution`.
```julia
julia> y = rand(rng, flow)
2-element Vector{Float64}:
 -0.5181679679831352
 -0.09523207615860063

julia> logpdf(flow, y)  # uses inverse of `b`
-2.0187451158098595
```
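Internally, `logpdf` combines the log-density of the base distribution with the log-Jacobian of the transform. A quick sketch of that relationship, reusing `d`, `b`, and `flow` from above (`with_logabsdet_jacobian` is the ChangesOfVariables interface that Bijectors implements):

```julia
using Bijectors: with_logabsdet_jacobian

x = rand(rng, d)                           # sample in the base space
y, logjac = with_logabsdet_jacobian(b, x)  # forward pass and log |det J_b(x)|

# Change of variables; holds up to the accuracy of the layer's numerical inverse.
logpdf(flow, y) ≈ logpdf(d, x) - logjac  # true
```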
Similarly to the multivariate ADVI example, we could use `Stacked` to get a bounded flow:
```julia
julia> d = MvNormal(zeros(2), ones(2));

julia> ibs = inverse.(bijector.((InverseGamma(2, 3), Beta())));

julia> sb = Stacked(ibs)  # == Stacked(ibs, [i:i for i = 1:length(ibs)])
Stacked((Base.Fix1{typeof(broadcast), typeof(exp)}(broadcast, exp), Inverse{Bijectors.Logit{Float64, Float64}}(Bijectors.Logit{Float64, Float64}(0.0, 1.0))), (1:1, 2:2), (1:1, 2:2))

julia> b = sb ∘ PlanarLayer(2)
Stacked((Base.Fix1{typeof(broadcast), typeof(exp)}(broadcast, exp), Inverse{Bijectors.Logit{Float64, Float64}}(Bijectors.Logit{Float64, Float64}(0.0, 1.0))), (1:1, 2:2), (1:1, 2:2)) ∘ PlanarLayer(w = [-2.0420178117894436, -0.5751455428024244], u = [-1.1894286052005525, -0.2635690881899048], b = [-0.3742321477120745])

julia> td = transformed(d, b);

julia> y = rand(rng, td)
2-element Vector{Float64}:
 8.515405160013982
 0.8106954081747914

julia> 0 < y[1]
true

julia> 0 ≤ y[2] ≤ 1
true
```
Want to fit the flow?
This is a sketch of maximum-likelihood training by plain gradient descent. Note that it uses Zygote rather than ForwardDiff for the gradient: `Functors.functor` returns the parameters as a `NamedTuple`, which `Zygote.gradient` handles directly, while `ForwardDiff.gradient` only accepts arrays.

```julia
using Zygote    # reverse-mode AD; differentiates w.r.t. the NamedTuple of parameters
using Functors  # convenient for extracting parameters and reconstructing the flow

# Construct the flow.
b = PlanarLayer(2)

θs, reconstruct = Functors.functor(b)

# Make the objective a `struct` to avoid capturing global variables.
struct NLLObjective{R,D,T}
    reconstruct::R
    basedist::D
    data::T
end

function (obj::NLLObjective)(θs)
    transformed_dist = transformed(obj.basedist, obj.reconstruct(θs))
    return -sum(Base.Fix1(logpdf, transformed_dist), eachcol(obj.data))
end

# Some random data to estimate the density of.
xs = randn(2, 1000)

# Construct the objective.
f = NLLObjective(reconstruct, MvNormal(zeros(2), ones(2)), xs)

# Initial loss.
@info "Initial loss: $(f(θs))"

# Train using gradient descent.
ε = 1e-3
for i in 1:100
    ∇s = only(Zygote.gradient(f, θs))
    global θs = fmap(θs, ∇s) do θ, ∇
        θ - ε .* ∇
    end
end

# Final loss; this should be lower than the initial loss.
@info "Final loss: $(f(θs))"

# Very simple check to see if we learned something useful.
samples = rand(transformed(f.basedist, f.reconstruct(θs)), 1000)
mean(eachcol(samples))  # ≈ [0, 0]
cov(samples; dims=2)    # ≈ I
```
We can easily create more complex flows by simply doing `PlanarLayer(10) ∘ PlanarLayer(10) ∘ RadialLayer(10)` and so on.
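For instance, a minimal sketch of such a deeper flow:

```julia
d = MvNormal(zeros(10), ones(10))

# Composition of invertible layers is itself a bijector.
b = PlanarLayer(10) ∘ PlanarLayer(10) ∘ RadialLayer(10)
deep_flow = transformed(d, b)

x = rand(rng, deep_flow)  # sample through all three layers
logpdf(deep_flow, x)      # density, computed via the layers' inverses
```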