API
- NormalizingFlows.AffineCoupling
- NormalizingFlows.NeuralSplineCoupling
- NormalizingFlows.NSF_layer
- NormalizingFlows.RealNVP_layer
- NormalizingFlows.create_flow
- NormalizingFlows.elbo
- NormalizingFlows.elbo_batch
- NormalizingFlows.fnn
- NormalizingFlows.loglikelihood
- NormalizingFlows.nsf
- NormalizingFlows.optimize
- NormalizingFlows.planarflow
- NormalizingFlows.radialflow
- NormalizingFlows.realnvp
- NormalizingFlows.train_flow
Variational Objectives
We provide the ELBO (reverse KL) and the expected log-likelihood (forward KL). You can also supply your own objective with the signature vo(rng, flow, args...).
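For instance, a plain Monte Carlo ELBO estimator written against this signature might look as follows (an illustrative sketch only; the built-in elbo and elbo_batch below are the recommended implementations):

using Statistics: mean
function my_objective(rng, flow, logp, n_samples)
    xs = rand(rng, flow, n_samples)   # column-wise batch of samples from the flow
    return mean(logp(xs[:, i]) - logpdf(flow, xs[:, i]) for i in 1:n_samples)
end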
Evidence Lower Bound (ELBO)
Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$:
\[\begin{aligned} &\min_{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\ & = \max_{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_n \circ \cdots \circ T_1(Z_0)\right)\right] \quad \text{(ELBO)} \end{aligned}\]
Reverse KL minimization is typically used in Bayesian computation, where only the unnormalized log-density logp of the target is available.
NormalizingFlows.elbo — Function

elbo(flow, logp, xs)
elbo([rng, ] flow, logp, n_samples)

Monte Carlo estimate of the ELBO computed from a batch of samples xs drawn from the reference distribution flow.dist.

Arguments
- rng: random number generator
- flow: variational distribution to be trained. In particular, flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution that one can easily sample from and compute the logpdf of
- logp: log-pdf of the target distribution (not necessarily normalized)
- xs: samples from the reference distribution q₀
- n_samples: number of samples from the reference distribution q₀
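For example, a quick ELBO estimate against an unnormalized target could look like this (a sketch; the 2-d standard-normal target logp is purely illustrative):

using Distributions, LinearAlgebra, Random
q0 = MvNormal(zeros(2), I)
flow = realnvp(q0, [32, 32], 4)                     # any flow constructed below works here
logp(z) = -sum(abs2, z) / 2                         # unnormalized log-density of the target
val = elbo(Random.default_rng(), flow, logp, 100)   # Monte Carlo estimate from 100 reference samples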
NormalizingFlows.elbo_batch — Function

elbo_batch(flow, logp, xs)
elbo_batch([rng, ] flow, logp, n_samples)

Batched ELBO estimate that transforms a matrix of samples (each column is a single sample) in one call. This is more efficient for invertible neural-network flows (RealNVP/NSF), as it leverages the batched operations of the neural networks.

Inputs
- flow::Bijectors.MultivariateTransformed: variational distribution to be trained
- logp: function returning the log-density of the target
- xs or n_samples: column-wise sample batch, or the number of samples to draw

Returns
- Scalar estimate of the ELBO
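Usage mirrors elbo, but the whole batch is pushed through the flow at once. A sketch, reusing flow from the example above; we assume here that the target log-density can evaluate a column-wise batch and return one value per column:

logp_batch(xs) = vec(-sum(abs2, xs; dims=1) ./ 2)   # one log-density value per column
val = elbo_batch(flow, logp_batch, 256)             # 256 samples evaluated in one batched pass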
Log-likelihood
Maximizing the expected log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$:
\[\begin{aligned} & \min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \quad \text{(Forward KL)} \\ & = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)} \end{aligned}\]
Forward KL minimization is typically used in generative modeling, where samples from the target p are given.
NormalizingFlows.loglikelihood — Function

loglikelihood(rng, flow::Bijectors.TransformedDistribution, xs::AbstractVecOrMat)

Compute the log-likelihood of the variational distribution flow at a batch of samples xs from the target distribution p.

Arguments
- rng: random number generator (unused; included only so the signature matches the other variational objectives)
- flow: variational distribution to be trained. In particular, flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution that one can easily sample from and compute the logpdf of
- xs: samples from the target distribution p
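For example, given a matrix of data whose columns are draws from the target p (a sketch; xs_data is stand-in data and flow is any 2-d flow from the earlier examples):

xs_data = randn(2, 500)                                  # stand-in for samples from the target p
ll = loglikelihood(Random.default_rng(), flow, xs_data)  # expected log-likelihood objective evaluated on the data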
Training Loop
NormalizingFlows.optimize — Function

optimize(
    ad::ADTypes.AbstractADType,
    loss,
    θ₀::AbstractVector{T},
    re,
    args...;
    kwargs...
)

Iteratively update the parameters θ of the normalizing flow re(θ) by calling grad! and using the given optimiser to compute the steps.

Arguments
- ad::ADTypes.AbstractADType: automatic differentiation backend
- loss: a general loss function θ -> loss(θ, args...) returning a scalar loss value that will be minimised
- θ₀::AbstractVector{T}: initial parameters for the loss function (in the context of normalizing flows, the flattened flow parameters)
- re: reconstruction function that maps the flattened parameters back to the normalizing flow
- args...: additional arguments for loss (passed as DI.Constant)

Keyword Arguments
- max_iters::Int=10000: maximum number of iterations
- optimiser::Optimisers.AbstractRule=Optimisers.ADAM(): optimiser used to compute the steps
- show_progress::Bool=true: whether to show the progress bar. By default the progress bar displays the iteration number, the loss value, and the gradient norm.
- callback=nothing: callback function with signature cb(iter, opt_state, re, θ) that returns a dictionary-like object of statistics to be displayed in the progress bar. re and θ can be used to reconstruct the normalizing flow in case the user wants to examine its current state.
- hasconverged = (iter, opt_stats, re, θ, st) -> false: function that checks whether training has converged. The default always returns false.
- prog=ProgressMeter.Progress(max_iters; desc="Training", barlen=31, showspeed=true, enabled=show_progress): progress bar configuration

Returns
- θ: trained parameters of the normalizing flow
- opt_stats: statistics of the optimiser
- st: optimiser state, for potential continuation of training
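As a rough sketch of how the pieces fit together (assumptions: Optimisers.destructure is used to flatten and rebuild the flow, the negative ELBO serves as the loss, and flow and logp are as in the earlier examples; the higher-level train_flow listed in the API wraps a loop of this kind):

using ADTypes, Optimisers
θ₀, re = Optimisers.destructure(flow)               # flatten the flow parameters
negelbo(θ, logp, n) = -elbo(re(θ), logp, n)         # loss(θ, args...) to be minimised
θ, opt_stats, st = NormalizingFlows.optimize(
    ADTypes.AutoZygote(), negelbo, θ₀, re, logp, 32;
    max_iters=2_000, optimiser=Optimisers.Adam(1e-3),
)
flow_trained = re(θ)                                 # rebuild the trained flow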
Available Flows
NormalizingFlows.jl provides two commonly used normalizing flows, RealNVP and Neural Spline Flow (NSF), as well as two simple flows, Planar Flow and Radial Flow.
RealNVP (Affine Coupling Flow)
These helpers construct commonly used coupling-based flows with sensible defaults.
NormalizingFlows.realnvp — Function

realnvp(q0, hdims, nlayers; paramtype = Float64)
realnvp(q0; paramtype = Float64)

Construct a RealNVP flow by stacking nlayers RealNVP_layer blocks with odd–even masking. The one-argument variant defaults to 10 layers with hidden sizes [32, 32] per conditioner.

Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g. MvNormal(zeros(d), I))
- hdims::AbstractVector{Int}: hidden sizes for the conditioner networks
- nlayers::Int: number of stacked RealNVP layers

Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness)

Returns
- Bijectors.TransformedDistribution representing the RealNVP flow

Example
q0 = MvNormal(zeros(2), I); flow = realnvp(q0, [64,64], 8)
x = rand(flow, 128); lp = logpdf(flow, x)

realnvp(q0; paramtype = Float64)

Default constructor: 10 layers, each conditioner using hidden sizes [32, 32]. Follows a common RealNVP architecture similar to Appendix E of [ASD2020].
NormalizingFlows.RealNVP_layer — Function

RealNVP_layer(dims, hdims; paramtype = Float64)

Construct a single RealNVP layer by composing two AffineCoupling bijectors with complementary odd–even masks.

Arguments
- dims::Int: dimensionality of the problem
- hdims::AbstractVector{Int}: hidden sizes of the conditioner networks

Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type

Returns
- A Bijectors.Bijector representing the RealNVP layer

Example
layer = RealNVP_layer(4, [64, 64])
y = layer(randn(4, 16))   # batched forward
NormalizingFlows.AffineCoupling — Type

AffineCoupling(dim, hdims, mask_idx, paramtype)
AffineCoupling(dim, mask, s, t)

Affine coupling bijector used in RealNVP [LJS2017]. Two subnetworks s (log-scale, exponentiated in the forward pass) and t (shift) act on one partition of the input, conditioned on the complementary partition (as defined by mask). For numerical stability, the output of s passes through tanh before exponentiation.

Arguments
- dim::Int: total dimensionality of the input
- hdims::AbstractVector{Int}: hidden sizes for the conditioner MLPs s and t
- mask_idx::AbstractVector{Int}: indices of the dimensions to transform. The complement is used as the conditioner input.

Keyword Arguments
- paramtype::Type{<:AbstractFloat}: parameter element type (e.g. Float32)

Fields
- mask::Bijectors.PartitionMask: partition specification
- s::Flux.Chain: conditioner producing log-scales for the transformed block
- t::Flux.Chain: conditioner producing shifts for the transformed block

Notes
- Forward: with (x₁, x₂, x₃) = partition(mask, x), compute y₁ = x₁ .* exp.(s(x₂)) .+ t(x₂).
- Log-determinant: sum(s(x₂)) (or column-wise for batched matrices).
- All methods support both vectors and column-major batches (matrices).
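A small direct-use sketch (assuming the positional constructor shown above and the standard with_logabsdet_jacobian interface from Bijectors):

using Bijectors
c = AffineCoupling(4, [16, 16], [1, 3], Float64)   # transform dims 1 and 3, condition on dims 2 and 4
x = randn(4, 8)                                    # column-wise batch
y, logJ = with_logabsdet_jacobian(c, x)            # forward pass with log-determinant(s)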
Neural Spline Flow (NSF)
NormalizingFlows.nsf — Function

nsf(q0, hdims, K, B, nlayers; paramtype = Float64)
nsf(q0; paramtype = Float64)

Construct an NSF by stacking nlayers NSF_layer blocks. The one-argument variant defaults to 10 layers with [32, 32] hidden sizes, 10 knots, and boundary 30 (scaled by one(T)).

Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution
- hdims::AbstractVector{Int}: hidden sizes of the conditioner network
- K::Int: number of spline knots per coordinate
- B::AbstractFloat: spline boundary
- nlayers::Int: number of NSF layers

Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type

Returns
- Bijectors.TransformedDistribution representing the NSF flow

Under the hood, nsf relies on the rational quadratic spline functions implemented in MonotonicSplines.jl for performance reasons. MonotonicSplines.jl uses KernelAbstractions.jl to support batched operations. Because of this, nsf currently only supports Zygote as the AD backend.

Example
q0 = MvNormal(zeros(3), I); flow = nsf(q0, [64,64], 8, 3.0, 6)
x = rand(flow, 128); lp = logpdf(flow, x)
NormalizingFlows.NSF_layer — Function

NSF_layer(dim, hdims, K, B; paramtype = Float64)

Build a single Neural Spline Flow (NSF) layer by composing two NeuralSplineCoupling bijectors with complementary odd–even masks.

Arguments
- dim::Int: dimensionality of the problem
- hdims::AbstractVector{Int}: hidden sizes of the conditioner network
- K::Int: number of spline knots
- B::AbstractFloat: spline boundary

Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type

Returns
- A Bijectors.Bijector representing the NSF layer

Example
layer = NSF_layer(4, [64,64], 10, 3.0)
y = layer(randn(4, 32))
NormalizingFlows.NeuralSplineCoupling — Type

NeuralSplineCoupling(dim, hdims, K, B, mask_idx, paramtype)
NeuralSplineCoupling(dim, K, n_dims_transformed, B, nn, mask)

Neural Rational Quadratic Spline (RQS) coupling bijector [DBMP2019]. A conditioner network takes the unchanged partition as input and outputs the parameters of monotonic rational quadratic splines for the transformed coordinates. Batched inputs (matrices whose columns are samples) are supported.

Arguments
- dim::Int: total input dimension
- hdims::AbstractVector{Int}: hidden sizes for the conditioner MLP
- K::Int: number of spline knots per transformed coordinate
- B::AbstractFloat: boundary/box constraint for the spline domain
- mask_idx::AbstractVector{Int}: indices of the transformed coordinates

Keyword Arguments
- paramtype::Type{<:AbstractFloat}: parameter element type

Fields
- nn::Flux.Chain: conditioner that outputs all spline parameters for all transformed dimensions
- mask::Bijectors.PartitionMask: partition specification

Notes
- The output dimensionality of the conditioner is (3K - 1) * n_transformed.
- For performance, the rational quadratic spline functions are built with MonotonicSplines.jl.
- See MonotonicSplines.rqs_forward and MonotonicSplines.rqs_inverse for the forward/inverse and log-determinant computations.
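Analogous to the affine coupling, a direct-use sketch (assuming the positional constructor shown above):

nsc = NeuralSplineCoupling(4, [16, 16], 10, 3.0, [1, 3], Float64)   # 10 knots, boundary 3.0, transform dims 1 and 3
x = randn(4, 8)
y, logJ = with_logabsdet_jacobian(nsc, x)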
Planar and Radial Flows
NormalizingFlows.planarflow — Function

planarflow(q0, nlayers; paramtype = Float64)

Construct a Planar Flow by stacking nlayers Bijectors.PlanarLayer blocks on top of a base distribution q0.

Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g., MvNormal(zeros(d), I))
- nlayers::Int: number of planar layers to compose

Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness)

Returns
- Bijectors.TransformedDistribution representing the planar flow

Example
q0 = MvNormal(zeros(2), I); flow = planarflow(q0, 10)
x = rand(flow, 128); lp = logpdf(flow, x)
NormalizingFlows.radialflow — Function

radialflow(q0, nlayers; paramtype = Float64)

Construct a Radial Flow by stacking nlayers Bijectors.RadialLayer blocks on top of a base distribution q0.

Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g., MvNormal(zeros(d), I))
- nlayers::Int: number of radial layers to compose

Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness)

Returns
- Bijectors.TransformedDistribution representing the radial flow

Example
q0 = MvNormal(zeros(2), I); flow = radialflow(q0, 6)
x = rand(flow); lp = logpdf(flow, x)
Utility Functions
NormalizingFlows.create_flow — Function

create_flow(layers, q0)

Construct a normalizing flow by composing the provided bijector layers and attaching them to the base distribution q0.

Arguments
- layers: an iterable of Bijectors.Bijector objects that are composed in order (left to right) via function composition (for instance, if layers = [l1, l2, l3], the flow is l1 ∘ l2 ∘ l3 applied to q0)
- q0: the base distribution (e.g., MvNormal(zeros(d), I))

Returns
- A Bijectors.TransformedDistribution representing the resulting flow

Example
using Distributions, Bijectors, LinearAlgebra
q0 = MvNormal(zeros(2), I)
flow = create_flow((Bijectors.Shift([0.0, 1.0]), Bijectors.Scale([1.0, 2.0])), q0)
NormalizingFlows.fnn — Function

fnn(
    input_dim::Int,
    hidden_dims::AbstractVector{Int},
    output_dim::Int;
    inlayer_activation=Flux.leakyrelu,
    output_activation=nothing,
    paramtype::Type{T} = Float64,
)

Create a fully connected neural network (FNN).

Arguments
- input_dim::Int: dimension of the input layer
- hidden_dims::AbstractVector{<:Int}: vector of integers specifying the dimensions of the hidden layers
- output_dim::Int: dimension of the output layer
- inlayer_activation: activation function for the hidden layers; defaults to Flux.leakyrelu
- output_activation: activation function for the output layer; defaults to nothing
- paramtype::Type{T} = Float64: type of the parameters in the network; defaults to Float64

Returns
- A Flux.Chain representing the FNN.
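For example (a small sketch; the layer sizes and tanh output activation are arbitrary choices):

nn = fnn(4, [32, 32], 2; output_activation=tanh, paramtype=Float32)
y = nn(randn(Float32, 4, 16))   # 2×16 output for a batch of 16 inputs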
- [ASD2020] Agrawal, A., Sheldon, D. and Domke, J. (2020). Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization. NeurIPS.
- [LJS2017] Dinh, L., Sohl-Dickstein, J. and Bengio, S. (2017). Density Estimation Using Real NVP. ICLR.
- [DBMP2019] Durkan, C., Bekasov, A., Murray, I. and Papamakarios, G. (2019). Neural Spline Flows. NeurIPS.