API
- NormalizingFlows.AffineCoupling
- NormalizingFlows.NeuralSplineCoupling
- NormalizingFlows.NSF_layer
- NormalizingFlows.RealNVP_layer
- NormalizingFlows.create_flow
- NormalizingFlows.elbo
- NormalizingFlows.elbo_batch
- NormalizingFlows.fnn
- NormalizingFlows.loglikelihood
- NormalizingFlows.nsf
- NormalizingFlows.optimize
- NormalizingFlows.planarflow
- NormalizingFlows.radialflow
- NormalizingFlows.realnvp
- NormalizingFlows.train_flow
Variational Objectives
We provide ELBO (reverse KL) and expected log-likelihood (forward KL). You can also supply your own objective with the signature vo(rng, flow, args...).
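For instance, here is a minimal sketch of a custom objective following the vo(rng, flow, args...) convention; the name my_objective and its extra arguments logp and n_samples are purely illustrative:

using Random, Statistics, Distributions, Bijectors

function my_objective(rng::AbstractRNG, flow::Bijectors.TransformedDistribution, logp, n_samples)
    xs = rand(rng, flow, n_samples)  # draw n_samples column-vector samples from the flow
    # simple Monte Carlo ELBO-style estimate: average of log p(x) - log q(x) over the batch
    return mean(logp(xs[:, i]) - logpdf(flow, xs[:, i]) for i in 1:n_samples)
end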
Evidence Lower Bound (ELBO)
Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$:
\[\begin{aligned} &\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\ & = \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_n \circ \cdots \circ T_1(Z_0)\right)\right] \quad \text{(ELBO)} \end{aligned}\]
Reverse KL minimization is typically used for Bayesian computation when only logp is available.
NormalizingFlows.elbo — Function

elbo(flow, logp, xs)
elbo([rng, ] flow, logp, n_samples)

Monte Carlo estimates of the ELBO from a batch of samples xs from the reference distribution flow.dist.
Arguments
- rng: random number generator
- flow: variational distribution to be trained. In particular flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution that one can easily sample and compute logpdf
- logp: log-pdf of the target distribution (not necessarily normalized)
- xs: samples from the reference distribution q₀
- n_samples: number of samples from the reference distribution q₀
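A minimal usage sketch (the toy target logp and the realnvp flow below are illustrative):

using Random, LinearAlgebra, Distributions, NormalizingFlows

q0 = MvNormal(zeros(2), I)
flow = realnvp(q0, [32, 32], 4)
logp(z) = logpdf(MvNormal(ones(2), I), z)   # toy (unnormalized) target
est = elbo(Random.default_rng(), flow, logp, 64)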
NormalizingFlows.elbo_batch — Function

elbo_batch(flow, logp, xs)
elbo_batch([rng, ] flow, logp, n_samples)

Batched ELBO estimate that transforms a matrix of samples (each column is a single sample) in one call. This is more efficient for invertible neural-network flows (RealNVP/NSF), as it leverages the batched operations of the neural networks.
Inputs
- flow::Bijectors.MultivariateTransformed
- logp: function returning the log-density of the target
- xs or n_samples: column-wise sample batch or number of samples
Returns
- Scalar estimate of the ELBO
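A minimal usage sketch, reusing the flow from the elbo example above and assuming logp accepts a matrix of column-wise samples and returns one log-density per column:

logp_batch(zs) = logpdf(MvNormal(ones(2), I), zs)  # vectorised toy target, one value per column
est = elbo_batch(flow, logp_batch, 128)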
Log-likelihood
Maximizing the expected log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$:
\[\begin{aligned} & \min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \quad \text{(Forward KL)} \\ & = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)} \end{aligned}\]
Forward KL minimization is typically used for generative modeling when samples from p are given.
NormalizingFlows.loglikelihood — Function

loglikelihood(rng, flow::Bijectors.TransformedDistribution, xs::AbstractVecOrMat)

Compute the log-likelihood of the variational distribution flow at a batch of samples xs from the target distribution p.
Arguments
- rng: random number generator (unused; only present so the signature matches the other variational objectives)
- flow: variational distribution to be trained. In particular flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution that one can easily sample and compute logpdf
- xs: samples from the target distribution p.
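A minimal usage sketch, reusing the flow from the elbo example above and using synthetic draws as a stand-in for real target samples:

xs = rand(MvNormal(ones(2), I), 256)               # stand-in for samples from p (columns)
ll = loglikelihood(Random.default_rng(), flow, xs)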
Training Loop
NormalizingFlows.optimize — Function

optimize(
    ad::ADTypes.AbstractADType,
    loss,
    θ₀::AbstractVector{T},
    re,
    args...;
    kwargs...
)

Iteratively update the parameters θ of the normalizing flow re(θ) by calling grad! and using the given optimiser to compute the steps.
Arguments
- ad::ADTypes.AbstractADType: automatic differentiation backend
- loss: a general loss function θ -> loss(θ, args...) returning a scalar loss value that will be minimised
- θ₀::AbstractVector{T}: initial parameters for the loss function (in the context of normalizing flows, this is the flattened flow parameters)
- re: reconstruction function that maps the flattened parameters to the normalizing flow
- args...: additional arguments for loss (will be set as DI.Constant)
Keyword Arguments
- max_iters::Int=10000: maximum number of iterations
- optimiser::Optimisers.AbstractRule=Optimisers.ADAM(): optimiser to compute the steps
- show_progress::Bool=true: whether to show the progress bar. The default information printed in the progress bar is the iteration number, the loss value, and the gradient norm.
- callback=nothing: callback function with signature cb(iter, opt_state, re, θ) which returns a dictionary-like object of statistics to be displayed in the progress bar. re and θ are provided so that the normalizing flow can be reconstructed in case the user wants to further examine the state of the flow.
- hasconverged = (iter, opt_stats, re, θ, st) -> false: function that checks whether the training has converged. The default is to always return false.
- prog=ProgressMeter.Progress(max_iters; desc="Training", barlen=31, showspeed=true, enabled=show_progress): progress bar configuration
Returns
- θ: trained parameters of the normalizing flow
- opt_stats: statistics of the optimiser
- st: optimiser state for potential continuation of training
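A minimal usage sketch, reusing flow and logp_batch from the examples above; here Optimisers.destructure is assumed as the way to obtain the flattened parameters θ₀ and the reconstruction function re:

using ADTypes, Optimisers, Zygote

θ₀, re = Optimisers.destructure(flow)              # flatten flow parameters (assumption)
negelbo(θ, logp, n) = -elbo_batch(re(θ), logp, n)  # scalar loss to be minimised
θ, opt_stats, st = optimize(
    ADTypes.AutoZygote(), negelbo, θ₀, re, logp_batch, 32;
    max_iters=1_000, optimiser=Optimisers.Adam(1e-3),
)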
Available Flows
NormalizingFlows.jl provides two commonly used normalizing flows, RealNVP and Neural Spline Flow (NSF), as well as two simple flows, Planar Flow and Radial Flow.
RealNVP (Affine Coupling Flow)
These helpers construct commonly used coupling-based flows with sensible defaults.
NormalizingFlows.realnvp — Function

realnvp(q0, hdims, nlayers; paramtype = Float64)
realnvp(q0; paramtype = Float64)

Construct a RealNVP flow by stacking nlayers RealNVP_layer blocks with odd–even masking. The one-argument variant defaults to 10 layers with hidden sizes [32, 32] per conditioner.
Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g. MvNormal(zeros(d), I)).
- hdims::AbstractVector{Int}: hidden sizes for the conditioner networks.
- nlayers::Int: number of stacked RealNVP layers.
Keyword Arguments
paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).
Returns
- A Bijectors.TransformedDistribution representing the RealNVP flow.
Example
q0 = MvNormal(zeros(2), I); flow = realnvp(q0, [64,64], 8)
x = rand(flow, 128); lp = logpdf(flow, x)
realnvp(q0; paramtype = Float64)

Default constructor: 10 layers, each conditioner uses hidden sizes [32, 32]. Follows a common RealNVP architecture similar to Appendix E of [ASD2020].
NormalizingFlows.RealNVP_layer — Function

RealNVP_layer(dims, hdims; paramtype = Float64)

Construct a single RealNVP layer by composing two AffineCoupling bijectors with complementary odd–even masks.
Arguments
- dims::Int: dimensionality of the problem.
- hdims::AbstractVector{Int}: hidden sizes of the conditioner networks.
Keyword Arguments
paramtype::Type{T} = Float64: parameter element type.
Returns
- A Bijectors.Bijector representing the RealNVP layer.
Example
layer = RealNVP_layer(4, [64, 64])
y = layer(randn(4, 16))  # batched forward
NormalizingFlows.AffineCoupling — Type

AffineCoupling(dim, hdims, mask_idx, paramtype)
AffineCoupling(dim, mask, s, t)

Affine coupling bijector used in RealNVP [LJS2017].
Two subnetworks s (log-scale, exponentiated in the forward pass) and t (shift) act on one partition of the input, conditioned on the complementary partition (as defined by mask). For numerical stability, the output of s passes through tanh before exponentiation.
Arguments
- dim::Int: total dimensionality of the input.
- hdims::AbstractVector{Int}: hidden sizes for the conditioner MLPs s and t.
- mask_idx::AbstractVector{Int}: indices of the dimensions to transform. The complement is used as the conditioner input.
Keyword Arguments
paramtype::Type{<:AbstractFloat}: parameter element type (e.g. Float32).
Fields
- mask::Bijectors.PartitionMask: partition specification.
- s::Flux.Chain: conditioner producing log-scales for the transformed block.
- t::Flux.Chain: conditioner producing shifts for the transformed block.
Notes
- Forward: with (x₁, x₂, x₃) = partition(mask, x), compute y₁ = x₁ .* exp.(s(x₂)) .+ t(x₂).
- Log-determinant: sum(s(x₂)) (or columnwise for batched matrices).
- All methods support both vectors and column-major batches (matrices).
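A minimal sketch, assuming the four-argument constructor listed above (the mask indices and hidden sizes are arbitrary choices):

using Bijectors  # for with_logabsdet_jacobian

coupling = AffineCoupling(4, [16, 16], [1, 3], Float64)     # transform dims 1 and 3, condition on dims 2 and 4
y, logjac = with_logabsdet_jacobian(coupling, randn(4, 8))  # batched forward pass with log-determinant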
Neural Spline Flow (NSF)
NormalizingFlows.nsf — Function

nsf(q0, hdims, K, B, nlayers; paramtype = Float64)
nsf(q0; paramtype = Float64)

Construct an NSF by stacking nlayers NSF_layer blocks. The one-argument variant defaults to 10 layers with [32, 32] hidden sizes, 10 knots, and boundary 30 (scaled by one(T)).
Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution.
- hdims::AbstractVector{Int}: hidden sizes of the conditioner network.
- K::Int: spline knots per coordinate.
- B::AbstractFloat: spline boundary.
- nlayers::Int: number of NSF layers.
Keyword Arguments
paramtype::Type{T} = Float64: parameter element type.
Returns
- A Bijectors.TransformedDistribution representing the NSF flow.
Under the hood, nsf relies on the rational quadratic spline functions implemented in MonotonicSplines.jl for performance reasons. MonotonicSplines.jl uses KernelAbstractions.jl to support batched operations. Because of this, nsf currently only supports Zygote as the AD backend.
Example
q0 = MvNormal(zeros(3), I); flow = nsf(q0, [64,64], 8, 3.0, 6)
x = rand(flow, 128); lp = logpdf(flow, x)
NormalizingFlows.NSF_layer — Function

NSF_layer(dim, hdims, K, B; paramtype = Float64)

Build a single Neural Spline Flow (NSF) layer by composing two NeuralSplineCoupling bijectors with complementary odd–even masks.
Arguments
- dim::Int: dimensionality of the problem.
- hdims::AbstractVector{Int}: hidden sizes of the conditioner network.
- K::Int: number of spline knots.
- B::AbstractFloat: spline boundary.
Keyword Arguments
paramtype::Type{T} = Float64: parameter element type.
Returns
- A Bijectors.Bijector representing the NSF layer.
Example
layer = NSF_layer(4, [64,64], 10, 3.0)
y = layer(randn(4, 32))
NormalizingFlows.NeuralSplineCoupling — Type

NeuralSplineCoupling(dim, hdims, K, B, mask_idx, paramtype)
NeuralSplineCoupling(dim, K, n_dims_transformed, B, nn, mask)

Neural Rational Quadratic Spline (RQS) coupling bijector [DBMP2019].
A conditioner network takes the unchanged partition as input and outputs the parameters of monotonic rational quadratic splines for the transformed coordinates. Batched inputs (matrices with column vectors) are supported.
Arguments
- dim::Int: total input dimension.
- hdims::AbstractVector{Int}: hidden sizes for the conditioner MLP.
- K::Int: number of spline knots per transformed coordinate.
- B::AbstractFloat: boundary/box constraint for the spline domain.
- mask_idx::AbstractVector{Int}: indices of the transformed coordinates.
Keyword Arguments
paramtype::Type{<:AbstractFloat}: parameter element type.
Fields
- nn::Flux.Chain: conditioner that outputs all spline parameters for all transformed dimensions.
- mask::Bijectors.PartitionMask: partition specification.
Notes
- Output dimensionality of the conditioner is (3K - 1) * n_transformed.
- For computational performance, we rely on MonotonicSplines.jl for building the rational quadratic spline functions.
- See MonotonicSplines.rqs_forward and MonotonicSplines.rqs_inverse for the forward/inverse and log-determinant computations.
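A minimal sketch, assuming the six-argument constructor listed above:

nsc = NeuralSplineCoupling(4, [16, 16], 10, 3.0, [1, 3], Float64)  # 10 knots, boundary 3.0, transform dims 1 and 3
y, logjac = with_logabsdet_jacobian(nsc, randn(4, 8))              # batched forward pass with log-determinant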
Planar and Radial Flows
NormalizingFlows.planarflow — Function

planarflow(q0, nlayers; paramtype = Float64)

Construct a Planar Flow by stacking nlayers Bijectors.PlanarLayer blocks on top of a base distribution q0.
Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g., MvNormal(zeros(d), I)).
- nlayers::Int: number of planar layers to compose.
Keyword Arguments
paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).
Returns
- A Bijectors.TransformedDistribution representing the planar flow.
Example
q0 = MvNormal(zeros(2), I); flow = planarflow(q0, 10)
x = rand(flow, 128); lp = logpdf(flow, x)
NormalizingFlows.radialflow — Function

radialflow(q0, nlayers; paramtype = Float64)

Construct a Radial Flow by stacking nlayers Bijectors.RadialLayer blocks on top of a base distribution q0.
Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g., MvNormal(zeros(d), I)).
- nlayers::Int: number of radial layers to compose.
Keyword Arguments
paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).
Returns
- A Bijectors.TransformedDistribution representing the radial flow.
Example
q0 = MvNormal(zeros(2), I); flow = radialflow(q0, 6)
x = rand(flow); lp = logpdf(flow, x)
Utility Functions
NormalizingFlows.create_flow — Function

create_flow(layers, q0)

Construct a normalizing flow by composing the provided bijector layers and attaching them to the base distribution q0.
- layers: an iterable of Bijectors.Bijector objects that are composed in order (left-to-right) via function composition (for instance, if layers = [l1, l2, l3], the flow will be l1∘l2∘l3(q0)).
- q0: the base distribution (e.g., MvNormal(zeros(d), I)).
Returns a Bijectors.TransformedDistribution representing the resulting flow.
Example
using Distributions, Bijectors, LinearAlgebra
q0 = MvNormal(zeros(2), I)
flow = create_flow((Bijectors.Shift([0.0, 1.0]), Bijectors.Scale([1.0, 2.0])), q0)

NormalizingFlows.fnn — Function

fnn(
input_dim::Int,
hidden_dims::AbstractVector{Int},
output_dim::Int;
inlayer_activation=Flux.leakyrelu,
output_activation=nothing,
paramtype::Type{T} = Float64,
)

Create a fully connected neural network (FNN).
Arguments
- input_dim::Int: The dimension of the input layer.
- hidden_dims::AbstractVector{<:Int}: A vector of integers specifying the dimensions of the hidden layers.
- output_dim::Int: The dimension of the output layer.
- inlayer_activation: The activation function for the hidden layers. Defaults to Flux.leakyrelu.
- output_activation: The activation function for the output layer. Defaults to nothing.
- paramtype::Type{T} = Float64: The type of the parameters in the network; defaults to Float64.
Returns
- A Flux.Chain representing the FNN.
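A minimal usage sketch:

net = fnn(4, [64, 64], 8; paramtype=Float32)  # 4 → 64 → 64 → 8 fully connected network
h = net(randn(Float32, 4, 16))                # batched forward pass over 16 column vectors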
- [ASD2020] Agrawal, A., Sheldon, D., and Domke, J. (2020). Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization. NeurIPS.
- [LJS2017] Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2017). Density Estimation Using Real NVP. ICLR.
- [DBMP2019] Durkan, C., Bekasov, A., Murray, I., and Papamakarios, G. (2019). Neural Spline Flows. NeurIPS.