API

Variational Objectives

We provide ELBO (reverse KL) and expected log-likelihood (forward KL). You can also supply your own objective with the signature vo(rng, flow, args...).
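
Any callable with this signature that returns a scalar can serve as an objective. Below is a minimal, hypothetical sketch: a naive Monte Carlo ELBO estimator written as a custom objective (not part of the package API; it samples from the flow itself rather than from the reference distribution).

using Distributions, Random, Statistics

# Custom variational objective: vo(rng, flow, args...) -> scalar.
# Here args... = (logp, n_samples); the estimate is E_q[log p(Z) - log q(Z)].
function my_objective(rng::AbstractRNG, flow, logp, n_samples)
    zs = rand(rng, flow, n_samples)                          # d × n_samples matrix of flow samples
    return mean(map(logp, eachcol(zs)) .- logpdf(flow, zs))  # Monte Carlo ELBO estimate
end
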

Evidence Lower Bound (ELBO)

Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$:

\[\begin{aligned} &\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\ & = \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_{n-1} \circ \cdots \circ T_1(Z_0)\right)\right] \quad \text{(ELBO)} \end{aligned}\]

Reverse KL minimization is typically used for Bayesian computation, where only the (possibly unnormalized) target log-density logp is available.

NormalizingFlows.elboFunction
elbo(flow, logp, xs)
elbo([rng, ] flow, logp, n_samples)

Compute a Monte Carlo estimate of the ELBO, either from a provided batch of samples xs from the reference distribution flow.dist, or from n_samples freshly drawn reference samples.

Arguments

  • rng: random number generator
  • flow: variational distribution to be trained, i.e., flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution from which one can easily sample and evaluate logpdf
  • logp: log-pdf of the target distribution (not necessarily normalized)
  • xs: samples from reference dist q₀
  • n_samples: number of samples from reference dist q₀
source
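
A minimal usage sketch of elbo (the target logp, the flow, and the sample size below are illustrative choices):

using NormalizingFlows, Distributions, LinearAlgebra, Random

logp(z) = -sum(abs2, z) / 2          # unnormalized log-density of a standard-normal target
q0 = MvNormal(zeros(2), I)           # reference distribution
flow = planarflow(q0, 5)             # any flow from "Available Flows" below works here

est = elbo(Random.default_rng(), flow, logp, 100)   # Monte Carlo ELBO from 100 reference samples
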
NormalizingFlows.elbo_batchFunction
elbo_batch(flow, logp, xs)
elbo_batch([rng, ] flow, logp, n_samples)

Batched ELBO estimate that transforms a matrix of samples (each column represents a single sample) in one call. This is more efficient for invertible neural-network flows (RealNVP/NSF) because it leverages the batched operations of the neural networks.

Arguments

  • flow::Bijectors.MultivariateTransformed
  • logp: function returning log-density of target
  • xs or n_samples: column-wise sample batch or number of samples

Returns

  • Scalar estimate of the ELBO
source

Log-likelihood

Maximizing the expected log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$:

\[\begin{aligned} & \min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \quad \text{(Forward KL)} \\ & = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)} \end{aligned}\]

Forward KL minimization is typically used for generative modeling when samples from p are given.

NormalizingFlows.loglikelihoodFunction
loglikelihood(rng, flow::Bijectors.TransformedDistribution, xs::AbstractVecOrMat)

Compute the log-likelihood of the variational distribution flow on a batch of samples xs drawn from the target distribution p.

Arguments

  • rng: random number generator (unused; included only so the signature matches the other variational objectives)
  • flow: variational distribution to be trained, i.e., flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution from which one can easily sample and evaluate logpdf
  • xs: samples from the target distribution p.
source
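
A minimal usage sketch of loglikelihood (the batch xs below is synthetic, standing in for samples from the target p):

using NormalizingFlows, Distributions, LinearAlgebra, Random

q0 = MvNormal(zeros(2), I)
flow = realnvp(q0, [32, 32], 4)

xs = randn(2, 256)   # stand-in for a column-wise batch of samples from p
ll = NormalizingFlows.loglikelihood(Random.default_rng(), flow, xs)   # estimate of E_p[log q_θ(X)]
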

Training Loop

NormalizingFlows.optimizeFunction
optimize(
    ad::ADTypes.AbstractADType, 
    loss, 
    θ₀::AbstractVector{T}, 
    re, 
    args...; 
    kwargs...
)

Iteratively update the parameters θ of the normalizing flow re(θ) by calling grad! and using the given optimiser to compute the update steps.

Arguments

  • ad::ADTypes.AbstractADType: automatic differentiation backend
  • loss: a general loss function θ -> loss(θ, args...) returning a scalar loss value that will be minimised
  • θ₀::AbstractVector{T}: initial parameters for the loss function (in the context of normalizing flows, it will be the flattened flow parameters)
  • re: reconstruction function that maps the flattened parameters to the normalizing flow
  • args...: additional arguments for loss (will be set as DI.Constant)

Keyword Arguments

  • max_iters::Int=10000: maximum number of iterations
  • optimiser::Optimisers.AbstractRule=Optimisers.ADAM(): optimiser to compute the steps
  • show_progress::Bool=true: whether to show the progress bar. The default information printed in the progress bar is the iteration number, the loss value, and the gradient norm.
  • callback=nothing: callback function with signature cb(iter, opt_state, re, θ) which returns a dictionary-like object of statistics to be displayed in the progress bar. re and θ are used for reconstructing the normalizing flow in case the user wants to further examine the status of the flow.
  • hasconverged = (iter, opt_stats, re, θ, st) -> false: function that checks whether the training has converged. The default is to always return false.
  • prog=ProgressMeter.Progress( max_iters; desc="Training", barlen=31, showspeed=true, enabled=show_progress ): progress bar configuration

Returns

  • θ: trained parameters of the normalizing flow
  • opt_stats: statistics of the optimiser
  • st: optimiser state for potential continuation of training
source
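
A minimal end-to-end sketch of calling optimize directly. The flattening via Optimisers.destructure, the Zygote backend, and the target logp are illustrative choices, not the only supported setup:

using NormalizingFlows, Distributions, LinearAlgebra, Optimisers, ADTypes, Zygote

logp(z) = -sum(abs2, z) / 2                      # unnormalized target log-density
q0 = MvNormal(zeros(2), I)
flow = realnvp(q0, [32, 32], 4)

θ₀, re = Optimisers.destructure(flow)            # flatten parameters; re(θ) rebuilds the flow
loss(θ, n) = -elbo(re(θ), logp, n)               # maximize the ELBO by minimizing its negative

θ, opt_stats, st = optimize(
    ADTypes.AutoZygote(), loss, θ₀, re, 32;      # 32 Monte Carlo samples per iteration
    max_iters=1_000, optimiser=Optimisers.Adam(1e-3),
)
trained_flow = re(θ)                             # reconstruct the trained flow
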

Available Flows

NormalizingFlows.jl provides two commonly used normalizing flows, RealNVP and Neural Spline Flow (NSF), as well as two simple flows, Planar Flow and Radial Flow.

RealNVP (Affine Coupling Flow)

These helpers construct commonly used coupling-based flows with sensible defaults.

NormalizingFlows.realnvpFunction
realnvp(q0, hdims, nlayers; paramtype = Float64)
realnvp(q0; paramtype = Float64)

Construct a RealNVP flow by stacking nlayers RealNVP_layer blocks with odd–even masking. The 1-argument variant defaults to 10 layers with hidden sizes [32, 32] per conditioner.

Arguments

  • q0::Distribution{Multivariate,Continuous}: base distribution (e.g. MvNormal(zeros(d), I)).
  • hdims::AbstractVector{Int}: hidden sizes for the conditioner networks.
  • nlayers::Int: number of stacked RealNVP layers.

Keyword Arguments

  • paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).

Returns

  • Bijectors.TransformedDistribution representing the RealNVP flow.

Example

  • q0 = MvNormal(zeros(2), I); flow = realnvp(q0, [64,64], 8)
  • x = rand(flow, 128); lp = logpdf(flow, x)
source
realnvp(q0; paramtype = Float64)

Default constructor: 10 layers, each conditioner uses hidden sizes [32, 32]. Follows a common RealNVP architecture similar to Appendix E of [ASD2020].

source
NormalizingFlows.RealNVP_layerFunction
RealNVP_layer(dims, hdims; paramtype = Float64)

Construct a single RealNVP layer by composing two AffineCoupling bijectors with complementary odd–even masks.

Arguments

  • dims::Int: dimensionality of the problem.
  • hdims::AbstractVector{Int}: hidden sizes of the conditioner networks.

Keyword Arguments

  • paramtype::Type{T} = Float64: parameter element type.

Returns

  • A Bijectors.Bijector representing the RealNVP layer.

Example

  • layer = RealNVP_layer(4, [64, 64])
  • y = layer(randn(4, 16)) # batched forward
source
NormalizingFlows.AffineCouplingType
AffineCoupling(dim, hdims, mask_idx, paramtype)
AffineCoupling(dim, mask, s, t)

Affine coupling bijector used in RealNVP [LJS2017].

Two subnetworks s (log-scale, exponentiated in the forward pass) and t (shift) act on one partition of the input, conditioned on the complementary partition (as defined by mask). For numerical stability, the output of s passes through tanh before exponentiation.

Arguments

  • dim::Int: total dimensionality of the input.
  • hdims::AbstractVector{Int}: hidden sizes for the conditioner MLPs s and t.
  • mask_idx::AbstractVector{Int}: indices of the dimensions to transform. The complement is used as the conditioner input.

Keyword Arguments

  • paramtype::Type{<:AbstractFloat}: parameter element type (e.g. Float32).

Fields

  • mask::Bijectors.PartitionMask: partition specification.
  • s::Flux.Chain: conditioner producing log-scales for the transformed block.
  • t::Flux.Chain: conditioner producing shifts for the transformed block.

Notes

  • Forward: with (x₁,x₂,x₃) = partition(mask, x), compute y₁ = x₁ .* exp.(s(x₂)) .+ t(x₂).
  • Log-determinant: sum(s(x₂)) (or columnwise for batched matrices).
  • All methods support both vectors and column-major batches (matrices).
source
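
A minimal sketch of using the coupling bijector directly (the mask indices and network sizes are illustrative):

using NormalizingFlows, Bijectors

b = NormalizingFlows.AffineCoupling(4, [16, 16], [1, 3], Float64)   # transform dims 1 and 3, condition on 2 and 4
x = randn(4)
y, logjac = with_logabsdet_jacobian(b, x)   # forward pass and log |det J|
x_rec = inverse(b)(y)                       # invert the coupling
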

Neural Spline Flow (NSF)

NormalizingFlows.nsfFunction
nsf(q0, hdims, K, B, nlayers; paramtype = Float64)
nsf(q0; paramtype = Float64)

Construct an NSF by stacking nlayers NSF_layer blocks. The one-argument variant defaults to 10 layers with hidden sizes [32, 32], 10 knots, and boundary 30 (in the parameter element type T).

Arguments

  • q0::Distribution{Multivariate,Continuous}: base distribution.
  • hdims::AbstractVector{Int}: hidden sizes of the conditioner network.
  • K::Int: spline knots per coordinate.
  • B::AbstractFloat: spline boundary.
  • nlayers::Int: number of NSF layers.

Keyword Arguments

  • paramtype::Type{T} = Float64: parameter element type.

Returns

  • Bijectors.TransformedDistribution representing the NSF flow.

Note

Under the hood, nsf relies on the rational quadratic spline functions implemented in MonotonicSplines.jl for performance reasons. MonotonicSplines.jl uses KernelAbstractions.jl to support batched operations; because of this, nsf currently only supports Zygote as the AD backend.

Example

  • q0 = MvNormal(zeros(3), I); flow = nsf(q0, [64,64], 8, 3.0, 6)
  • x = rand(flow, 128); lp = logpdf(flow, x)
source
NormalizingFlows.NSF_layerFunction
NSF_layer(dim, hdims, K, B; paramtype = Float64)

Build a single Neural Spline Flow (NSF) layer by composing two NeuralSplineCoupling bijectors with complementary odd–even masks.

Arguments

  • dim::Int: dimensionality of the problem.
  • hdims::AbstractVector{Int}: hidden sizes of the conditioner network.
  • K::Int: number of spline knots.
  • B::AbstractFloat: spline boundary.

Keyword Arguments

  • paramtype::Type{T} = Float64: parameter element type.

Returns

  • A Bijectors.Bijector representing the NSF layer.

Example

  • layer = NSF_layer(4, [64,64], 10, 3.0)
  • y = layer(randn(4, 32))
source
NormalizingFlows.NeuralSplineCouplingType
NeuralSplineCoupling(dim, hdims, K, B, mask_idx, paramtype)
NeuralSplineCoupling(dim, K, n_dims_transformed, B, nn, mask)

Neural Rational Quadratic Spline (RQS) coupling bijector [DBMP2019].

A conditioner network takes the unchanged partition as input and outputs the parameters of monotonic rational quadratic splines for the transformed coordinates. Batched inputs (matrices with column vectors) are supported.

Arguments

  • dim::Int: total input dimension.
  • hdims::AbstractVector{Int}: hidden sizes for the conditioner MLP.
  • K::Int: number of spline knots per transformed coordinate.
  • B::AbstractFloat: boundary/box constraint for spline domain.
  • mask_idx::AbstractVector{Int}: indices of the transformed coordinates.

Keyword Arguments

  • paramtype::Type{<:AbstractFloat}: parameter element type.

Fields

  • nn::Flux.Chain: conditioner that outputs all spline parameters for all transformed dimensions.
  • mask::Bijectors.PartitionMask: partition specification.

Notes

  • Output dimensionality of the conditioner is (3K - 1) * n_transformed.
  • For computational performance, we rely on MonotonicSplines.jl to build the rational quadratic spline functions.
  • See MonotonicSplines.rqs_forward and MonotonicSplines.rqs_inverse for the forward/inverse and log-determinant computations.

source
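
A minimal sketch, analogous to the affine-coupling example above (sizes, knot count, and boundary are illustrative; note the Zygote-only AD caveat mentioned under nsf):

using NormalizingFlows, Bijectors

b = NormalizingFlows.NeuralSplineCoupling(4, [16, 16], 10, 3.0, [1, 3], Float64)
xs = randn(4, 8)                               # batch of 8 column vectors
ys, logjacs = with_logabsdet_jacobian(b, xs)   # batched forward pass and per-column log-determinants
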

Planar and Radial Flows

NormalizingFlows.planarflowFunction
planarflow(q0, nlayers; paramtype = Float64)

Construct a Planar Flow by stacking nlayers Bijectors.PlanarLayer blocks on top of a base distribution q0.

Arguments

  • q0::Distribution{Multivariate,Continuous}: base distribution (e.g., MvNormal(zeros(d), I)).
  • nlayers::Int: number of planar layers to compose.

Keyword Arguments

  • paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).

Returns

  • Bijectors.TransformedDistribution representing the planar flow.

Example

  • q0 = MvNormal(zeros(2), I); flow = planarflow(q0, 10)
  • x = rand(flow, 128); lp = logpdf(flow, x)
source
NormalizingFlows.radialflowFunction
radialflow(q0, nlayers; paramtype = Float64)

Construct a Radial Flow by stacking nlayers Bijectors.RadialLayer blocks on top of a base distribution q0.

Arguments

  • q0::Distribution{Multivariate,Continuous}: base distribution (e.g., MvNormal(zeros(d), I)).
  • nlayers::Int: number of radial layers to compose.

Keyword Arguments

  • paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).

Returns

  • Bijectors.TransformedDistribution representing the radial flow.

Example

  • q0 = MvNormal(zeros(2), I); flow = radialflow(q0, 6)
  • x = rand(flow); lp = logpdf(flow, x)
source

Utility Functions

NormalizingFlows.create_flowFunction
create_flow(layers, q0)

Construct a normalizing flow by composing the provided bijector layers and attaching them to the base distribution q0.

  • layers: an iterable of Bijectors.Bijector objects that are composed in order (left-to-right) via function composition (for instance, if layers = [l1, l2, l3], the flow will be l1∘l2∘l3(q0)).

  • q0: the base distribution (e.g., MvNormal(zeros(d), I)).

Returns a Bijectors.TransformedDistribution representing the resulting flow.

Example

using Distributions, Bijectors, LinearAlgebra
q0 = MvNormal(zeros(2), I)
flow = create_flow((Bijectors.Shift([0.0, 1.0]), Bijectors.Scale([1.0, 2.0])), q0)
source
NormalizingFlows.fnnFunction
fnn(
    input_dim::Int,
    hidden_dims::AbstractVector{Int},
    output_dim::Int;
    inlayer_activation=Flux.leakyrelu,
    output_activation=nothing,
    paramtype::Type{T} = Float64,
)

Create a fully connected neural network (FNN).

Arguments

  • input_dim::Int: The dimension of the input layer.
  • hidden_dims::AbstractVector{<:Int}: A vector of integers specifying the dimensions of the hidden layers.
  • output_dim::Int: The dimension of the output layer.
  • inlayer_activation: The activation function for the hidden layers. Defaults to Flux.leakyrelu.
  • output_activation: The activation function for the output layer. Defaults to nothing.
  • paramtype::Type{T} = Float64: The type of the parameters in the network, defaults to Float64.

Returns

  • A Flux.Chain representing the FNN.
source
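
A minimal usage sketch of fnn (dimensions and activations below are illustrative):

using NormalizingFlows, Flux

net = fnn(4, [32, 32], 8; output_activation=tanh, paramtype=Float32)
h = net(randn(Float32, 4, 16))   # batched forward pass: 4-dim inputs, 8-dim outputs, batch of 16
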
  • [ASD2020] Agrawal, A., Sheldon, D., and Domke, J. (2020). Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization. NeurIPS.
  • [LJS2017] Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2017). Density Estimation Using Real NVP. ICLR.
  • [DBMP2019] Durkan, C., Bekasov, A., Murray, I., and Papamakarios, G. (2019). Neural Spline Flows. NeurIPS.