API
- NormalizingFlows.AffineCoupling
- NormalizingFlows.NeuralSplineCoupling
- NormalizingFlows.NSF_layer
- NormalizingFlows.RealNVP_layer
- NormalizingFlows.create_flow
- NormalizingFlows.elbo
- NormalizingFlows.elbo_batch
- NormalizingFlows.fnn
- NormalizingFlows.loglikelihood
- NormalizingFlows.nsf
- NormalizingFlows.optimize
- NormalizingFlows.planarflow
- NormalizingFlows.radialflow
- NormalizingFlows.realnvp
- NormalizingFlows.train_flow
Variational Objectives
We provide ELBO (reverse KL) and expected log-likelihood (forward KL). You can also supply your own objective with the signature vo(rng, flow, args...).
Evidence Lower Bound (ELBO)
Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$:
\[\begin{aligned} &\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\ & = \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_n \circ \cdots \circ T_1(Z_0)\right)\right] \quad \text{(ELBO)} \end{aligned}\]
Reverse KL minimization is typically used for Bayesian computation when only logp is available.
NormalizingFlows.elbo — Function
elbo(flow, logp, xs)
elbo([rng, ] flow, logp, n_samples)
Monte Carlo estimate of the ELBO from a batch of samples xs drawn from the reference distribution flow.dist.
Arguments
- rng: random number generator
- flow: variational distribution to be trained. In particular, flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution from which one can easily sample and compute the logpdf
- logp: log-pdf of the target distribution (not necessarily normalized)
- xs: samples from reference dist q₀
- n_samples: number of samples from reference dist q₀
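A minimal usage sketch (not from the package docs): the base distribution, flow type, target log-density, and sample count below are illustrative placeholders.
using Distributions, Bijectors, LinearAlgebra, Random
using NormalizingFlows

q0 = MvNormal(zeros(2), I)
flow = planarflow(q0, 5)                           # any flow from this package works here
logp(z) = -sum(abs2, z .- 1) / 2                   # hypothetical unnormalized target log-density
est = elbo(Random.default_rng(), flow, logp, 64)   # ELBO estimate from 64 reference samples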
NormalizingFlows.elbo_batch — Function
elbo_batch(flow, logp, xs)
elbo_batch([rng, ] flow, logp, n_samples)
Batched ELBO estimate that transforms a matrix of samples (each column represents a single sample) in one call. This is more efficient for invertible neural-network flows (RealNVP/NSF) because it leverages the batched operations of the neural networks.
Inputs
- flow::Bijectors.MultivariateTransformed
- logp: function returning log-density of target
- xs or n_samples: column-wise sample batch or number of samples
Returns
- Scalar estimate of the ELBO
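A sketch of batched estimation. Here the target log-density is written to accept a column-wise batch and return one value per column; this calling convention is an assumption, so adapt it to your target.
using Distributions, Bijectors, LinearAlgebra, Random
using NormalizingFlows

q0 = MvNormal(zeros(2), I)
flow = realnvp(q0, [32, 32], 4)
# hypothetical target, evaluated column-by-column so it accepts a matrix of samples
logp_batch(zs) = map(z -> -sum(abs2, z) / 2, eachcol(zs))
est = elbo_batch(Random.default_rng(), flow, logp_batch, 128)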
Log-likelihood
Maximizing the expected log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$:
\[\begin{aligned} & \min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \quad \text{(Forward KL)} \\ & = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)} \end{aligned}\]
Forward KL minimization is typically used for generative modeling when samples from p are given.
NormalizingFlows.loglikelihood — Function
loglikelihood(rng, flow::Bijectors.TransformedDistribution, xs::AbstractVecOrMat)
Compute the log-likelihood of the variational distribution flow at a batch of samples xs from the target distribution p.
Arguments
- rng: random number generator (unused; included only so that the signature matches the other variational objectives)
- flow: variational distribution to be trained. In particular, flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution from which one can easily sample and compute the logpdf
- xs: samples from the target distribution p.
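A sketch of the forward-KL data flow; the target samples below are random placeholders standing in for real data from p.
using Distributions, Bijectors, LinearAlgebra, Random
using NormalizingFlows

q0 = MvNormal(zeros(2), I)
flow = realnvp(q0, [32, 32], 4)
xs = randn(2, 256)                                  # stand-in for samples from the target p
ll = loglikelihood(Random.default_rng(), flow, xs)  # log-likelihood of the batch under q_θ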
Training Loop
NormalizingFlows.optimize — Function
optimize(
    ad::ADTypes.AbstractADType, 
    loss, 
    θ₀::AbstractVector{T}, 
    re, 
    args...; 
    kwargs...
)
Iteratively update the parameters θ of the normalizing flow re(θ) by calling grad! and using the given optimiser to compute the steps.
Arguments
- ad::ADTypes.AbstractADType: automatic differentiation backend
- loss: a general loss function θ -> loss(θ, args...) returning a scalar loss value that will be minimised
- θ₀::AbstractVector{T}: initial parameters for the loss function (in the context of normalizing flows, it will be the flattened flow parameters)
- re: reconstruction function that maps the flattened parameters to the normalizing flow
- args...: additional arguments for loss (will be set as DI.Constant)
Keyword Arguments
- max_iters::Int=10000: maximum number of iterations
- optimiser::Optimisers.AbstractRule=Optimisers.ADAM(): optimiser to compute the steps
- show_progress::Bool=true: whether to show the progress bar. The default information printed in the progress bar is the iteration number, the loss value, and the gradient norm.
- callback=nothing: callback function with signature cb(iter, opt_state, re, θ) which returns a dictionary-like object of statistics to be displayed in the progress bar. re and θ are used for reconstructing the normalizing flow in case the user wants to further examine the status of the flow.
- hasconverged = (iter, opt_stats, re, θ, st) -> false: function that checks whether the training has converged. The default is to always return false.
- prog=ProgressMeter.Progress( max_iters; desc="Training", barlen=31, showspeed=true, enabled=show_progress ): progress bar configuration
Returns
- θ: trained parameters of the normalizing flow
- opt_stats: statistics of the optimiser
- st: optimiser state for potential continuation of training
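A minimal sketch of driving optimize directly (the higher-level train_flow wraps this loop). The target, flow, sample count, and learning rate are illustrative choices, and Zygote is assumed as the AD backend.
using ADTypes, Optimisers, Zygote, Distributions, Bijectors, LinearAlgebra
using NormalizingFlows

q0 = MvNormal(zeros(2), I)
flow = planarflow(q0, 5)
θ₀, re = Optimisers.destructure(flow)          # flatten the flow parameters

logp(z) = -sum(abs2, z .- 1) / 2               # hypothetical unnormalized target
loss(θ, args...) = -elbo(re(θ), logp, 32)      # negative ELBO is minimised

θ, opt_stats, st = optimize(
    ADTypes.AutoZygote(), loss, θ₀, re;
    max_iters=1_000, optimiser=Optimisers.Adam(1e-2),
)
flow_trained = re(θ)                           # reconstruct the trained flow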
Available Flows
NormalizingFlows.jl provides two commonly used normalizing flows, RealNVP and Neural Spline Flow (NSF), as well as two simple flows, Planar Flow and Radial Flow.
RealNVP (Affine Coupling Flow)
These helpers construct commonly used coupling-based flows with sensible defaults.
NormalizingFlows.realnvp — Function
realnvp(q0, hdims, nlayers; paramtype = Float64)
realnvp(q0; paramtype = Float64)
Construct a RealNVP flow by stacking nlayers RealNVP_layer blocks with odd–even masking. The one-argument variant defaults to 10 layers with hidden sizes [32, 32] per conditioner.
Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g. MvNormal(zeros(d), I)).
- hdims::AbstractVector{Int}: hidden sizes for the conditioner networks.
- nlayers::Int: number of stacked RealNVP layers.
Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).
Returns
- Bijectors.TransformedDistribution representing the RealNVP flow.
Example
- q0 = MvNormal(zeros(2), I); flow = realnvp(q0, [64,64], 8)
- x = rand(flow, 128); lp = logpdf(flow, x)
realnvp(q0; paramtype = Float64)
Default constructor: 10 layers, each conditioner uses hidden sizes [32, 32]. Follows a common RealNVP architecture similar to Appendix E of [ASD2020].
NormalizingFlows.RealNVP_layer — Function
RealNVP_layer(dims, hdims; paramtype = Float64)
Construct a single RealNVP layer by composing two AffineCoupling bijectors with complementary odd–even masks.
Arguments
- dims::Int: dimensionality of the problem.
- hdims::AbstractVector{Int}: hidden sizes of the conditioner networks.
Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type.
Returns
- A Bijectors.Bijector representing the RealNVP layer.
Example
- layer = RealNVP_layer(4, [64, 64])
- y = layer(randn(4, 16)) # batched forward
NormalizingFlows.AffineCoupling — Type
AffineCoupling(dim, hdims, mask_idx, paramtype)
AffineCoupling(dim, mask, s, t)
Affine coupling bijector used in RealNVP [LJS2017].
Two subnetworks s (log-scale, exponentiated in the forward pass) and t (shift) act on one partition of the input, conditioned on the complementary partition (as defined by mask). For numerical stability, the output of s passes through tanh before exponentiation.
Arguments
- dim::Int: total dimensionality of the input.
- hdims::AbstractVector{Int}: hidden sizes for the conditioner MLPs s and t.
- mask_idx::AbstractVector{Int}: indices of the dimensions to transform. The complement is used as the conditioner input.
Keyword Arguments
- paramtype::Type{<:AbstractFloat}: parameter element type (e.g. Float32).
Fields
- mask::Bijectors.PartitionMask: partition specification.
- s::Flux.Chain: conditioner producing log-scales for the transformed block.
- t::Flux.Chain: conditioner producing shifts for the transformed block.
Notes
- Forward: with (x₁,x₂,x₃) = partition(mask, x), compute y₁ = x₁ .* exp.(s(x₂)) .+ t(x₂).
- Log-determinant: sum(s(x₂)) (or column-wise for batched matrices).
- All methods support both vectors and column-major batches (matrices).
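A usage sketch, assuming the first (positional) constructor shown above; the dimensions and mask indices are arbitrary illustrative choices.
using Bijectors, Flux
using NormalizingFlows

# transform dimensions 1 and 3, conditioning on dimensions 2 and 4 (illustrative choice)
ac = AffineCoupling(4, [16, 16], [1, 3], Float64)
x = randn(4)
y, logjac = with_logabsdet_jacobian(ac, x)   # forward pass and log-determinant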
Neural Spline Flow (NSF)
NormalizingFlows.nsf — Function
nsf(q0, hdims, K, B, nlayers; paramtype = Float64)
nsf(q0; paramtype = Float64)
Construct an NSF by stacking nlayers NSF_layer blocks. The one-argument variant defaults to 10 layers with [32, 32] hidden sizes, 10 knots, and boundary 30 (scaled by one(T)).
Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution.
- hdims::AbstractVector{Int}: hidden sizes of the conditioner network.
- K::Int: spline knots per coordinate.
- B::AbstractFloat: spline boundary.
- nlayers::Int: number of NSF layers.
Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type.
Returns
- Bijectors.TransformedDistribution representing the NSF flow.
Under the hood, nsf relies on the rational quadratic spline functions implemented in MonotonicSplines.jl for performance reasons. MonotonicSplines.jl uses KernelAbstractions.jl to support batched operations. Because of this, nsf currently only supports Zygote as the AD backend.
Example
- q0 = MvNormal(zeros(3), I); flow = nsf(q0, [64,64], 8, 3.0, 6)
- x = rand(flow, 128); lp = logpdf(flow, x)
NormalizingFlows.NSF_layer — Function
NSF_layer(dim, hdims, K, B; paramtype = Float64)
Build a single Neural Spline Flow (NSF) layer by composing two NeuralSplineCoupling bijectors with complementary odd–even masks.
Arguments
- dim::Int: dimensionality of the problem.
- hdims::AbstractVector{Int}: hidden sizes of the conditioner network.
- K::Int: number of spline knots.
- B::AbstractFloat: spline boundary.
Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type.
Returns
- A Bijectors.Bijector representing the NSF layer.
Example
- layer = NSF_layer(4, [64,64], 10, 3.0)
- y = layer(randn(4, 32))
NormalizingFlows.NeuralSplineCoupling — Type
NeuralSplineCoupling(dim, hdims, K, B, mask_idx, paramtype)
NeuralSplineCoupling(dim, K, n_dims_transformed, B, nn, mask)
Neural Rational Quadratic Spline (RQS) coupling bijector [DBMP2019].
A conditioner network takes the unchanged partition as input and outputs the parameters of monotonic rational quadratic splines for the transformed coordinates. Batched inputs (matrices with column vectors) are supported.
Arguments
- dim::Int: total input dimension.
- hdims::AbstractVector{Int}: hidden sizes for the conditioner MLP.
- K::Int: number of spline knots per transformed coordinate.
- B::AbstractFloat: boundary/box constraint for spline domain.
- mask_idx::AbstractVector{Int}: indices of the transformed coordinates.
Keyword Arguments
- paramtype::Type{<:AbstractFloat}: parameter element type.
Fields
- nn::Flux.Chain: conditioner that outputs all spline parameters for all transformed dimensions.
- mask::Bijectors.PartitionMask: partition specification.
Notes
- Output dimensionality of the conditioner is (3K - 1) * n_transformed.
- For computational performance, we rely on MonotonicSplines.jl for building the rational quadratic spline functions.
- See MonotonicSplines.rqs_forward and MonotonicSplines.rqs_inverse for forward/inverse and log-determinant computations.
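A usage sketch, assuming the first (positional) constructor shown above; the knot count, boundary, and mask indices are arbitrary illustrative choices.
using Bijectors, Flux
using NormalizingFlows

# 4-d input: spline-transform dimensions 1 and 3, conditioning on 2 and 4 (illustrative)
nsc = NeuralSplineCoupling(4, [16, 16], 10, 3.0, [1, 3], Float64)
x = randn(4)
y, logjac = with_logabsdet_jacobian(nsc, x)   # forward pass and log-determinant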
Planar and Radial Flows
NormalizingFlows.planarflow — Function
planarflow(q0, nlayers; paramtype = Float64)
Construct a Planar Flow by stacking nlayers Bijectors.PlanarLayer blocks on top of a base distribution q0.
Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g., MvNormal(zeros(d), I)).
- nlayers::Int: number of planar layers to compose.
Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).
Returns
- Bijectors.TransformedDistribution representing the planar flow.
Example
- q0 = MvNormal(zeros(2), I); flow = planarflow(q0, 10)
- x = rand(flow, 128); lp = logpdf(flow, x)
NormalizingFlows.radialflow — Function
radialflow(q0, nlayers; paramtype = Float64)
Construct a Radial Flow by stacking nlayers Bijectors.RadialLayer blocks on top of a base distribution q0.
Arguments
- q0::Distribution{Multivariate,Continuous}: base distribution (e.g., MvNormal(zeros(d), I)).
- nlayers::Int: number of radial layers to compose.
Keyword Arguments
- paramtype::Type{T} = Float64: parameter element type (use Float32 for GPU friendliness).
Returns
- Bijectors.TransformedDistribution representing the radial flow.
Example
- q0 = MvNormal(zeros(2), I); flow = radialflow(q0, 6)
- x = rand(flow); lp = logpdf(flow, x)
Utility Functions
NormalizingFlows.create_flow — Function
create_flow(layers, q0)
Construct a normalizing flow by composing the provided bijector layers and attaching them to the base distribution q0.
- layers: an iterable of Bijectors.Bijector objects that are composed in order (left-to-right) via function composition (for instance, if layers = [l1, l2, l3], the flow will be l1∘l2∘l3(q0)).
- q0: the base distribution (e.g., MvNormal(zeros(d), I)).
Returns a Bijectors.TransformedDistribution representing the resulting flow.
Example
using Distributions, Bijectors, LinearAlgebra
q0 = MvNormal(zeros(2), I)
flow = create_flow((Bijectors.Shift([0.0, 1.0]), Bijectors.Scale([1.0, 2.0])), q0)
NormalizingFlows.fnn — Function
fnn(
    input_dim::Int,
    hidden_dims::AbstractVector{Int},
    output_dim::Int;
    inlayer_activation=Flux.leakyrelu,
    output_activation=nothing,
    paramtype::Type{T} = Float64,
)
Create a fully connected neural network (FNN).
Arguments
- input_dim::Int: The dimension of the input layer.
- hidden_dims::AbstractVector{<:Int}: A vector of integers specifying the dimensions of the hidden layers.
- output_dim::Int: The dimension of the output layer.
- inlayer_activation: The activation function for the hidden layers. Defaults to Flux.leakyrelu.
- output_activation: The activation function for the output layer. Defaults to nothing.
- paramtype::Type{T} = Float64: The type of the parameters in the network, defaults to Float64.
Returns
- A Flux.Chain representing the FNN.
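A short sketch of the constructor documented above; the layer sizes and output activation are arbitrary choices.
using Flux
using NormalizingFlows

net = fnn(2, [32, 32], 4; output_activation=tanh, paramtype=Float32)
net(randn(Float32, 2, 16))    # batched forward pass: returns a 4×16 output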
- ASD2020: Agrawal, A., Sheldon, D. and Domke, J. (2020). Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization. NeurIPS.
- LJS2017: Dinh, L., Sohl-Dickstein, J. and Bengio, S. (2017). Density estimation using Real NVP. ICLR.
- DBMP2019: Durkan, C., Bekasov, A., Murray, I. and Papamakarios, G. (2019). Neural Spline Flows. NeurIPS.