API

NormalizingFlows.elbo
NormalizingFlows.elbo_batch
NormalizingFlows.loglikelihood
NormalizingFlows.optimize
NormalizingFlows.train_flow

Main Function

NormalizingFlows.train_flow — Function

train_flow([rng::AbstractRNG, ]vo, flow, args...; kwargs...)

Train the given normalizing flow flow by calling optimize.

Arguments

rng::AbstractRNG: random number generator
vo: variational objective
flow: normalizing flow to be trained; we recommend defining flow as a subtype of Bijectors.TransformedDistribution
args...: additional arguments for vo

Keyword Arguments

max_iters::Int=1000: maximum number of iterations
optimiser::Optimisers.AbstractRule=Optimisers.ADAM(): optimiser to compute the steps
ADbackend::ADTypes.AbstractADType=ADTypes.AutoZygote(): automatic differentiation backend. Currently supported backends are ADTypes.AutoZygote(), ADTypes.AutoForwardDiff(), ADTypes.AutoReverseDiff(), ADTypes.AutoMooncake() and ADTypes.AutoEnzyme(; mode=Enzyme.set_runtime_activity(Enzyme.Reverse), function_annotation=Enzyme.Const). When using AutoEnzyme, make sure to pass mode and function_annotation exactly as shown here.
kwargs...: additional keyword arguments for optimize (See optimize for details)

Returns

flow_trained: trained normalizing flow
opt_stats: statistics of the optimiser during the training process (See optimize for details)
st: optimiser state for potential continuation of training


The flow object can be constructed with the transformed function from the Bijectors.jl package. For example, for Gaussian VI we can construct the flow as follows:

using Distributions, Bijectors
using Functors # provides @leaf
T = Float32
@leaf MvNormal # to prevent params in q₀ from being optimized
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))
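As a quick sanity check (a sketch, not taken from the package documentation), the constructed flow can be treated like any other distribution: it can be sampled with rand and evaluated with logpdf. Because the shift is zero and the scale is one, the untrained flow should coincide with q₀. The snippet repeats the construction so that it is self-contained; the variable names z and zs are chosen here for illustration.

```julia
using Distributions, Bijectors

T = Float32
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))

z = rand(flow)       # one draw, a length-2 vector
zs = rand(flow, 5)   # five draws, stored as the columns of a 2×5 matrix

# With an identity shift and scale, the flow density equals the reference density.
same_density = isapprox(logpdf(flow, z), logpdf(q₀, z); atol=1e-5)
```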

To train the Gaussian VI targeting a distribution $p$ via ELBO maximization, we can run

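using NormalizingFlows
using Optimisers # provides the optimiser rule passed below

# logp must be defined beforehand: the (possibly unnormalized) log-density of the target p.
# T and flow are the ones constructed in the previous snippet.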
sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo,
    flow,
    logp,
    sample_per_iter;
    max_iters=2_000,
    optimiser=Optimisers.ADAM(0.01 * one(T)),
)

Variational Objectives

We have implemented two variational objectives, namely the ELBO and the log-likelihood objective. Users can also define their own objective functions and pass them to train_flow, which optimizes the flow parameters by maximizing vo. The objective function should take the following general form:

vo(rng, flow, args...)

where rng is the random number generator, flow is the flow object, and args... are the additional arguments that users can pass to the objective function.
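To make the calling convention concrete, here is a minimal sketch of a user-defined objective. The name my_objective and the toy target are hypothetical and not part of NormalizingFlows.jl; the function is a plain Monte Carlo estimate of E_q[log p(Z) - log q(Z)] that relies only on rand and logpdf of the transformed distribution. Whether a given AD backend can differentiate through it is not checked here.

```julia
using Random, Statistics
using Distributions, Bijectors

# Hypothetical objective with the signature vo(rng, flow, args...),
# where args... = (logp, n_samples).
function my_objective(rng::AbstractRNG, flow, logp, n_samples::Int)
    zs = rand(rng, flow, n_samples)  # columns are draws from the flow
    return mean(logp(z) - logpdf(flow, z) for z in eachcol(zs))
end

T = Float32
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))
logp(z) = logpdf(q₀, z)  # toy target equal to the untrained flow

# The two log-densities cancel for this toy target, so the value is zero up to rounding.
value = my_objective(Random.default_rng(), flow, logp, 10)
```

Following the convention described above, such a function would be passed in place of elbo, for example train_flow(my_objective, flow, logp, 10).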

Evidence Lower Bound (ELBO)

Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$, i.e.,

\[\begin{aligned} &\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\ & \iff \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_{n-1} \circ \cdots \circ T_1(Z_0)\right)\right] \quad \text{(ELBO)} \end{aligned}\]

where $Z_0 \sim q_0$, the flow is $T_N \circ \cdots \circ T_1$, and $J_n$ is the absolute value of the Jacobian determinant of $T_n$, evaluated at the input of the $n$-th layer (which is $Z_0$ itself for $n = 1$).

Reverse KL minimization is typically used for Bayesian computation, where one only has access to the unnormalized log-density of the target distribution $p$ (e.g., a Bayesian posterior distribution) and hopes to generate approximate samples from it.
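The ELBO expression can be checked by hand in the simplest case: a one-dimensional, single-layer affine flow T(z₀) = μ + σ z₀ with a standard normal reference, for which the log-Jacobian term is log σ. The sketch below uses only the Julia standard library; the function names and the toy target N(1, 2²) are assumptions made for illustration and are unrelated to the package internals.

```julia
using Random, Statistics

# Standard normal log-density, used for the reference q₀.
logq0(z) = -0.5 * z^2 - 0.5 * log(2π)

# Monte Carlo ELBO for the affine flow T(z₀) = μ + σ z₀ (σ > 0):
# mean of log p(T(z₀)) - log q₀(z₀) + log σ over draws z₀ ~ q₀.
function elbo_estimate(rng, logp, μ, σ, n_samples)
    z0 = randn(rng, n_samples)
    z = μ .+ σ .* z0
    return mean(logp.(z) .- logq0.(z0) .+ log(σ))
end

# Toy target N(1, 2²), given by its normalized log-density.
logp(z) = -0.5 * ((z - 1) / 2)^2 - log(2) - 0.5 * log(2π)

rng = MersenneTwister(1)
elbo_matched = elbo_estimate(rng, logp, 1.0, 2.0, 1_000)   # flow equals the target
elbo_untrained = elbo_estimate(rng, logp, 0.0, 1.0, 1_000) # identity flow
```

When the flow matches a normalized target, every term cancels and the estimate is zero up to rounding. For the identity flow the estimate is a noisy approximation of minus the KL divergence from q₀ to the target, and is therefore negative in expectation.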

NormalizingFlows.elbo — Function

elbo(flow, logp, xs) 
elbo([rng, ]flow, logp, n_samples)

Compute the ELBO for a batch of samples xs from the reference distribution flow.dist.

Arguments

rng: random number generator
flow: variational distribution to be trained. In particular, flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution that is easy to sample from and whose logpdf is easy to evaluate
logp: log-pdf of the target distribution (not necessarily normalized)
xs: samples from the reference distribution q₀
n_samples: number of samples to draw from the reference distribution q₀
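The two call forms listed in the signature can be exercised as follows. This is a sketch under stated assumptions: the toy target logp is chosen only for illustration, and xs is assumed to hold one reference sample per column.

```julia
using Random
using Distributions, Bijectors
using NormalizingFlows

T = Float32
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))
logp(z) = logpdf(q₀, z)  # toy target, chosen only for illustration

rng = Random.default_rng()
xs = rand(rng, q₀, 10)               # 10 reference samples as columns of a 2×10 matrix
v_fixed = elbo(flow, logp, xs)       # ELBO estimate at the supplied samples
v_fresh = elbo(rng, flow, logp, 10)  # draws 10 reference samples internally
```

The first form is convenient when the same reference samples should be reused across evaluations; the second is the form that train_flow invokes with the arguments (logp, n_samples).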


NormalizingFlows.elbo_batch — Function

elbo_batch(flow, logp, xs) 
elbo_batch([rng, ]flow, logp, n_samples)

Instead of broadcasting over elbo_single_sample, this function computes the ELBO directly in a batched manner, which requires flow.transform to handle batched transformations directly.

This will be more efficient than elbo for invertible neural networks such as RealNVP, Neural Spline Flow, etc.

Arguments

rng: random number generator
flow: variational distribution to be trained. In particular, flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution that is easy to sample from and whose logpdf is easy to evaluate
logp: log-pdf of the target distribution (not necessarily normalized)
xs: samples from the reference distribution q₀
n_samples: number of samples to draw from the reference distribution q₀


Log-likelihood

Maximizing the log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$, i.e.,

\[\begin{aligned} & \min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \quad \text{(Forward KL)} \\ & \iff \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)} \end{aligned}\]

Forward KL minimization is typically used for generative modeling, where one is given a set of samples from the target distribution $p$ (e.g., images) and aims to learn the density or a generative process that outputs high quality samples.
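The link between the forward KL divergence and the expected log-likelihood can be spelled out in one step: the entropy-like term of $p$ does not depend on $\theta$, so only the expected log-likelihood remains, and in practice it is replaced by an average over the available samples. The sample size $M$ and the samples $z_1, \dots, z_M$ are notation introduced here for illustration.

\[\begin{aligned} \mathrm{KL}\left(p \,\|\, q_{\theta}\right) &= \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \\ &= \underbrace{\mathbb{E}_{p}\left[\log p(Z)\right]}_{\text{independent of } \theta} - \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \\ &\approx \text{const} - \frac{1}{M} \sum_{m=1}^{M} \log q_{\theta}(z_m), \qquad z_m \sim p \end{aligned}\]

Minimizing the left-hand side over $\theta$ is therefore the same as maximizing the average log-density of the flow at the observed samples.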

NormalizingFlows.loglikelihood — Function

loglikelihood(rng, flow::Bijectors.TransformedDistribution, xs::AbstractVecOrMat)

Compute the log-likelihood for variational distribution flow at a batch of samples xs from the target distribution p.

Arguments

rng: random number generator (unused; present only so that the signature matches the other variational objectives)
flow: variational distribution to be trained. In particular, flow = transformed(q₀, T::Bijectors.Bijector), where q₀ is a reference distribution that is easy to sample from and whose logpdf is easy to evaluate
xs: samples from the target distribution p.
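A minimal sketch of evaluating this objective, following the signature above. The data matrix xs is a synthetic stand-in (one sample per column is assumed); in practice it would hold observed samples from p. The function is called with its module prefix here to avoid any clash with the loglikelihood exported by Distributions.

```julia
using Random
using Distributions, Bijectors
using NormalizingFlows

T = Float32
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))

rng = Random.default_rng()
# Synthetic stand-in for data from p: 100 two-dimensional points shifted by one.
xs = randn(rng, T, 2, 100) .+ one(T)

ll = NormalizingFlows.loglikelihood(rng, flow, xs)
```

With this objective, the additional argument forwarded by train_flow would be the data itself, for example train_flow(NormalizingFlows.loglikelihood, flow, xs).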


Training Loop

NormalizingFlows.optimize — Function

optimize(
    ad::ADTypes.AbstractADType, 
    loss, 
    θ₀::AbstractVector{T}, 
    re, 
    args...; 
    kwargs...
)

Iteratively update the parameters θ of the normalizing flow re(θ) by calling grad! and using the given optimiser to compute the steps.

Arguments

ad::ADTypes.AbstractADType: automatic differentiation backend
loss: a general loss function θ -> loss(θ, args...) returning a scalar loss value that will be minimised
θ₀::AbstractVector{T}: initial parameters for the loss function (in the context of normalizing flows, it will be the flattened flow parameters)
re: reconstruction function that maps the flattened parameters to the normalizing flow
args...: additional arguments for loss (will be set as DI.Constant)

Keyword Arguments

max_iters::Int=10000: maximum number of iterations
optimiser::Optimisers.AbstractRule=Optimisers.ADAM(): optimiser to compute the steps
show_progress::Bool=true: whether to show the progress bar. The default information printed in the progress bar is the iteration number, the loss value, and the gradient norm.
callback=nothing: callback function with signature cb(iter, opt_state, re, θ) that returns a dictionary-like object of statistics to be displayed in the progress bar. re and θ can be used to reconstruct the normalizing flow in case the user wants to further examine its status.
hasconverged = (iter, opt_stats, re, θ, st) -> false: function that checks whether the training has converged. The default is to always return false.
prog=ProgressMeter.Progress( max_iters; desc="Training", barlen=31, showspeed=true, enabled=show_progress ): progress bar configuration
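The callback and hasconverged hooks are ordinary functions with the signatures listed above. The sketch below defines one of each in plain Julia; both names are hypothetical, and returning a NamedTuple as the "dictionary-like object" is an assumption about what the training loop accepts.

```julia
# Hypothetical callback: reports the Euclidean norm of the flattened parameters.
# Returning a NamedTuple is an assumption; the arguments opt_state and re are unused.
param_norm_callback(iter, opt_state, re, θ) = (; param_norm = sqrt(sum(abs2, θ)))

# Hypothetical stopping rule: stop once `budget` iterations have run or any
# parameter has become non-finite. It ignores opt_stats, re and st.
function make_hasconverged(budget::Int)
    return (iter, opt_stats, re, θ, st) -> iter >= budget || !all(isfinite, θ)
end

hasconverged = make_hasconverged(500)
```

These would be supplied as the keyword arguments callback=param_norm_callback and hasconverged=hasconverged, either to optimize directly or through the kwargs... of train_flow.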

Returns

θ: trained parameters of the normalizing flow
opt_stats: statistics of the optimiser
st: optimiser state for potential continuation of training

