API: Turing.Variational
Turing.Variational.q_fullrank_gaussian — Method
q_fullrank_gaussian(
[rng::Random.AbstractRNG,]
model::DynamicPPL.Model;
location::Union{Nothing,<:AbstractVector} = nothing,
scale::Union{Nothing,<:LowerTriangular} = nothing,
kwargs...
)

Find a numerically non-degenerate Gaussian q whose scale has full-rank factors (traditionally referred to as a "full-rank family") for approximating the target model.
If scale is nothing, the default is a zero-mean Gaussian with a LowerTriangular scale matrix (resulting in a covariance with "full-rank" factors) no larger than 0.6*I (covariance of 0.6^2*I). This guarantees that samples from the initial variational approximation fall in the range (-2, 2) with 99.9% probability, mimicking the behavior of the Turing.InitFromUniform() strategy. Whether or not the default is used, the scale may be adjusted via q_initialize_scale so that the log-densities of model are finite over samples from q.
Arguments
- model: The target DynamicPPL.Model.
Keyword Arguments
- location: The location parameter of the initialization. If nothing, a vector of zeros is used.
- scale: The scale parameter of the initialization. If nothing, an identity matrix is used.
The remaining keyword arguments are passed to q_locationscale.
Returns
q::Bijectors.TransformedDistribution: An AdvancedVI.LocationScale distribution matching the support of model.
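A usage sketch (the toy model below is hypothetical, used only for illustration, and assumes the function is accessible after `using Turing`; otherwise qualify it as Turing.Variational.q_fullrank_gaussian):

```julia
using Turing

# Hypothetical toy model with a constrained (positive) parameter.
@model function demo()
    s ~ truncated(Normal(), 0, Inf)
    x ~ Normal(0, s)
end

# Full-rank Gaussian initialization; the returned q is a
# Bijectors.TransformedDistribution matching the support of the model.
q = q_fullrank_gaussian(demo())
```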
Turing.Variational.q_initialize_scale — Method
q_initialize_scale(
[rng::Random.AbstractRNG,]
model::DynamicPPL.Model,
location::AbstractVector,
scale::AbstractMatrix,
basedist::Distributions.UnivariateDistribution;
num_samples::Int = 10,
num_max_trials::Int = 10,
reduce_factor::Real = one(eltype(scale)) / 2
)

Given an initial location-scale distribution q formed by location, scale, and basedist, shrink scale until the expectation of the log-density of model taken over q is finite. If the log-densities are not finite even after num_max_trials, throw an error.
For reference, a location-scale distribution $q$ formed by location, scale, and basedist is a distribution where its sampling process $z \sim q$ can be represented as
u = rand(basedist, d)
z = scale * u + location

Arguments
- model: The target DynamicPPL.Model.
- location: The location parameter of the initialization.
- scale: The scale parameter of the initialization.
- basedist: The base distribution of the location-scale family.
Keyword Arguments
- num_samples: Number of samples used to compute the average log-density at each trial.
- num_max_trials: Number of trials until throwing an error.
- reduce_factor: Factor for shrinking the scale. After n trials, the scale is then scale*reduce_factor^n.
Returns
scale_adj: The adjusted scale matrix matching the type of scale.
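Following the signature above, a hedged sketch (the model and its dimensionality are hypothetical assumptions for illustration):

```julia
using Turing, LinearAlgebra

# Hypothetical toy model used only for illustration.
@model function demo()
    s ~ truncated(Normal(), 0, Inf)
    x ~ Normal(0, s)
end

d = 2  # number of (unconstrained) parameters in the hypothetical model
# Shrink the identity scale until the average log-density of the model
# over samples from q is finite.
scale_adj = q_initialize_scale(demo(), zeros(d), Matrix(1.0I, d, d), Normal())
```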
Turing.Variational.q_locationscale — Method
q_locationscale(
[rng::Random.AbstractRNG,]
model::DynamicPPL.Model;
location::Union{Nothing,<:AbstractVector} = nothing,
scale::Union{Nothing,<:Diagonal,<:LowerTriangular} = nothing,
meanfield::Bool = true,
basedist::Distributions.UnivariateDistribution = Normal()
)

Find a numerically non-degenerate variational distribution q for approximating the target model within the location-scale variational family formed by the type of scale and basedist.
The distribution can be manually specified by setting location, scale, and basedist. Otherwise, a zero-mean Gaussian with scale 0.6*I (covariance of 0.6^2*I) is chosen by default. This guarantees that samples from the initial variational approximation fall in the range (-2, 2) with 99.9% probability, mimicking the behavior of the Turing.InitFromUniform() strategy.
Whether or not the default is used, the scale may be adjusted via q_initialize_scale so that the log-densities of model are finite over samples from q. If meanfield is true, the scale of q is restricted to a diagonal matrix and only the diagonal of scale is used.
For reference, a location-scale distribution $q$ formed by location, scale, and basedist is a distribution where its sampling process $z \sim q$ can be represented as
u = rand(basedist, d)
z = scale * u + location

Arguments
- model: The target DynamicPPL.Model.
Keyword Arguments
- location: The location parameter of the initialization. If nothing, a vector of zeros is used.
- scale: The scale parameter of the initialization. If nothing, an identity matrix is used.
- meanfield: Whether to use the mean-field approximation. If true, scale is converted into a Diagonal matrix. Otherwise, it is converted into a LowerTriangular matrix.
- basedist: The base distribution of the location-scale family.
The remaining keywords are passed to q_initialize_scale.
Returns
q::Bijectors.TransformedDistribution: An AdvancedVI.LocationScale distribution matching the support of model.
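A hedged sketch of both modes (the toy model is a hypothetical assumption, not part of the API):

```julia
using Turing

# Hypothetical toy model used only for illustration.
@model function demo()
    s ~ truncated(Normal(), 0, Inf)
    x ~ Normal(0, s)
end

# Mean-field initialization: the scale is restricted to a Diagonal matrix.
q_mf = q_locationscale(demo(); meanfield=true, basedist=Normal())

# Full covariance structure instead: the scale becomes LowerTriangular.
q_fr = q_locationscale(demo(); meanfield=false)
```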
Turing.Variational.q_meanfield_gaussian — Method
q_meanfield_gaussian(
[rng::Random.AbstractRNG,]
model::DynamicPPL.Model;
location::Union{Nothing,<:AbstractVector} = nothing,
scale::Union{Nothing,<:Diagonal} = nothing,
kwargs...
)

Find a numerically non-degenerate mean-field Gaussian q for approximating the target model.
If scale is nothing, the default is a zero-mean Gaussian with a Diagonal scale matrix (the "mean-field" approximation) no larger than 0.6*I (covariance of 0.6^2*I). This guarantees that samples from the initial variational approximation fall in the range (-2, 2) with 99.9% probability, mimicking the behavior of the Turing.InitFromUniform() strategy. Whether or not the default is used, the scale may be adjusted via q_initialize_scale so that the log-densities of model are finite over samples from q.
Arguments
- model: The target DynamicPPL.Model.
Keyword Arguments
- location: The location parameter of the initialization. If nothing, a vector of zeros is used.
- scale: The scale parameter of the initialization. If nothing, an identity matrix is used.
The remaining keyword arguments are passed to q_locationscale.
Returns
q::Bijectors.TransformedDistribution: An AdvancedVI.LocationScale distribution matching the support of model.
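A hedged usage sketch (the model and its parameter count are hypothetical assumptions for illustration):

```julia
using Turing, LinearAlgebra

# Hypothetical toy model used only for illustration.
@model function demo()
    s ~ truncated(Normal(), 0, Inf)
    x ~ Normal(0, s)
end

# Default mean-field Gaussian initialization.
q = q_meanfield_gaussian(demo())

# Or supply a custom Diagonal scale; its dimension must match the model
# (the hypothetical model above has two parameters).
q_custom = q_meanfield_gaussian(demo(); scale=Diagonal(fill(0.5, 2)))
```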
Turing.Variational.vi — Method
vi(
[rng::Random.AbstractRNG,]
model::DynamicPPL.Model,
q,
max_iter::Int;
adtype::ADTypes.AbstractADType=DEFAULT_ADTYPE,
algorithm::AdvancedVI.AbstractVariationalAlgorithm = KLMinRepGradProxDescent(
adtype; n_samples=10
),
show_progress::Bool = Turing.PROGRESS[],
kwargs...
)

Approximate the target model via the variational inference algorithm algorithm by starting from the initial variational approximation q. This is a thin wrapper around AdvancedVI.optimize.
If the chosen variational inference algorithm operates in an unconstrained space, then the provided initial variational approximation q must be a Bijectors.TransformedDistribution of an unconstrained distribution, for example, the initializations supplied by q_meanfield_gaussian, q_fullrank_gaussian, and q_locationscale.
The default algorithm, KLMinRepGradProxDescent, assumes q uses AdvancedVI.MvLocationScale, which can be constructed by invoking q_fullrank_gaussian or q_meanfield_gaussian. For other variational families, refer to the documentation of AdvancedVI to determine the best algorithm and other options.
Arguments
- model: The target DynamicPPL.Model.
- q: The initial variational approximation.
- max_iter: Maximum number of steps.
- Any additional arguments are passed on to AdvancedVI.optimize.
Keyword Arguments
- adtype: Automatic differentiation backend to be applied to the log-density. The default value for algorithm also uses this backend for differentiating the variational objective.
- algorithm: Variational inference algorithm. The default is KLMinRepGradProxDescent; please refer to the AdvancedVI docs for all the options.
- show_progress: Whether to show the progress bar.
- unconstrained: Whether to transform the posterior to be unconstrained for running the variational inference algorithm. If true, then the output q will be wrapped in a Bijectors.TransformedDistribution with the transformation matching the support of the posterior. The default value depends on the chosen algorithm.
- Any additional keyword arguments are passed on to AdvancedVI.optimize.
See the docs of AdvancedVI.optimize for additional keyword arguments.
Returns
- q: Output variational distribution of algorithm.
- state: Collection of states used by algorithm. This can be used to resume from a past call to vi.
- info: Information generated while executing algorithm.
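Putting the pieces together, a hedged end-to-end sketch (the toy model is hypothetical, and the destructuring of the return values into q, state, and info follows the Returns section above):

```julia
using Turing

# Hypothetical toy model used only for illustration.
@model function demo()
    s ~ truncated(Normal(), 0, Inf)
    x ~ Normal(0, s)
end

model = demo()
q0 = q_meanfield_gaussian(model)      # initial variational approximation
q, state, info = vi(model, q0, 1000)  # run the default algorithm for 1000 steps
# `state` can be used to resume optimization in a later call to vi.
```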