API: Turing.Variational

Turing.Variational.q_fullrank_gaussian - Method
q_fullrank_gaussian(
    [rng::Random.AbstractRNG,]
    model::DynamicPPL.Model;
    location::Union{Nothing,<:AbstractVector} = nothing,
    scale::Union{Nothing,<:LowerTriangular} = nothing,
    kwargs...
)

Find a numerically non-degenerate Gaussian q whose scale has full-rank factors (traditionally referred to as the "full-rank family") for approximating the target model.

If scale is set to nothing, the default is a zero-mean Gaussian with a LowerTriangular scale matrix (resulting in a covariance with "full-rank" factors) no larger than 0.6*I (covariance of 0.6^2*I). With this default, each coordinate of a sample from the initial variational approximation falls in the range (-2, 2) with about 99.9% probability, which mimics the behavior of the Turing.InitFromUniform() strategy. Whether or not the default is used, the scale may be adjusted via q_initialize_scale so that the log-densities of model are finite over the samples from q.
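The 99.9% figure can be checked numerically. The snippet below is a small sketch, assuming Distributions.jl is available; it looks at a single coordinate of the default initialization, whose marginal is a zero-mean Gaussian with standard deviation 0.6.

```julia
using Distributions

# Marginal of one coordinate under the default initialization: N(0, 0.6^2).
marginal = Normal(0.0, 0.6)

# Probability that this coordinate falls inside (-2, 2).
p_inside = cdf(marginal, 2.0) - cdf(marginal, -2.0)

println(p_inside)  # roughly 0.999
```

Note that this is a per-coordinate statement: when the d coordinates are independent, as with a scale of exactly 0.6*I, the probability that all of them fall in (-2, 2) at once is p_inside^d.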

Arguments

  • model: The target DynamicPPL.Model.

Keyword Arguments

  • location: The location parameter of the initialization. If nothing, a vector of zeros is used.
  • scale: The scale parameter of the initialization. If nothing, a scaled identity matrix (0.6*I, as described above) is used.

The remaining keyword arguments are passed to q_locationscale.

Returns

  • q::Bijectors.TransformedDistribution: An AdvancedVI.MvLocationScale distribution matching the support of model.
Turing.Variational.q_initialize_scale - Method
q_initialize_scale(
    [rng::Random.AbstractRNG,]
    model::DynamicPPL.Model,
    location::AbstractVector,
    scale::AbstractMatrix,
    basedist::Distributions.UnivariateDistribution;
    num_samples::Int = 10,
    num_max_trials::Int = 10,
    reduce_factor::Real = one(eltype(scale)) / 2
)

Given an initial location-scale distribution q formed by location, scale, and basedist, shrink scale until the expectation of the log-density of model taken over q is finite. If the log-density is still not finite after num_max_trials trials, throw an error.

For reference, a location-scale distribution $q$ formed by location, scale, and basedist is a distribution whose sampling process $z \sim q$ can be represented as follows, where d is the length of location:

u = rand(basedist, d)
z = scale * u + location
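To make the two lines above concrete, here is a self-contained sketch of a single draw. The dimension d, the seed, and the particular location and scale values are illustrative assumptions, not defaults taken from the library.

```julia
using Distributions, LinearAlgebra, Random

rng = Random.MersenneTwister(1)   # fixed seed, for reproducibility only
d = 3                             # dimension of the variational distribution
basedist = Normal()               # univariate base distribution
location = zeros(d)
scale = Diagonal(fill(0.6, d))    # a diagonal (mean-field style) scale

# One draw z ~ q: sample d i.i.d. base variates, then scale and shift them.
u = rand(rng, basedist, d)
z = scale * u + location
```

Replacing scale with a LowerTriangular matrix gives a draw from the corresponding full-rank member of the family; the sampling code itself does not change.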

Arguments

  • model: The target DynamicPPL.Model.
  • location: The location parameter of the initialization.
  • scale: The scale parameter of the initialization.
  • basedist: The base distribution of the location-scale family.

Keyword Arguments

  • num_samples: Number of samples used to compute the average log-density at each trial.
  • num_max_trials: Maximum number of trials before an error is thrown.
  • reduce_factor: Factor by which the scale is shrunk at each trial. After n trials, the scale is scale*reduce_factor^n.

Returns

  • scale_adj: The adjusted scale matrix matching the type of scale.
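
The shrinking procedure described above can be sketched in plain Julia. This is an illustration under stated assumptions, not the library's implementation: shrink_scale_sketch and toy_logdensity are hypothetical names, the toy log-density stands in for a real model, and the exact order of checks inside q_initialize_scale may differ.

```julia
using Distributions, LinearAlgebra, Random, Statistics

# Hypothetical stand-in for a model's log-density: finite only inside (-1, 1)^d.
toy_logdensity(z) = all(abs.(z) .< 1) ? -sum(abs2, z) / 2 : -Inf

function shrink_scale_sketch(rng, logdensity, location, scale, basedist;
                             num_samples=10, num_max_trials=10,
                             reduce_factor=one(eltype(scale)) / 2)
    d = length(location)
    for trial in 0:(num_max_trials - 1)
        # Scale after `trial` reductions; scalar multiplication keeps the matrix type.
        scale_adj = scale * reduce_factor^trial
        # Monte Carlo average of the log-density over num_samples draws from q.
        avg = mean(logdensity(scale_adj * rand(rng, basedist, d) + location)
                   for _ in 1:num_samples)
        isfinite(avg) && return scale_adj
    end
    error("log-density was not finite after $num_max_trials trials")
end

rng = Random.MersenneTwister(1)
scale0 = Diagonal(fill(4.0, 2))   # deliberately too wide for the toy target
scale_adj = shrink_scale_sketch(rng, toy_logdensity, zeros(2), scale0, Normal())
```

A single sample with a log-density of -Inf makes the average non-finite, so the loop keeps halving the scale until all num_samples draws land where the toy log-density is finite.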
Turing.Variational.q_locationscale - Method
q_locationscale(
    [rng::Random.AbstractRNG,]
    model::DynamicPPL.Model;
    location::Union{Nothing,<:AbstractVector} = nothing,
    scale::Union{Nothing,<:Diagonal,<:LowerTriangular} = nothing,
    meanfield::Bool = true,
    basedist::Distributions.UnivariateDistribution = Normal()
)

Find a numerically non-degenerate variational distribution q for approximating the target model within the location-scale variational family formed by the type of scale and basedist.

The distribution can be specified manually by setting location, scale, and basedist. Otherwise, a zero-mean Gaussian with scale 0.6*I (covariance of 0.6^2*I) is chosen by default. With this default, each coordinate of a sample from the initial variational approximation falls in the range (-2, 2) with about 99.9% probability, which mimics the behavior of the Turing.InitFromUniform() strategy.

Whether or not the default is used, the scale may be adjusted via q_initialize_scale so that the log-densities of model are finite over the samples from q. If meanfield is set to true, the scale of q is restricted to be a diagonal matrix and only the diagonal of scale is used.

For reference, a location-scale distribution $q$ formed by location, scale, and basedist is a distribution whose sampling process $z \sim q$ can be represented as follows, where d is the length of location:

u = rand(basedist, d)
z = scale * u + location

Arguments

  • model: The target DynamicPPL.Model.

Keyword Arguments

  • location: The location parameter of the initialization. If nothing, a vector of zeros is used.
  • scale: The scale parameter of the initialization. If nothing, a scaled identity matrix (0.6*I, as described above) is used.
  • meanfield: Whether to use the mean-field approximation. If true, scale is converted into a Diagonal matrix. Otherwise, it is converted into a LowerTriangular matrix.
  • basedist: The base distribution of the location-scale family.

The remaining keywords are passed to q_initialize_scale.
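
The effect of the meanfield flag on scale can be illustrated with LinearAlgebra alone. This is a sketch of the two conversions described above, using a hypothetical 3×3 matrix S; it does not call into Turing.

```julia
using LinearAlgebra

# Hypothetical user-supplied scale with nonzero entries below the diagonal.
S = [0.6 0.0 0.0;
     0.1 0.6 0.0;
     0.2 0.3 0.6]

# meanfield = true: only the diagonal is kept, giving d free scale parameters.
scale_meanfield = Diagonal(diag(S))

# meanfield = false: the lower triangle is kept, giving d(d+1)/2 free parameters.
scale_fullrank = LowerTriangular(S)

# With a unit-variance basedist, the covariance of q is scale * scale'.
cov_meanfield = Matrix(scale_meanfield * scale_meanfield')
cov_fullrank = Matrix(scale_fullrank * scale_fullrank')
```

The mean-field covariance is diagonal, so the coordinates of q are uncorrelated, while the full-rank covariance keeps off-diagonal terms such as cov_fullrank[2, 1].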

Returns

  • q::Bijectors.TransformedDistribution: An AdvancedVI.MvLocationScale distribution matching the support of model.
Turing.Variational.q_meanfield_gaussian - Method
q_meanfield_gaussian(
    [rng::Random.AbstractRNG,]
    model::DynamicPPL.Model;
    location::Union{Nothing,<:AbstractVector} = nothing,
    scale::Union{Nothing,<:Diagonal} = nothing,
    kwargs...
)

Find a numerically non-degenerate mean-field Gaussian q for approximating the target model.

If scale is set to nothing, the default is a zero-mean Gaussian with a Diagonal scale matrix (the "mean-field" approximation) no larger than 0.6*I (covariance of 0.6^2*I). With this default, each coordinate of a sample from the initial variational approximation falls in the range (-2, 2) with about 99.9% probability, which mimics the behavior of the Turing.InitFromUniform() strategy. Whether or not the default is used, the scale may be adjusted via q_initialize_scale so that the log-densities of model are finite over the samples from q.

Arguments

  • model: The target DynamicPPL.Model.

Keyword Arguments

  • location: The location parameter of the initialization. If nothing, a vector of zeros is used.
  • scale: The scale parameter of the initialization. If nothing, a scaled identity matrix (0.6*I, as described above) is used.

The remaining keyword arguments are passed to q_locationscale.

Returns

  • q::Bijectors.TransformedDistribution: An AdvancedVI.MvLocationScale distribution matching the support of model.
Turing.Variational.vi - Method
vi(
    [rng::Random.AbstractRNG,]
    model::DynamicPPL.Model,
    q,
    max_iter::Int;
    adtype::ADTypes.AbstractADType=DEFAULT_ADTYPE,
    algorithm::AdvancedVI.AbstractVariationalAlgorithm = KLMinRepGradProxDescent(
        adtype; n_samples=10
    ),
    show_progress::Bool = Turing.PROGRESS[],
    kwargs...
)

Approximate the target model with the variational inference algorithm given by algorithm, starting from the initial variational approximation q. This is a thin wrapper around AdvancedVI.optimize.

If the chosen variational inference algorithm operates in an unconstrained space, then the provided initial variational approximation q must be a Bijectors.TransformedDistribution of an unconstrained distribution. The initializations returned by q_meanfield_gaussian, q_fullrank_gaussian, and q_locationscale are examples of this.

The default algorithm, KLMinRepGradProxDescent (see the AdvancedVI docs), assumes q uses AdvancedVI.MvLocationScale, which can be constructed by invoking q_fullrank_gaussian or q_meanfield_gaussian. For other variational families, refer to the documentation of AdvancedVI to determine the best algorithm and other options.

Arguments

  • model: The target DynamicPPL.Model.
  • q: The initial variational approximation.
  • max_iter: Maximum number of steps.
  • Any additional arguments are passed on to AdvancedVI.optimize.

Keyword Arguments

  • adtype: Automatic differentiation backend to be applied to the log-density. The default value for algorithm also uses this backend for differentiating the variational objective.
  • algorithm: Variational inference algorithm. The default is KLMinRepGradProxDescent; refer to the AdvancedVI docs for all the options.
  • show_progress: Whether to show the progress bar.
  • unconstrained: Whether to transform the posterior to be unconstrained for running the variational inference algorithm. If true, then the output q will be wrapped into a Bijectors.TransformedDistribution with the transformation matching the support of the posterior. The default value depends on the chosen algorithm.
  • Any additional keyword arguments are passed on to AdvancedVI.optimize.

See the docs of AdvancedVI.optimize for additional keyword arguments.

Returns

  • q: Output variational distribution of algorithm.
  • state: Collection of states used by algorithm. This can be used to resume from a past call to vi.
  • info: Information generated while executing algorithm.
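
A minimal end-to-end sketch, assuming a Turing version whose signatures match the ones documented on this page. The model gauss_demo and its data are hypothetical and serve only as an illustration; the initial approximation comes from q_meanfield_gaussian, and the result is taken as the first element of the returned tuple, which is listed above as q.

```julia
using Turing

# Hypothetical toy model: Gaussian observations with unknown mean and scale.
@model function gauss_demo(y)
    mu ~ Normal(0, 1)
    sigma ~ Exponential(1)
    for i in eachindex(y)
        y[i] ~ Normal(mu, sigma)
    end
end

model = gauss_demo([0.9, 1.1, 1.3, 0.7])

# Initial mean-field Gaussian approximation matching the support of the model.
q0 = Turing.Variational.q_meanfield_gaussian(model)

# Run the default algorithm for 500 steps; the first returned element is q.
result = Turing.Variational.vi(model, q0, 500; show_progress=false)
q = first(result)

# Draw from the fitted approximation; each column is one joint draw of (mu, sigma).
draws = rand(q, 1_000)
```

Because q0 is a Bijectors.TransformedDistribution, the draws respect the support of the model, so the row corresponding to sigma should stay positive.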