KLMinScoreGradDescent
This is a convenience constructor for `ParamSpaceSGD` with the `ScoreGradELBO` objective. The resulting algorithm is similar to the one originally referred to as black-box variational inference (BBVI; [RGB2014], [WW2013]). (The term BBVI has also recently been used for the more general setup of maximizing the ELBO in parameter space; we use the narrower definition, which restricts it to the use of the score gradient.) However, instead of using the vanilla score gradient estimator, we differentiate the "VarGrad" objective [RBNRA2020], which yields the score gradient variance-reduced by the leave-one-out control variate [SK2014], [KvHW2019].
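To make the estimator concrete, here is a minimal standalone sketch of the VarGrad objective. This is not the AdvancedVI implementation: `logq`, `logprob`, `λ`, and `xs` are hypothetical placeholders for the variational log-density, the target log-density, the variational parameters, and samples drawn from $q_{\lambda}$.

```julia
using Statistics

# Sketch of the VarGrad loss. Differentiating it with respect to λ, while
# holding the samples `xs` fixed, reproduces the score gradient with the
# leave-one-out control variate (up to a constant factor).
function vargrad_loss(logq, logprob, λ, xs)
    # Log-weights log q_λ(x) - log π(x); the samples are treated as constants.
    f = [logq(λ, x) - logprob(x) for x in xs]
    # The empirical variance of the log-weights is the VarGrad objective.
    return var(f)
end
```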
AdvancedVI.KLMinScoreGradDescent — Function

`KLMinScoreGradDescent(adtype; optimizer, n_samples, averager, operator)`
KL divergence minimization by running stochastic gradient descent with the score gradient in the Euclidean space of variational parameters.
Arguments
- `adtype`: Automatic differentiation backend.
Keyword Arguments
- `optimizer::Optimisers.AbstractRule`: Optimization algorithm to be used. Only `DoG`, `DoWG`, and `Optimisers.Descent` are supported. (default: `DoWG()`)
- `n_samples::Int`: Number of Monte Carlo samples to be used for estimating each gradient.
- `averager::AbstractAverager`: Parameter averaging strategy. (default: `PolynomialAveraging()`)
- `operator::Union{<:IdentityOperator, <:ClipScale}`: Operator to be applied after each gradient descent step. (default: `IdentityOperator()`) A construction example using these keywords is sketched after this list.
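The following sketch constructs the algorithm with every argument spelled out. All names come from the docstring above except `AutoReverseDiff`, one of the AD backend types provided by the ADTypes.jl package; the backend and sample count are arbitrary illustrative choices, not recommendations.

```julia
using AdvancedVI, ADTypes

# Construct the BBVI-style algorithm. The keywords mirror the documented
# defaults, except `n_samples`, which is an arbitrary illustrative value.
alg = KLMinScoreGradDescent(
    AutoReverseDiff();                 # AD backend for differentiating the VarGrad objective
    optimizer = DoWG(),                # parameter-free step-size rule (the default)
    n_samples = 16,                    # Monte Carlo samples per gradient estimate
    averager  = PolynomialAveraging(), # average the iterates (the default)
    operator  = IdentityOperator(),    # apply no projection or clipping (the default)
)
```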
Requirements
- The trainable parameters in the variational approximation are expected to be extractable through `Optimisers.destructure`. This requires the variational approximation to be marked as a functor through `Functors.@functor`. (A minimal family satisfying these requirements is sketched after this list.)
- The variational approximation $q_{\lambda}$ implements `rand`.
- The variational approximation $q_{\lambda}$ implements `logpdf(q, x)`, which should also be differentiable with respect to the variational parameters $\lambda$.
- The target distribution and the variational approximation have the same support.
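For concreteness, here is a minimal sketch of a mean-field Gaussian family meeting these requirements. The type `MeanFieldGaussianSketch` and its fields are our own illustration, not part of AdvancedVI; it plugs into the standard Distributions.jl interface through `Distributions._rand!` and `Distributions._logpdf`.

```julia
using Distributions, Functors, Random

# Hypothetical variational family: a diagonal (mean-field) Gaussian with
# unconstrained parameters (the scale is stored on the log scale).
struct MeanFieldGaussianSketch{V<:AbstractVector{<:Real}} <: ContinuousMultivariateDistribution
    location::V   # mean vector
    log_scale::V  # log of the diagonal scale
end

# Mark as a functor so Optimisers.destructure can extract and restore
# the trainable parameters.
Functors.@functor MeanFieldGaussianSketch

Base.length(q::MeanFieldGaussianSketch) = length(q.location)

# `rand` support through the Distributions.jl sampling interface.
function Distributions._rand!(
    rng::Random.AbstractRNG, q::MeanFieldGaussianSketch, x::AbstractVector{<:Real}
)
    return x .= q.location .+ exp.(q.log_scale) .* randn(rng, length(q))
end

# `logpdf(q, x)`, differentiable with respect to the parameters by common AD backends.
function Distributions._logpdf(q::MeanFieldGaussianSketch, x::AbstractVector{<:Real})
    σ = exp.(q.log_scale)
    return sum(@. -(x - q.location)^2 / (2 * σ^2) - log(σ) - 0.5 * log(2π))
end
```

With these definitions, `q = MeanFieldGaussianSketch(zeros(2), zeros(2))` behaves as a standard bivariate Gaussian under both `rand` and `logpdf`.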
- RGB2014: Ranganath, R., Gerrish, S., & Blei, D. (2014). Black box variational inference. In Artificial Intelligence and Statistics (pp. 814-822). PMLR.
- WW2013: Wingate, D., & Weber, T. (2013). Automated variational inference in probabilistic programming. arXiv preprint arXiv:1301.1299.
- RBNRA2020: Richter, L., Boustati, A., Nüsken, N., Ruiz, F., & Akyildiz, O. D. (2020). VarGrad: A low-variance gradient estimator for variational inference. Advances in Neural Information Processing Systems, 33, 13481-13492.
- SK2014: Salimans, T., & Knowles, D. A. (2014). On using control variates with stochastic approximation for variational Bayes and its connection to stochastic linear regression. arXiv preprint arXiv:1401.1022.
- KvHW2019: Kool, W., van Hoof, H., & Welling, M. (2019). Buy 4 REINFORCE samples, get a baseline for free!