using Turing
@model function gdemo(x)
~ InverseGamma(2, 3)
s² ~ Normal(0, sqrt(s²))
m
for i in eachindex(x)
~ Normal(m, sqrt(s²))
x[i] end
end
gdemo (generic function with 2 methods)
After defining a statistical model, in addition to sampling from its distributions, one may be interested in finding the parameter values that maximise for instance the posterior distribution density function or the likelihood. This is called mode estimation. Turing provides support for two mode estimation techniques, maximum likelihood estimation (MLE) and maximum a posterior (MAP) estimation.
To demonstrate mode estimation, let us load Turing and declare a model:
using Turing
@model function gdemo(x)
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
for i in eachindex(x)
x[i] ~ Normal(m, sqrt(s²))
end
end
gdemo (generic function with 2 methods)
Once the model is defined, we can construct a model instance as we normally would:
DynamicPPL.Model{typeof(gdemo), (:x,), (), (), Tuple{Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext}(gdemo, (x = [1.5, 2.0],), NamedTuple(), DynamicPPL.DefaultContext())
Finding the maximum aposteriori or maximum likelihood parameters is as simple as
# Generate a MLE estimate.
mle_estimate = maximum_likelihood(model)
# Generate a MAP estimate.
map_estimate = maximum_a_posteriori(model)
ModeResult with maximized lp of -4.62
[0.9074074074415232, 1.1666666645716819]
The estimates are returned as instances of the ModeResult
type. It has the fields values
for the parameter values found and lp
for the log probability at the optimum, as well as f
for the objective function and optim_result
for more detailed results of the optimisation procedure.
mle_estimate.values = [0.06249999999999998, 1.75]
mle_estimate.lp = -0.06528834416956464
Under the hood maximum_likelihood
and maximum_a_posteriori
use the Optimization.jl package, which provides a unified interface to many other optimisation packages. By default Turing typically uses the LBFGS method from Optim.jl to find the mode estimate, but we can easily change that:
using OptimizationOptimJL: NelderMead
@show maximum_likelihood(model, NelderMead())
using OptimizationNLopt: NLopt.LD_TNEWTON_PRECOND_RESTART
@show maximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART());
maximum_likelihood(model, NelderMead()) = [0.06249864122586508, 1.7500049963618138]
┌ Warning: The selected optimization algorithm requires second order derivatives, but `SecondOrder` ADtype was not provided.
│ So a `SecondOrder` with ADTypes.AutoForwardDiff() for both inner and outer will be created, this can be suboptimal and not work in some cases so
│ an explicit `SecondOrder` ADtype is recommended.
└ @ OptimizationBase ~/.julia/packages/OptimizationBase/gvXsf/src/cache.jl:49
maximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART()) = [0.06250000000000061, 1.7499999999999991]
The above are just two examples, Optimization.jl supports many more.
We can also help the optimisation by giving it a starting point we know is close to the final solution, or by specifying an automatic differentiation method
using ADTypes: AutoReverseDiff
import ReverseDiff
maximum_likelihood(
model, NelderMead(); initial_params=[0.1, 2], adtype=AutoReverseDiff()
)
ModeResult with maximized lp of -0.07
[0.062494553692639856, 1.7500042095865365]
When providing values to arguments like initial_params
the parameters are typically specified in the order in which they appear in the code of the model, so in this case first s²
then m
. More precisely it’s the order returned by Turing.Inference.getparams(model, Turing.VarInfo(model))
.
We can also do constrained optimisation, by providing either intervals within which the parameters must stay, or costraint functions that they need to respect. For instance, here’s how one can find the MLE with the constraint that the variance must be less than 0.01 and the mean must be between -1 and 1.:
ModeResult with maximized lp of -59.73
[0.00999999998953321, 0.9999999995708616]
The arguments for lower (lb
) and upper (ub
) bounds follow the arguments of Optimization.OptimizationProblem
, as do other parameters for providing constraints, such as cons
. Any extraneous keyword arguments given to maximum_likelihood
or maximum_a_posteriori
are passed to Optimization.solve
. Some often useful ones are maxiters
for controlling the maximum number of iterations and abstol
and reltol
for the absolute and relative convergence tolerances:
ModeResult with maximized lp of -0.72
[0.25419003135943385, 1.7198649633660663]
We can check whether the optimisation converged using the optim_result
field of the result:
badly_converged_mle.optim_result = retcode: Failure
u: [-1.369673136733609, 1.7198649633660663]
Final objective value: 0.7176555706146655
For more details, such as a full list of possible arguments, we encourage the reader to read the docstring of the function Turing.Optimisation.estimate_mode
, which is what maximum_likelihood
and maximum_a_posteriori
call, and the documentation of Optimization.jl.
Turing extends several methods from StatsBase
that can be used to analyze your mode estimation results. Methods implemented include vcov
, informationmatrix
, coeftable
, params
, and coef
, among others.
For example, let’s examine our ML estimate from above using coeftable
:
Coef. | Std. Error | z | Pr(> | z | ) | |
---|---|---|---|---|---|---|
s² | 0.0625 | 0.0625 | 1.0 | 0.317311 | -0.0599977 | 0.184998 |
m | 1.75 | 0.176777 | 9.89949 | 4.18383e-23 | 1.40352 | 2.09648 |
Standard errors are calculated from the Fisher information matrix (inverse Hessian of the log likelihood or log joint). Note that standard errors calculated in this way may not always be appropriate for MAP estimates, so please be cautious in interpreting them.
You can begin sampling your chain from an MLE/MAP estimate by extracting the vector of parameter values and providing it to the sample
function with the keyword initial_params
. For example, here is how to sample from the full posterior using the MAP estimate as the starting point: