using Turing
@model function normal_model(y)
x ~ Normal()
y ~ Normal(x)
return nothing
endnormal_model (generic function with 2 methods)
After defining a statistical model, in addition to sampling from its distributions, one may be interested in finding the parameter values that maximise for instance the posterior distribution density function or the likelihood. This is called mode estimation. Turing provides support for two mode estimation techniques, maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation.
To demonstrate mode estimation, let us load Turing and declare a model:
normal_model (generic function with 2 methods)
Once the model is defined, we can construct a model instance as we normally would:
DynamicPPL.Model{typeof(normal_model), (:y,), (), (), Tuple{Float64}, Tuple{}, DynamicPPL.DefaultContext, false}(normal_model, (y = 2.0,), NamedTuple(), DynamicPPL.DefaultContext())
Finding the maximum a posteriori or maximum likelihood parameters is as simple as
ModeResult with maximized lp of -2.84
[1.0]
The estimates are returned as instances of the ModeResult type. It has the fields params (which stores a mapping of VarNames to the parameter values found) and lp for the log probability at the optimum. For mode information, please see the docstring of ModeResult.
mle_estimate.params = OrderedCollections.OrderedDict{AbstractPPL.VarName, Any}(x => 2.0)
mle_estimate.lp = -0.9189385332046728
Under the hood maximum_likelihood and maximum_a_posteriori use the Optimisation.jl package, which provides a unified interface to many other optimisation packages. By default Turing uses the LBFGS method from Optim.jl to find the mode estimate, but we can easily change that:
maximum_likelihood(model, NelderMead()) = [2.000053753605844] ┌ Warning: The selected optimization algorithm requires second order derivatives, but `SecondOrder` ADtype was not provided. │ So a `SecondOrder` with ADTypes.AutoForwardDiff() for both inner and outer will be created, this can be suboptimal and not work in some cases so │ an explicit `SecondOrder` ADtype is recommended. └ @ OptimizationBase ~/.julia/packages/OptimizationBase/mYxHK/src/cache.jl:58 maximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART()) = [2.0]
The above are just two examples, Optimisation.jl supports many more.
We can also help the optimisation by giving it a starting point we know is close to the final solution (initial_params), or by specifying an automatic differentiation method (adtype).
ModeResult with maximized lp of -0.92
[2.0]
When providing values to arguments like initial_params the parameters are typically specified in the order in which they appear in the code of the model, so in this case first s² then m. More precisely it’s the order returned by Turing.Inference.getparams(model, DynamicPPL.VarInfo(model)).
In Turing v0.43, the initial_params argument for optimisation will be changed to be a DynamicPPL.AbstractInitStrategy as described in the sampling options page).
Furthermore, constraints must be specified as (ideally) a VarNamedTuple. All constraints specified will be interpreted as being in untransformed space.
We can also do constrained optimisation, by providing either intervals within which the parameters must stay, or constraint functions that they need to respect. For instance, here’s how one can find the MLE with the constraint that x must be between 0.0 and 0.2:
┌ Warning: Terminated early due to NaN in gradient. └ @ Optim ~/.julia/packages/Optim/mv9zc/src/multivariate/optimize/optimize.jl:136
ModeResult with maximized lp of -2.54
[0.2]
The arguments for lower (lb) and upper (ub) bounds follow the arguments of Optimisation.OptimizationProblem, as do other parameters for providing constraints, such as cons. Any extraneous keyword arguments given to maximum_likelihood or maximum_a_posteriori are passed to Optimisation.solve. Some often useful ones are maxiters for controlling the maximum number of iterations and abstol and reltol for the absolute and relative convergence tolerances:
ModeResult with maximized lp of -0.92
[1.9996586459959946]
We can check whether the optimisation converged using the optim_result field of the result:
badly_converged_mle.optim_result = retcode: Failure
u: [1.9996586459959946]
Final objective value: 0.9189385914659508
For more details, such as a full list of possible arguments, we encourage the reader to read the docstring of the function Turing.Optimisation.estimate_mode, which is what maximum_likelihood and maximum_a_posteriori call, and the documentation of Optimisation.jl.
Turing extends several methods from StatsBase that can be used to analyse your mode estimation results. Methods implemented include vcov, informationmatrix, coeftable, params, and coef, among others.
For example, let’s examine our ML estimate from above using coeftable:
| Coef. | Std. Error | z | Pr(> | z | ) | |
|---|---|---|---|---|---|---|
| x | 2.0 | 1.0 | 2.0 | 0.0455003 | 0.040036 | 3.95996 |
Standard errors are calculated from the Fisher information matrix (inverse Hessian of the log likelihood or log joint). Note that standard errors calculated in this way may not always be appropriate for MAP estimates, so please be cautious in interpreting them.
You can begin sampling your chain from an MLE/MAP estimate by wrapping it in InitFromParams and providing it to the sample function with the keyword initial_params. For example, here is how to sample from the full posterior using the MAP estimate as the starting point: