using Turing
@model function normal_model(y)
x ~ Normal()
y ~ Normal(x)
return nothing
end

normal_model (generic function with 2 methods)
After defining a statistical model, in addition to sampling from its distributions, one may be interested in finding the parameter values that maximise (for instance) the posterior density, or the likelihood. This is called mode estimation.
Turing provides support for two mode estimation techniques, maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation.
We will work with the simple model defined above.
Once the model is defined, we can construct a model instance as we normally would:
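For example, conditioning on the observation y = 2.0 (the value shown in the output below; the binding name `model` is our choice):

```julia
using Turing

@model function normal_model(y)
    x ~ Normal()
    y ~ Normal(x)
    return nothing
end

# Instantiate the model with an observed value for y
model = normal_model(2.0)
```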
DynamicPPL.Model{typeof(normal_model), (:y,), (), (), Tuple{Float64}, Tuple{}, DynamicPPL.DefaultContext, false}(normal_model, (y = 2.0,), NamedTuple(), DynamicPPL.DefaultContext())
In its simplest form, finding the maximum a posteriori or maximum likelihood parameters is just a function call:
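A minimal sketch, assuming the model instance above is bound to `model`:

```julia
# Maximum likelihood and maximum a posteriori estimates, respectively
mle_estimate = maximum_likelihood(model)
map_estimate = maximum_a_posteriori(model)
```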
ModeResult
├ estimator : Turing.Optimisation.MLE
├ lp : -0.9189385332046728
├ params : VarNamedTuple with 1 entry
│ └ x => 2.0
│ linked : true
└ (2 more fields: optim_result, ldf)
ModeResult
├ estimator : Turing.Optimisation.MAP
├ lp : -2.8378770664093453
├ params : VarNamedTuple with 1 entry
│ └ x => 0.9999999999999999
│ linked : true
└ (2 more fields: optim_result, ldf)
The estimates are returned as instances of the ModeResult type. It has the fields params (a VarNamedTuple mapping VarNames to the parameter values found) and lp for the log probability at the optimum. For more information, please see the docstring of ModeResult.
You can access individual parameter values by indexing into the params field with VarNames:
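For example, assuming the MLE result above is bound to `mle_estimate`:

```julia
# Index the params field with a VarName to retrieve a single estimate
mle_estimate.params[@varname(x)]
```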
If you need a vectorised form of the parameters, you can use vector_names_and_params, which returns a tuple of two vectors: one of VarNames and one of the corresponding parameter values. (Note that these values are always returned in untransformed space.)
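A sketch, assuming vector_names_and_params is called on the ModeResult from above:

```julia
# Two parallel vectors: VarNames and their (untransformed) values
varnames, values = vector_names_and_params(mle_estimate)
```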
The optim_result field (which is not printed by default) contains the original result from the underlying optimisation solver, which is useful for diagnosing convergence issues and accessing solver-specific information:
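For instance:

```julia
# Inspect the raw Optimization.jl solution object for solver diagnostics
mle_estimate.optim_result
```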
Under the hood, maximum_likelihood and maximum_a_posteriori use the Optimization.jl package, which provides a unified interface to many other optimisation packages. By default Turing uses the LBFGS method from Optim.jl to find the mode estimate, but we can change that to any other solver by passing it as the second argument:
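For example, to use a gradient-free solver instead (NelderMead is an illustrative choice here, and requires the OptimizationOptimJL package):

```julia
using OptimizationOptimJL

# Pass the solver as the second positional argument
maximum_likelihood(model, NelderMead())
```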
ModeResult
├ estimator : Turing.Optimisation.MLE
├ lp : -0.9189385333682871
├ params : VarNamedTuple with 1 entry
│ └ x => 1.9999819105337715
│ linked : true
└ (2 more fields: optim_result, ldf)
Optimization.jl supports many more solvers; please see its documentation for details.
We can help the optimisation by giving it a starting point we know is close to the final solution. Initial parameters are specified using InitFromParams, and must be provided in model space (i.e. untransformed):
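A hedged sketch (the initial_params keyword name and the NamedTuple argument to InitFromParams are assumptions; the starting value 1.9 is illustrative):

```julia
# Start the optimisation near the known solution x = 2.0
maximum_likelihood(model; initial_params=InitFromParams((; x = 1.9)))
```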
ModeResult
├ estimator : Turing.Optimisation.MLE
├ lp : -0.9189385332046728
├ params : VarNamedTuple with 1 entry
│ └ x => 2.0
│ linked : true
└ (2 more fields: optim_result, ldf)
The default initialisation strategy is InitFromPrior(), which draws initial values from the prior.
You can also specify an automatic differentiation method using the adtype keyword argument:
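For example, to request the (default) ForwardDiff backend explicitly:

```julia
using ADTypes: AutoForwardDiff

maximum_likelihood(model; adtype=AutoForwardDiff())
```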
By default, Turing transforms model parameters to an unconstrained space before optimising (link=true). There are two reasons why one might want to do this:

1. The log density is -Inf outside the support of a distribution, which optimisers generally cannot handle.
2. Some parameters are mutually constrained: for example, the parameters of a Dirichlet distribution must sum to 1. That means that if we do not perform linking, these parameters cannot be varied completely independently, which can lead to numerical issues. In contrast, when linking is performed, the parameters are transformed into a (shorter) vector of parameters that are completely unconstrained and independent.

Note that the parameter values returned are always in the original (untransformed) space, regardless of the link setting.
Note that the transformation to unconstrained space refers to the support of the original distribution prior to any optimisation constraints being applied. For example, a parameter x ~ Beta(2, 2) will be transformed from the original space of (0, 1) to the unconstrained space of (-Inf, Inf) (via the logit transform). However, it is possible that the optimisation still proceeds in a constrained space, if constraints on the parameter are specified via lb or ub. For example, if we specify lb=0.0 and ub=0.2 for the same parameter, then the optimisation will proceed in the constrained space of (-Inf, logit(0.2)).
If you want to optimise in the original parameter space instead, set link=false.
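For example:

```julia
# Optimise directly in the original (constrained) parameter space
maximum_a_posteriori(model; link=false)
```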
ModeResult
├ estimator : Turing.Optimisation.MAP
├ lp : -2.8378770664093453
├ params : VarNamedTuple with 1 entry
│ └ x => 1.0
│ linked : false
└ (2 more fields: optim_result, ldf)
This is usually only useful under very specific circumstances, namely when your model contains distributions for which the mapping from model space to unconstrained space is dependent on another parameter’s value.
You can provide lower and upper bounds on parameters using the lb and ub keywords respectively. Bounds are specified as a VarNamedTuple and, just like initial values, must be provided in model space (i.e. untransformed):
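A sketch using the bounds from the earlier example (the NamedTuple-style construction of the bounds is an assumption):

```julia
# Constrain x to lie in [0.0, 0.2] during optimisation
maximum_likelihood(model; lb=(; x = 0.0), ub=(; x = 0.2))
```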
ModeResult
├ estimator : Turing.Optimisation.MLE
├ lp : -2.53893853329569
├ params : VarNamedTuple with 1 entry
│ └ x => 0.19999999994943482
│ linked : true
└ (2 more fields: optim_result, ldf)
Turing will internally translate these bounds to unconstrained space if link=true; as a user you should not need to worry at all about the details of this transformation.
In this case we only have one parameter, but if there are multiple parameters and you only want to constrain some of them, you can provide bounds for the parameters you want to constrain and omit the others.
Note that for some distributions (e.g. Dirichlet, LKJCholesky), the mapping from model-space bounds to linked-space bounds is not well-defined. In these cases, Turing will raise an error. If you need constrained optimisation for such variables, either set link=false or use LogDensityFunction with Optimization.jl directly.
Generic (non-box) constraints are not supported by Turing’s optimisation interface. For these, please use LogDensityFunction and Optimization.jl directly.
Any extra keyword arguments are passed through to Optimization.solve. Some commonly useful ones are maxiters, abstol, and reltol:
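For example (the particular values are illustrative):

```julia
# Loose tolerances and an iteration cap, forwarded to Optimization.solve
maximum_likelihood(model; maxiters=10, abstol=1e-3, reltol=1e-3)
```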
ModeResult
├ estimator : Turing.Optimisation.MLE
├ lp : -0.9189385432734339
├ params : VarNamedTuple with 1 entry
│ └ x => 1.9998580932617178
│ linked : true
└ (2 more fields: optim_result, ldf)
To get reproducible results, pass an rng as the first argument:
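For example (the seed is arbitrary):

```julia
using Random

# A seeded RNG makes the initialisation, and hence the result, reproducible
maximum_a_posteriori(Random.Xoshiro(468), model)
```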
ModeResult
├ estimator : Turing.Optimisation.MAP
├ lp : -2.8378770664093453
├ params : VarNamedTuple with 1 entry
│ └ x => 1.0
│ linked : true
└ (2 more fields: optim_result, ldf)
This controls the random number generator used for parameter initialisation; the actual optimisation process is deterministic.
For more details and a full list of keyword arguments, see the docstring of Turing.Optimisation.estimate_mode.
Turing extends several methods from StatsBase that can be used to analyse your mode estimation results. Methods implemented include vcov, informationmatrix, coeftable, coef, and coefnames.
For example, let’s examine our MLE estimate from above using coeftable:
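Assuming the MLE result from above is bound to `mle_estimate`:

```julia
using StatsBase

coeftable(mle_estimate)
```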
|   | Coef. | Std. Error | z | Pr(>\|z\|) | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| x | 2.0 | 1.0 | 2.0 | 0.0455003 | 0.040036 | 3.95996 |
Standard errors are calculated from the Fisher information matrix (inverse Hessian of the log likelihood or log joint). Note that standard errors calculated in this way may not always be appropriate for MAP estimates, so please be cautious in interpreting them.
The Hessian is computed using automatic differentiation. By default, ForwardDiff is used, but if you are feeling brave you can specify a different backend via the adtype keyword argument to informationmatrix. (Note that AD backend support for second-order derivatives is more limited than for first-order derivatives, so not all backends will work here.)
You can begin sampling your chain from an MLE/MAP estimate by wrapping it in InitFromParams and providing it to the sample function with the keyword initial_params. For example, here is how to sample from the full posterior using the MAP estimate as the starting point:
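A sketch, assuming the MAP result from above is bound to `map_estimate` (the NUTS sampler and the 1000 samples here match the chain summary below):

```julia
# Initialise the chain at the MAP estimate
chain = sample(model, NUTS(), 1000; initial_params=InitFromParams(map_estimate.params))
```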
Chains MCMC chain (1000×15×1 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 1
Samples per chain = 1000
Wall duration = 5.03 seconds
Compute duration = 5.03 seconds
parameters = x
internals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint
Use `describe(chains)` for summary statistics and quantiles.