MCMCDiagnosticTools

MCMCDiagnosticTools provides functionality for diagnosing samples generated using Markov Chain Monte Carlo.

Background

Some methods were originally part of Mamba.jl and then MCMCChains.jl. This package is a joint collaboration between the Turing and ArviZ projects.

Effective sample size and $\widehat{R}$

MCMCDiagnosticTools.essFunction
ess(
    samples::AbstractArray{<:Union{Missing,Real}};
    kind=:bulk,
    relative::Bool=false,
    autocov_method=AutocovMethod(),
    split_chains::Int=2,
    maxlag::Int=250,
    kwargs...
)

Estimate the effective sample size (ESS) of the samples of shape (draws, [chains[, parameters...]]) with the autocov_method.

Optionally, the kind of ESS estimate to be computed can be specified (see below). Some kinds accept additional kwargs.

If relative is true, the relative ESS is returned, i.e. ess / (draws * chains).

split_chains indicates the number of chains each chain is split into. When split_chains > 1, then the diagnostics check for within-chain convergence. When d = mod(draws, split_chains) > 0, i.e. the chains cannot be evenly split, then 1 draw is discarded after each of the first d splits within each chain. There must be at least 3 draws in each chain after splitting.

maxlag indicates the maximum lag for which autocovariance is computed and must be greater than 0.

For a given estimand, it is recommended that the ESS is at least 100 * chains and that $\widehat{R} < 1.01$.[VehtariGelman2021]

See also: AutocovMethod, FFTAutocovMethod, BDAAutocovMethod, rhat, ess_rhat, mcse

Kinds of ESS estimates

If kind isa a Symbol, it may take one of the following values:

  • :bulk: basic ESS computed on rank-normalized draws. This kind diagnoses poor convergence in the bulk of the distribution due to trends or different locations of the chains.
  • :tail: minimum of the quantile-ESS for the symmetric quantiles where tail_prob=0.1 is the probability in the tails. This kind diagnoses poor convergence in the tails of the distribution. If this kind is chosen, kwargs may contain a tail_prob keyword.
  • :basic: basic ESS, equivalent to specifying kind=Statistics.mean.
Note

While Bulk-ESS is conceptually related to basic ESS, it is well-defined even if the chains do not have finite variance.[VehtariGelman2021] For each parameter, rank-normalization proceeds by first ranking the inputs using "tied ranking" and then transforming the ranks to normal quantiles so that the result is standard normally distributed. This transform is monotonic.

Otherwise, kind specifies one of the following estimators, whose ESS is to be estimated:

  • Statistics.mean
  • Statistics.median
  • Statistics.std
  • StatsBase.mad
  • Base.Fix2(Statistics.quantile, p::Real)
source
MCMCDiagnosticTools.rhatFunction
rhat(samples::AbstractArray{Union{Real,Missing}}; kind::Symbol=:rank, split_chains=2)

Compute the $\widehat{R}$ diagnostics for each parameter in samples of shape (draws, [chains[, parameters...]]).[VehtariGelman2021]

kind indicates the kind of $\widehat{R}$ to compute (see below).

split_chains indicates the number of chains each chain is split into. When split_chains > 1, then the diagnostics check for within-chain convergence. When d = mod(draws, split_chains) > 0, i.e. the chains cannot be evenly split, then 1 draw is discarded after each of the first d splits within each chain.

See also ess, ess_rhat, rstar

Kinds of $\widehat{R}$

The following kinds are supported:

  • :rank: maximum of $\widehat{R}$ with kind=:bulk and kind=:tail.
  • :bulk: basic $\widehat{R}$ computed on rank-normalized draws. This kind diagnoses poor convergence in the bulk of the distribution due to trends or different locations of the chains.
  • :tail: $\widehat{R}$ computed on draws folded around the median and then rank-normalized. This kind diagnoses poor convergence in the tails of the distribution due to different scales of the chains.
  • :basic: Classic $\widehat{R}$.
source
MCMCDiagnosticTools.ess_rhatFunction
ess_rhat(
    samples::AbstractArray{<:Union{Missing,Real}};
    kind::Symbol=:rank,
    kwargs...,
) -> NamedTuple{(:ess, :rhat)}

Estimate the effective sample size and $\widehat{R}$ of the samples of shape (draws, [chains[, parameters...]]).

When both ESS and $\widehat{R}$ are needed, this method is often more efficient than calling ess and rhat separately.

See rhat for a description of supported kinds and ess for a description of kwargs.

source

The following autocov_methods are supported:

MCMCDiagnosticTools.FFTAutocovMethodType
FFTAutocovMethod <: AbstractAutocovMethod

The FFTAutocovMethod uses a standard algorithm for estimating the mean autocovariance of MCMC chains.

The algorithm is the same as the one of AutocovMethod but this method uses fast Fourier transforms (FFTs) for estimating the autocorrelation.

Info

To be able to use this method, you have to load a package that implements the AbstractFFTs.jl interface such as FFTW.jl or FastTransforms.jl.

source
MCMCDiagnosticTools.BDAAutocovMethodType
BDAAutocovMethod <: AbstractAutocovMethod

The BDAAutocovMethod uses a standard algorithm for estimating the mean autocovariance of MCMC chains.

It is is based on the discussion by [VehtariGelman2021]. and uses the variogram estimator of the autocorrelation function discussed by [BDA3].

source

Monte Carlo standard error

MCMCDiagnosticTools.mcseFunction
mcse(samples::AbstractArray{<:Union{Missing,Real}}; kind=Statistics.mean, kwargs...)

Estimate the Monte Carlo standard errors (MCSE) of the estimator kind applied to samples of shape (draws, [chains[, parameters...]]).

See also: ess

Kinds of MCSE estimates

The estimator whose MCSE should be estimated is specified with kind. kind must accept a vector of the same eltype as samples and return a real estimate.

For the following estimators, the effective sample size ess and an estimate of the asymptotic variance are used to compute the MCSE, and kwargs are forwarded to ess:

  • Statistics.mean
  • Statistics.median
  • Statistics.std
  • Base.Fix2(Statistics.quantile, p::Real)

For other estimators, the subsampling bootstrap method (SBM)[FlegalJones2011][Flegal2012] is used as a fallback, and the only accepted kwargs are batch_size, which indicates the size of the overlapping batches used to estimate the MCSE, defaulting to floor(Int, sqrt(draws * chains)). Note that SBM tends to underestimate the MCSE, especially for highly autocorrelated chains. One should verify that autocorrelation is low by checking the bulk- and tail-ESS values.

source

R⋆ diagnostic

MCMCDiagnosticTools.rstarFunction
rstar(
    rng::Random.AbstractRNG=Random.default_rng(),
    classifier,
    samples,
    chain_indices::AbstractVector{Int};
    subset::Real=0.7,
    split_chains::Int=2,
    verbosity::Int=0,
)

Compute the $R^*$ convergence statistic of the table samples with the classifier.

samples must be either an AbstractMatrix, an AbstractVector, or a table (i.e. implements the Tables.jl interface) whose rows are draws and whose columns are parameters.

chain_indices indicates the chain ids of each row of samples.

This method supports ragged chains, i.e. chains of nonequal lengths.

source
rstar(
    rng::Random.AbstractRNG=Random.default_rng(),
    classifier,
    samples::AbstractArray{<:Real};
    subset::Real=0.7,
    split_chains::Int=2,
    verbosity::Int=0,
)

Compute the $R^*$ convergence statistic of the samples with the classifier.

samples is an array of draws with the shape (draws, [chains[, parameters...]]).`

This implementation is an adaption of algorithms 1 and 2 described by Lambert and Vehtari.

The classifier has to be a supervised classifier of the MLJ framework (see the MLJ documentation for a list of supported models). It is trained with a subset of the samples from each chain. Each chain is split into split_chains separate chains to additionally check for within-chain convergence. The training of the classifier can be inspected by adjusting the verbosity level.

If the classifier is deterministic, i.e., if it predicts a class, the value of the $R^*$ statistic is returned (algorithm 1). If the classifier is probabilistic, i.e., if it outputs probabilities of classes, the scaled Poisson-binomial distribution of the $R^*$ statistic is returned (algorithm 2).

Note

The correctness of the statistic depends on the convergence of the classifier used internally in the statistic.

Examples

julia> using MLJBase, MLJIteration, EvoTrees, Statistics, StatisticalMeasures

julia> samples = fill(4.0, 100, 3, 2);

One can compute the distribution of the $R^*$ statistic (algorithm 2) with a probabilistic classifier. For instance, we can use a gradient-boosted trees model with nrounds = 100 sequentially stacked trees and learning rate eta = 0.05:

julia> model = EvoTreeClassifier(; nrounds=100, eta=0.05);

julia> distribution = rstar(model, samples);

julia> round(mean(distribution); digits=2)
1.0f0

Note, however, that it is recommended to determine nrounds based on early-stopping. With the MLJ framework, this can be achieved in the following way (see the MLJ documentation for additional explanations):

julia> model = IteratedModel(;
           model=EvoTreeClassifier(; eta=0.05),
           iteration_parameter=:nrounds,
           resampling=Holdout(),
           measures=log_loss,
           controls=[Step(5), Patience(2), NumberLimit(100)],
           retrain=true,
       );

julia> distribution = rstar(model, samples);

julia> round(mean(distribution); digits=2)
1.0f0

For deterministic classifiers, a single $R^*$ statistic (algorithm 1) is returned. Deterministic classifiers can also be derived from probabilistic classifiers by e.g. predicting the mode. In MLJ this corresponds to a pipeline of models.

julia> evotree_deterministic = Pipeline(model; operation=predict_mode);

julia> value = rstar(evotree_deterministic, samples);

julia> round(value; digits=2)
1.0

References

Lambert, B., & Vehtari, A. (2020). $R^*$: A robust MCMC convergence diagnostic with uncertainty using decision tree classifiers.

source

Bayesian fraction of missing information

MCMCDiagnosticTools.bfmiFunction
bfmi(energy::AbstractVector{<:Real}) -> Real
bfmi(energy::AbstractMatrix{<:Real}; dims::Int=1) -> AbstractVector{<:Real}

Calculate the estimated Bayesian fraction of missing information (BFMI).

When sampling with Hamiltonian Monte Carlo (HMC), BFMI quantifies how well momentum resampling matches the marginal energy distribution.

The current advice is that values smaller than 0.3 indicate poor sampling. However, this threshold is provisional and may change. A BFMI value below the threshold often indicates poor adaptation of sampling parameters or that the target distribution has heavy tails that were not well explored by the Markov chain.

For more information, see Section 6.1 of [Betancourt2018] or [Betancourt2016] for a complete account.

energy is either a vector of Hamiltonian energies of draws or a matrix of energies of draws for multiple chains. dims indicates the dimension in energy that contains the draws. The default dims=1 assumes energy has the shape draws or (draws, chains). If a different shape is provided, dims must be set accordingly.

If energy is a vector, a single BFMI value is returned. Otherwise, a vector of BFMI values for each chain is returned.

source

Other diagnostics

Note

These diagnostics are older and less widely used.

MCMCDiagnosticTools.discretediagFunction
discretediag(samples::AbstractArray{<:Real,3}; frac=0.3, method=:weiss, nsim=1_000)

Compute discrete diagnostic on samples with shape (draws, chains, parameters).

method can be one of :weiss, :hangartner, :DARBOOT, :MCBOOT, :billinsgley, and :billingsleyBOOT.

References

Benjamin E. Deonovic, & Brian J. Smith. (2017). Convergence diagnostics for MCMC draws of a categorical variable.

source
MCMCDiagnosticTools.gelmandiagFunction
gelmandiag(samples::AbstractArray{<:Real,3}; alpha::Real=0.95)

Compute the Gelman, Rubin and Brooks diagnostics [Gelman1992] [Brooks1998] on samples with shape (draws, chains, parameters). Values of the diagnostic’s potential scale reduction factor (PSRF) that are close to one suggest convergence. As a rule-of-thumb, convergence is rejected if the 97.5 percentile of a PSRF is greater than 1.2.

source
MCMCDiagnosticTools.gelmandiag_multivariateFunction
gelmandiag_multivariate(samples::AbstractArray{<:Real,3}; alpha::Real=0.05)

Compute the multivariate Gelman, Rubin and Brooks diagnostics on samples with shape (draws, chains, parameters).

source
MCMCDiagnosticTools.gewekediagFunction
gewekediag(x::AbstractVector{<:Real}; first::Real=0.1, last::Real=0.5, kwargs...)

Compute the Geweke diagnostic [Geweke1991] from the first and last proportion of samples x.

The diagnostic is designed to asses convergence of posterior means estimated with autocorrelated samples. It computes a normal-based test statistic comparing the sample means in two windows containing proportions of the first and last iterations. Users should ensure that there is sufficient separation between the two windows to assume that their samples are independent. A non-significant test p-value indicates convergence. Significant p-values indicate non-convergence and the possible need to discard initial samples as a burn-in sequence or to simulate additional samples.

kwargs are forwarded to mcse.

source
MCMCDiagnosticTools.heideldiagFunction
heideldiag(
    x::AbstractVector{<:Real}; alpha::Real=0.05, eps::Real=0.1, start::Int=1, kwargs...
)

Compute the Heidelberger and Welch diagnostic [Heidelberger1983]. This diagnostic tests for non-convergence (non-stationarity) and whether ratios of estimation interval halfwidths to means are within a target ratio. Stationarity is rejected (0) for significant test p-values. Halfwidth tests are rejected (0) if observed ratios are greater than the target, as is the case for s2 and beta[1].

kwargs are forwarded to mcse.

source
MCMCDiagnosticTools.rafterydiagFunction
rafterydiag(
    x::AbstractVector{<:Real}; q=0.025, r=0.005, s=0.95, eps=0.001, range=1:length(x)
)

Compute the Raftery and Lewis diagnostic [Raftery1992]. This diagnostic is used to determine the number of iterations required to estimate a specified quantile q within a desired degree of accuracy. The diagnostic is designed to determine the number of autocorrelated samples required to estimate a specified quantile $\theta_q$, such that $\Pr(\theta \le \theta_q) = q$, within a desired degree of accuracy. In particular, if $\hat{\theta}_q$ is the estimand and $\Pr(\theta \le \hat{\theta}_q) = \hat{P}_q$ the estimated cumulative probability, then accuracy is specified in terms of r and s, where $\Pr(q - r < \hat{P}_q < q + r) = s$. Thinning may be employed in the calculation of the diagnostic to satisfy its underlying assumptions. However, users may not want to apply the same (or any) thinning when estimating posterior summary statistics because doing so results in a loss of information. Accordingly, sample sizes estimated by the diagnostic tend to be conservative (too large).

Furthermore, the argument r specifies the margin of error for estimated cumulative probabilities and s the probability for the margin of error. eps specifies the tolerance within which the probabilities of transitioning from initial to retained iterations are within the equilibrium probabilities for the chain. This argument determines the number of samples to discard as a burn-in sequence and is typically left at its default value.

source
  • VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
  • VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
  • VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
  • Geyer1992Geyer, C. J. (1992). Practical Markov Chain Monte Carlo. Statistical Science, 473-483.
  • VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
  • BDA3Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC press.
  • FlegalJones2011Flegal JM, Jones GL. (2011) Implementing MCMC: estimating with confidence. Handbook of Markov Chain Monte Carlo. pp. 175-97. pdf
  • Flegal2012Flegal JM. (2012) Applicability of subsampling bootstrap methods in Markov chain Monte Carlo. Monte Carlo and Quasi-Monte Carlo Methods 2010. pp. 363-72. doi: 10.1007/978-3-642-27440-4_18
  • Betancourt2018Betancourt M. (2018). A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv:1701.02434v2 [stat.ME]
  • Betancourt2016Betancourt M. (2016). Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo. arXiv:1604.00695v1 [stat.ME]
  • Gelman1992Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4), 457-472.
  • Brooks1998Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of computational and graphical statistics, 7(4), 434-455.
  • Geweke1991Geweke, J. F. (1991). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments (No. 148). Federal Reserve Bank of Minneapolis.
  • Heidelberger1983Heidelberger, P., & Welch, P. D. (1983). Simulation run length control in the presence of an initial transient. Operations Research, 31(6), 1109-1144.
  • Raftery1992A L Raftery and S Lewis. Bayesian Statistics, chapter How Many Iterations in the Gibbs Sampler? Volume 4. Oxford University Press, New York, 1992.