TuringGLM
Documentation for TuringGLM.
TuringGLM.CustomPrior
TuringGLM.NegativeBinomial2
TuringGLM.center_predictors
TuringGLM.convert_str_to_indices
TuringGLM.data_fixed_effects
TuringGLM.data_random_effects
TuringGLM.data_response
TuringGLM.get_idx
TuringGLM.get_var
TuringGLM.has_ranef
TuringGLM.intercept_per_ranef
TuringGLM.n_ranef
TuringGLM.ranef
TuringGLM.slope_per_ranef
TuringGLM.standardize_predictors
TuringGLM.standardize_predictors
TuringGLM.tuple_length
TuringGLM.turing_model
TuringGLM.CustomPrior — Type

    CustomPrior(predictors, intercept, auxiliary)

Struct to hold information regarding user-specified custom priors.

Usage

The `CustomPrior` struct has 3 fields:

- `predictors`: the prior on the β coefficients.
- `intercept`: the prior on the α intercept.
- `auxiliary`: the prior on an auxiliary parameter.

In robust models, e.g. linear regression with a Student-t likelihood or count regression with a Negative Binomial likelihood, there is often an extra auxiliary parameter needed to parametrize the model to overcome under- or over-dispersion. If you are specifying a custom prior for one of these types of models, then you should also specify a prior for the auxiliary parameter.

Non-robust models do not need an auxiliary parameter, and you can pass `nothing` as the auxiliary argument.
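As a sketch (the prior choices here are purely illustrative), a `CustomPrior` for a non-robust and a robust model could be constructed as:

```julia
using TuringGLM
using Distributions

# Non-robust model (e.g. Normal likelihood): no auxiliary parameter needed,
# so `nothing` is passed as the third field.
priors_normal = CustomPrior(Normal(0, 2.5), Normal(10, 5), nothing)

# Robust model (e.g. Student-t or Negative Binomial likelihood):
# also specify a prior for the auxiliary parameter.
priors_robust = CustomPrior(Normal(0, 2.5), Normal(10, 5), Exponential(1))
```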
TuringGLM.NegativeBinomial2 — Method

    NegativeBinomial2(μ, ϕ)

An alternative parameterization of the Negative Binomial distribution:

\[\text{Negative-Binomial}(n \mid \mu, \phi) \sim \binom{n + \phi - 1}{n} \left( \frac{\mu}{\mu + \phi} \right)^{n} \left( \frac{\phi}{\mu + \phi} \right)^{\phi}\]

where the expectation is μ and the variance is (μ + μ²/ϕ).

The alternative parameterization is inspired by Stan's `neg_binomial_2` function.
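Under this parameterization the mean is μ itself, which makes it convenient for regression. A quick sketch, assuming `NegativeBinomial2` returns an ordinary `Distributions.jl` distribution:

```julia
using TuringGLM
using Distributions
using Statistics: mean, var

d = NegativeBinomial2(5.0, 2.0)  # μ = 5, ϕ = 2

mean(d)  # ≈ μ = 5.0
var(d)   # ≈ μ + μ²/ϕ = 5 + 25/2 = 17.5
```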
TuringGLM.center_predictors — Method

    center_predictors(X::AbstractMatrix)

Centers the columns of a matrix `X` of predictors to mean 0.

Returns a tuple with:

- `μ_X`: a 1×K `Matrix` of `Float64`s with the means of the K columns in the original `X` matrix.
- `X_centered`: a `Matrix` of `Float64`s with the same dimensions as the original matrix `X`, with the columns centered on mean μ=0.

Arguments

- `X::AbstractMatrix`: a matrix of predictors where rows are observations and columns are variables.
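A minimal sketch of the behavior described above (`center_predictors` is an internal function, so it is qualified with the module name here):

```julia
using TuringGLM

# 3 observations × 2 predictor columns
X = [1.0 10.0;
     2.0 20.0;
     3.0 30.0]

μ_X, X_centered = TuringGLM.center_predictors(X)
# μ_X holds the column means as a 1×2 matrix: [2.0 20.0]
# each column of X_centered now has mean 0
```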
TuringGLM.convert_str_to_indices — Method

    convert_str_to_indices(v::AbstractVector)

Converts a vector `v` to a vector of indices, i.e. a vector where all the entries are integers. Returns a tuple with the first element as the converted vector and the second element a `Dict` specifying which string is which integer.

This function is especially useful for random-effects varying-intercept hierarchical models. Normally `v` would be a vector of group membership with values such as `"group_1"`, `"group_2"`, etc. For random-effect models with varying-intercepts, Turing needs the group membership values to be passed as `Int`s.
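A sketch of what this looks like in practice (internal function, so module-qualified; the exact integer assignment depends on the order in which groups are encountered):

```julia
using TuringGLM

v = ["group_1", "group_2", "group_1", "group_3"]
idx, mapping = TuringGLM.convert_str_to_indices(v)
# idx is a Vector{Int}, e.g. [1, 2, 1, 3]
# mapping is a Dict from each string to its integer index,
# e.g. Dict("group_1" => 1, "group_2" => 2, "group_3" => 3)
```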
TuringGLM.data_fixed_effects — Method

    data_fixed_effects(formula::FormulaTerm, data)

Constructs the matrix X of fixed-effects (a.k.a. population-level) predictors.

Returns a `Matrix` of the fixed-effects predictor variables in the `formula` and present inside `data`.

Arguments

- `formula`: a `FormulaTerm` created by the `@formula` macro.
- `data`: a `data` object that satisfies the Tables.jl interface such as a DataFrame.
TuringGLM.data_random_effects — Method

    data_random_effects(formula::FormulaTerm, data)

Constructs the vector(s)/matrix(ces) Z(s) of random-effects (a.k.a. group-level) slope predictors.

Returns a `Dict{String, AbstractArray}` whose keys are the random-effects slope predictor variables in the `formula` and present inside `data`, with the corresponding `Vector`/`Matrix` as values.

Arguments

- `formula`: a `FormulaTerm` created by the `@formula` macro.
- `data`: a `data` object that satisfies the Tables.jl interface such as a DataFrame.
TuringGLM.data_response — Method

    data_response(formula::FormulaTerm, data)

Constructs the response y vector.

Returns a `Vector` of the response variable in the `formula` and present inside `data`.

Arguments

- `formula`: a `FormulaTerm` created by the `@formula` macro.
- `data`: a `data` object that satisfies the Tables.jl interface such as a DataFrame.
TuringGLM.get_idx — Method

    get_idx(term::Term, data)

Returns a tuple with the first element as the ID vector of `Int`s that represent group membership for a specific random-effect intercept group `term` of observations present in `data`. The second element of the tuple is a `Dict` specifying which string is which integer in the ID vector.
TuringGLM.get_var — Method

    get_var(term::Term, data)

Returns the corresponding column vector in `data` for a specific random-effect slope `term` of observations present in `data`.
TuringGLM.has_ranef — Method

    has_ranef(formula::FormulaTerm)

Returns `true` if any of the terms in `formula` is a `FunctionTerm`, `false` otherwise.
TuringGLM.intercept_per_ranef — Method

    intercept_per_ranef(terms::Tuple{RandomEffectsTerm})

Returns a vector of `String`s where the entries are the grouping variables that have a group-level intercept.
TuringGLM.n_ranef — Method

    n_ranef(formula::FormulaTerm)

Returns the number of `RandomEffectsTerm`s in `formula`.
TuringGLM.ranef — Method

    ranef(formula::FormulaTerm)

Returns a tuple of the `FunctionTerm`s parsed as `RandomEffectsTerm`s in `formula`. If there are no `FunctionTerm`s in `formula`, returns `nothing`.
TuringGLM.slope_per_ranef — Method

    slope_per_ranef(terms::Tuple{RandomEffectsTerm})

Returns a `SlopePerRanEf` object where the entries are the grouping variables that have a group-level slope.
TuringGLM.standardize_predictors — Method

    standardize_predictors(X::AbstractMatrix)

Standardizes the columns of a matrix `X` of predictors to mean 0 and standard deviation 1.

Returns a tuple with:

- `μ_X`: a 1×K `Matrix` of `Float64`s with the means of the K columns in the original `X` matrix.
- `σ_X`: a 1×K `Matrix` of `Float64`s with the standard deviations of the K columns in the original `X` matrix.
- `X_std`: a `Matrix` of `Float64`s with the same dimensions as the original matrix `X`, with the columns centered on mean μ=0 and standard deviation σ=1.

Arguments

- `X::AbstractMatrix`: a matrix of predictors where rows are observations and columns are variables.
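A minimal sketch of the behavior described above (`standardize_predictors` is internal, so it is module-qualified here):

```julia
using TuringGLM
using Statistics: mean, std

# 3 observations × 2 predictor columns on very different scales
X = [1.0 10.0;
     2.0 20.0;
     3.0 30.0]

μ_X, σ_X, X_std = TuringGLM.standardize_predictors(X)
# μ_X and σ_X are 1×2 matrices of column means and standard deviations;
# each column of X_std now has mean ≈ 0 and standard deviation ≈ 1
```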
TuringGLM.standardize_predictors — Method

    standardize_predictors(x::AbstractVector)

Standardizes the vector `x` to mean 0 and standard deviation 1.

Returns a tuple with:

- `μ_X`: a `Float64` with the mean of the original vector `x`.
- `σ_X`: a `Float64` with the standard deviation of the original vector `x`.
- `x_std`: a `Vector` of `Float64`s with the same length as the original vector `x`, with the values centered on mean μ=0 and standard deviation σ=1.

Arguments

- `x::AbstractVector`: a vector.
TuringGLM.tuple_length — Method

    tuple_length(::NTuple{N, Any}) where {N} = Int(N)

This is a hack to get the length of any tuple.
TuringGLM.turing_model — Method

    turing_model(formula, data; model=Normal, priors=DefaultPrior(), standardize=false)

Create a Turing model using `formula` syntax and a `data` source.

formula

`formula` is the same friendly interface used to specify statistical models in `brms`, `rstanarm`, `bambi`, `StatsModels.jl` and `MixedModels.jl`. The syntax uses the `@formula` macro: specify the dependent variable, followed by a tilde `~`, then the independent variables separated by a plus sign `+`.

Example: `@formula(y ~ x1 + x2 + x3)`.
Moderations/interactions can be specified with the asterisk sign `*`, e.g. `x1 * x2`. This will be expanded to `x1 + x2 + x1:x2`; following the principle of hierarchy, the main effects are added along with the interaction effect. Here `x1:x2` means that the values of `x1` will be multiplied (interacted) with the values of `x2`.
Random-effects (a.k.a. group-level effects) can be specified with the `(term | group)` syntax inside the `@formula`, where `term` is the independent variable and `group` is the categorical representation (i.e., either a column of `String`s or a `CategoricalArray` in `data`). You can specify a random-intercept with `(1 | group)`.

Example: `@formula(y ~ (1 | group) + x1)`.

Notice: random-effects are currently only implemented for a single group-level intercept. Future versions of `TuringGLM.jl` will support slope random-effects and multiple group-level effects.
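A sketch of a varying-intercept model, assuming `data` is a Tables.jl-compatible table with a response column `y`, a predictor `x1`, and a `String` grouping column `group` (all column names hypothetical):

```julia
using TuringGLM

fm = @formula(y ~ (1 | group) + x1)
model = turing_model(fm, data)
```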
data

`data` can be any `Tables.jl`-compatible data interface. The most popular ones are `DataFrame`s and `NamedTuple`s.
model

`model` represents the likelihood function which you want to condition your data on. It has to be a subtype of `Distributions.UnivariateDistribution`. Currently, `TuringGLM.jl` supports:

- `Normal` (the default if not specified): linear regression
- `TDist`: robust linear regression
- `Bernoulli`: logistic regression
- `Poisson`: Poisson count data regression
- `NegativeBinomial`: negative binomial robust count data regression
priors

`TuringGLM.jl` comes with state-of-the-art default priors, based on the literature and the Stan community. By default, `turing_model` will use `DefaultPrior`. But you can specify your own with `priors=CustomPrior(predictors, intercept, auxiliary)`. All models take `predictors` and `intercept` priors.

In robust models, e.g. linear regression with a Student-t likelihood or count regression with a Negative Binomial likelihood, there is often an extra auxiliary parameter needed to parametrize the model to overcome under- or over-dispersion. If you are specifying a custom prior for one of these types of models, then you should also specify a prior for the auxiliary parameter.

Non-robust models do not need an auxiliary parameter, and you can pass `nothing` as the auxiliary argument.

Example for a non-robust model: `@formula(y, ...), data; priors=CustomPrior(Normal(0, 2.5), Normal(10, 5), nothing)`

Example for a robust model: `@formula(y, ...), data; priors=CustomPrior(Normal(0, 2.5), Normal(10, 5), Exponential(1))`
standardize

Whether `true` or `false` to standardize your data to mean 0 and standard deviation 1 before inference. Some science fields prefer to analyze and report effects in terms of standard deviations. Also, whenever measurement scales differ, it is often suggested to standardize the effects for better comparison. By default, `turing_model` sets `standardize=false`.
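Putting it together, a minimal end-to-end sketch (the data and column names are illustrative; sampling uses Turing.jl, on which `TuringGLM.jl` builds):

```julia
using TuringGLM
using Turing: sample, NUTS

# Hypothetical data source; any Tables.jl-compatible table works.
data = (; y = randn(100), x1 = randn(100), x2 = randn(100))

fm = @formula(y ~ x1 + x2)
model = turing_model(fm, data; standardize=true)

# Sample from the posterior with the No-U-Turn Sampler.
chn = sample(model, NUTS(), 1_000)
```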