Currently, TuringGLM only supports hierarchical models with a single random-intercept. This is done by using the (1 | group)
inside the @formula
macro.
For our Hierarchical Model example, let's use a famous dataset called cheese
(Boatwright, McCulloch & Rossi, 1999), which is data from cheese ratings. A group of 10 rural and 10 urban raters rated 4 types of different cheeses (A, B, C and D) in two samples. So we have \(4 \cdot 20 \cdot 2 = 160\) observations and 4 variables:
cheese
: type of cheese fromA
toD
rater
: id of the rater from1
to10
background
: type of rater, eitherrural
orurban
y
: rating of the cheese
using CSV
using DataFrames
url = "https://github.com/TuringLang/TuringGLM.jl/raw/main/data/cheese.csv";
cheese = CSV.read(download(url), DataFrame)
cheese | rater | background | y | |
---|---|---|---|---|
1 | "A" | 1 | "rural" | 67 |
2 | "A" | 1 | "rural" | 66 |
3 | "B" | 1 | "rural" | 51 |
4 | "B" | 1 | "rural" | 53 |
5 | "C" | 1 | "rural" | 75 |
6 | "C" | 1 | "rural" | 70 |
7 | "D" | 1 | "rural" | 68 |
8 | "D" | 1 | "rural" | 66 |
9 | "A" | 2 | "rural" | 76 |
10 | "A" | 2 | "rural" | 76 |
... | ||||
160 | "D" | 10 | "urban" | 83 |
using TuringGLM
Using y
as dependent variable and background
is independent variable with a varying-intercept per cheese
type:
fm = @formula(y ~ (1 | cheese) + background)
FormulaTerm Response: y(unknown) Predictors: background(unknown) (cheese)->1 | cheese
We instantiate our model with turing_model
without specifying any model, thus the default model will be used (model=Normal
):
model = turing_model(fm, cheese);
chn = sample(model, NUTS(), 2_000);
plot_chains(chn)
References
Boatwright, P., McCulloch, R., & Rossi, P. (1999). Account-level modeling for trade promotion: An application of a constrained parameter hierarchical model. Journal of the American Statistical Association, 94(448), 1063–1073.