Let's walk through a linear regression example with the kidiq dataset (Gelman & Hill, 2007), which comes from a survey of adult American women and their children. The dataset, dated 2007, has 434 observations and 4 variables:

- kid_score: child's IQ
- mom_hs: binary/dummy (0 or 1) indicating whether the child's mother has a high school diploma
- mom_iq: mother's IQ
- mom_age: mother's age

using CSV
using DataFrames
url = "https://github.com/TuringLang/TuringGLM.jl/raw/main/data/kidiq.csv"
kidiq = CSV.read(download(url), DataFrame)
| | kid_score | mom_hs | mom_iq | mom_age |
|---|---|---|---|---|
| 1 | 65 | 1 | 121.118 | 27 |
| 2 | 98 | 1 | 89.3619 | 25 |
| 3 | 85 | 1 | 115.443 | 27 |
| 4 | 83 | 1 | 99.4496 | 25 |
| 5 | 115 | 1 | 92.7457 | 27 |
| 6 | 98 | 0 | 107.902 | 18 |
| 7 | 69 | 1 | 138.893 | 20 |
| 8 | 106 | 1 | 125.145 | 23 |
| 9 | 102 | 1 | 81.6195 | 24 |
| 10 | 95 | 1 | 95.0731 | 19 |
| ... | ... | ... | ... | ... |
| 434 | 70 | 1 | 91.2533 | 25 |
using TuringGLM
We use kid_score as the dependent variable and mom_hs and mom_iq as independent variables, with a moderation (interaction) effect between them:
fm = @formula(kid_score ~ mom_hs * mom_iq)
FormulaTerm
Response:
  kid_score(unknown)
Predictors:
  mom_hs(unknown)
  mom_iq(unknown)
  mom_hs(unknown) & mom_iq(unknown)
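
To see what the `*` operator in the formula expands to, here is a minimal sketch that builds the predictor columns by hand with StatsModels. It is only an illustration: the `toy` data frame is a hypothetical four-row subset copied from the table above (the mom_age column is omitted), and calling `apply_schema` and `modelcols` directly is an assumption about how one might inspect the terms, not a description of what TuringGLM does internally.

```julia
using DataFrames
using StatsModels

# Hypothetical toy subset: rows 1, 2, 3 and 6 of the table above.
toy = DataFrame(;
    kid_score=[65.0, 98.0, 85.0, 98.0],
    mom_hs=[1, 1, 1, 0],
    mom_iq=[121.118, 89.3619, 115.443, 107.902],
)

# Same formula as in the tutorial: `a * b` expands to `a + b + a & b`.
f = @formula(kid_score ~ mom_hs * mom_iq)

# Attach concrete column types to the terms, then build the response
# vector and the predictor matrix.
f_applied = apply_schema(f, schema(f, toy))
y, X = modelcols(f_applied, toy)

# One name per predictor column: two main effects plus the interaction.
println(coefnames(f_applied.rhs))
```

The third column of `X` is the elementwise product of mom_hs and mom_iq, which is what allows the slope on mom_iq to differ between mothers with and without a high school diploma.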
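Let's create our CustomPrior object. Its arguments are, in order, the prior for the predictor coefficients, the prior for the intercept, and an auxiliary prior. This model does not need the third (auxiliary) prior, so we leave it as nothing: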
priors = CustomPrior(Normal(0, 2.5), Normal(10, 20), nothing);
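
Written out, the formula and these priors correspond roughly to the model below. This is a sketch under assumptions: it takes the likelihood to be Normal, the symbols $\alpha$, $\beta_j$, $\mu_i$ and $\sigma$ are notation introduced here rather than names from the package, and the prior on $\sigma$ is left unspecified because CustomPrior does not set it in this example.

$$
\begin{aligned}
\text{kid\_score}_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_1 \, \text{mom\_hs}_i + \beta_2 \, \text{mom\_iq}_i + \beta_3 \, (\text{mom\_hs}_i \times \text{mom\_iq}_i) \\
\beta_j &\sim \text{Normal}(0, 2.5), \quad j = 1, 2, 3 \\
\alpha &\sim \text{Normal}(10, 20)
\end{aligned}
$$

The coefficient $\beta_3$ is the moderation effect: it measures how much the slope on mom_iq changes when mom_hs goes from 0 to 1.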
We instantiate our model with turing_model without specifying a likelihood, so the default is used (model=Normal). Notice that we pass our custom priors through the priors keyword argument:
model = turing_model(fm, kidiq; priors);
chn = sample(model, NUTS(), 2_000);
plot_chains(chn)
References
Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.