For our example on Negative Binomial Regression, let's use a famous dataset called roaches
(Gelman & Hill, 2007), which is data on the efficacy of a pest management system at reducing the number of roaches in urban apartments. It has 262 observations and the following variables:
y
– number of roaches caught.roach1
– pretreatment number of roaches.treatment
– binary/dummy (0 or 1) for treatment indicator.senior
– binary/dummy (0 or 1) for only elderly residents in building.exposure2
– number of days for which the roach traps were used
using CSV
using DataFrames
url = "https://github.com/TuringLang/TuringGLM.jl/raw/main/data/roaches.csv";
roaches = CSV.read(download(url), DataFrame)
y | roach1 | treatment | senior | exposure2 | |
---|---|---|---|---|---|
1 | 153 | 308.0 | 1 | 0 | 0.8 |
2 | 127 | 331.25 | 1 | 0 | 0.6 |
3 | 7 | 1.67 | 1 | 0 | 1.0 |
4 | 7 | 3.0 | 1 | 0 | 1.0 |
5 | 0 | 2.0 | 1 | 0 | 1.14286 |
6 | 0 | 0.0 | 1 | 0 | 1.0 |
7 | 73 | 70.0 | 1 | 0 | 0.8 |
8 | 24 | 64.56 | 1 | 0 | 1.14286 |
9 | 2 | 1.0 | 0 | 0 | 1.0 |
10 | 2 | 14.0 | 0 | 0 | 1.14286 |
... | |||||
262 | 8 | 0.0 | 0 | 1 | 1.0 |
using TuringGLM
Using y
as dependent variable and roach1
, treatment
, and senior
as independent variables:
fm = @formula(y ~ roach1 + treatment + senior)
FormulaTerm Response: y(unknown) Predictors: roach1(unknown) treatment(unknown) senior(unknown)
We instantiate our model with turing_model
passing a keyword argument model=NegativeBinomial
to indicate that the model is a negative binomial regression:
model = turing_model(fm, roaches; model=NegativeBinomial);
chn = sample(model, NUTS(), 2_000);
plot_chains(chn)
References
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge university press.