```julia
# Import Turing.
using Turing
# Package for loading the data set.
using RDatasets
# Package for working with tabular data (`select!`, `Not`, `DataFrame`).
using DataFrames
# Package for visualization.
using StatsPlots
# Functionality for splitting the data.
using MLUtils: splitobs
# Functionality for constructing arrays with identical elements efficiently.
using FillArrays
# Functionality for normalizing the data and evaluating the model predictions.
using StatsBase
# Functionality for working with scaled identity matrices.
using LinearAlgebra
# Set a seed for reproducibility.
using Random
Random.seed!(0);
```

# Linear Regression

Turing is powerful when applied to complex hierarchical models, but it can also be applied to common statistical procedures such as linear regression. This tutorial covers how to implement a linear regression model in Turing.

## Set Up

We begin by importing all the necessary libraries.

`setprogress!(false)`

We will use the `mtcars` dataset from the RDatasets package. `mtcars` contains a variety of statistics on different car models, including their miles per gallon, number of cylinders, and horsepower, among others.

We want to know if we can construct a Bayesian linear regression model to predict the miles per gallon of a car, given the other statistics it has. Let us take a look at the data we have.

```julia
# Load the dataset.
data = RDatasets.dataset("datasets", "mtcars")

# Show the first six rows of the dataset.
first(data, 6)
```

| Row | Model | MPG | Cyl | Disp | HP | DRat | WT | QSec | VS | AM | Gear | Carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | String31 | Float64 | Int64 | Float64 | Int64 | Float64 | Float64 | Float64 | Int64 | Int64 | Int64 | Int64 |
| 1 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| 2 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 3 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| 4 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 5 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| 6 | Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |

`size(data)`

`(32, 12)`

The next step is to get our data ready for testing. We’ll split the `mtcars` dataset into two subsets, one for training our model and one for evaluating our model. Then, we separate the targets we want to learn (`MPG`, in this case) and standardize the datasets by subtracting each column’s mean and dividing by that column’s standard deviation. The resulting data is not very familiar looking, but this standardization helps the sampler converge far more easily.

```julia
# Remove the model column.
select!(data, Not(:Model))

# Split our dataset 70%/30% into training/test sets.
trainset, testset = map(DataFrame, splitobs(data; at=0.7, shuffle=true))

# Turing requires data in matrix form.
target = :MPG
train = Matrix(select(trainset, Not(target)))
test = Matrix(select(testset, Not(target)))
train_target = trainset[:, target]
test_target = testset[:, target]

# Standardize the features.
dt_features = fit(ZScoreTransform, train; dims=1)
StatsBase.transform!(dt_features, train)
StatsBase.transform!(dt_features, test)

# Standardize the targets.
dt_targets = fit(ZScoreTransform, train_target)
StatsBase.transform!(dt_targets, train_target)
StatsBase.transform!(dt_targets, test_target);
```
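To make the standardization step concrete, here is a small sketch of what `ZScoreTransform` does, using a toy matrix rather than `mtcars` (the variable names here are illustrative only):

```julia
using Statistics
using StatsBase

# Toy feature matrix: each column is standardized independently (dims=1).
x = [1.0 10.0; 2.0 20.0; 3.0 30.0]
dt = fit(ZScoreTransform, x; dims=1)
z = StatsBase.transform(dt, x)

# Each column of z now has mean ≈ 0 and standard deviation ≈ 1.
mean(z; dims=1)
std(z; dims=1)

# The transform is invertible, which is useful for mapping model
# predictions back to the original scale.
StatsBase.reconstruct(dt, z)
```

Note that the test set is transformed with the parameters fitted on the training set, so the test data never influences the standardization.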

## Model Specification

In a traditional frequentist model using OLS, our model might look like:

\[ \mathrm{MPG}_i = \alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i} \]

where \(\boldsymbol{\beta}\) is a vector of coefficients and \(\boldsymbol{X}\) is a vector of inputs for observation \(i\). The Bayesian model we are more concerned with is the following:

\[ \mathrm{MPG}_i \sim \mathcal{N}(\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}, \sigma^2) \]

where \(\alpha\) is an intercept term common to all observations, \(\boldsymbol{\beta}\) is a coefficient vector, \(\boldsymbol{X_i}\) is the observed data for car \(i\), and \(\sigma^2\) is a common variance term.

For \(\sigma^2\), we assign a prior of `truncated(Normal(0, 100); lower=0)`. This is consistent with Andrew Gelman’s recommendations on noninformative priors for variance. The intercept term (\(\alpha\)) is assumed to be normally distributed with a mean of zero and a variance of three. This represents our assumption that miles per gallon can be explained mostly by our assorted variables, while the relatively high variance reflects our uncertainty about that. Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10: we do not know that our coefficients differ from zero, and we don’t know which ones are likely to be the most important, so the variance term is quite high. Lastly, each observation \(y_i\) is distributed according to the calculated `mu` term given by \(\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}\).
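Collecting these assumptions in one place (writing \(\mathcal{N}(\mu, v)\) with the variance \(v\) in the second slot), the full generative model is:

\[
\begin{aligned}
\sigma^2 &\sim \mathcal{N}(0, 100^2) \text{ truncated to } \sigma^2 \ge 0, \\
\alpha &\sim \mathcal{N}(0, 3), \\
\beta_j &\sim \mathcal{N}(0, 10) \quad \text{for each feature } j, \\
y_i \mid \alpha, \boldsymbol{\beta}, \sigma^2 &\sim \mathcal{N}(\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}, \sigma^2).
\end{aligned}
\]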

```julia
# Bayesian linear regression.
@model function linear_regression(x, y)
    # Set variance prior.
    σ² ~ truncated(Normal(0, 100); lower=0)

    # Set intercept prior.
    intercept ~ Normal(0, sqrt(3))

    # Set the priors on our coefficients.
    nfeatures = size(x, 2)
    coefficients ~ MvNormal(Zeros(nfeatures), 10.0 * I)

    # Calculate all the mu terms.
    mu = intercept .+ x * coefficients
    return y ~ MvNormal(mu, σ² * I)
end
```

`linear_regression (generic function with 2 methods)`

With our model specified, we can call the sampler. We will use the No U-Turn Sampler (NUTS) here.

```julia
model = linear_regression(train, train_target)
chain = sample(model, NUTS(), 5_000)
```

```
┌ Info: Found initial step size
└ ϵ = 0.4
```

```
Chains MCMC chain (5000×24×1 Array{Float64, 3}):

Iterations        = 1001:1:6000
Number of chains  = 1
Samples per chain = 5000
Wall duration     = 12.12 seconds
Compute duration  = 12.12 seconds
parameters        = σ², intercept, coefficients[1], coefficients[2], coefficients[3], coefficients[4], coefficients[5], coefficients[6], coefficients[7], coefficients[8], coefficients[9], coefficients[10]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
        parameters      mean       std      mcse    ess_bulk    ess_tail  ⋯
            Symbol   Float64   Float64   Float64     Float64     Float64  ⋯

                σ²    0.2328    0.1388    0.0034   1199.2044    803.1765  ⋯
         intercept   -0.0026    0.1029    0.0014   5277.0883   3314.7413  ⋯
   coefficients[1]   -0.0215    0.3726    0.0055   4561.5540   2691.9998  ⋯
   coefficients[2]    0.3241    0.4197    0.0078   2845.4078   2737.5202  ⋯
   coefficients[3]   -0.4668    0.3017    0.0048   3867.8412   3822.9397  ⋯
   coefficients[4]    0.0785    0.2512    0.0042   3623.4322   2854.4863  ⋯
   coefficients[5]   -0.7693    0.3494    0.0064   2882.6888   3033.4621  ⋯
   coefficients[6]    0.2469    0.2883    0.0048   3527.4377   2471.7297  ⋯
   coefficients[7]   -0.0762    0.2499    0.0041   3680.3860   3314.5311  ⋯
   coefficients[8]    0.0048    0.2471    0.0039   4088.0727   3174.2018  ⋯
   coefficients[9]    0.1280    0.2371    0.0036   4436.2655   3104.5782  ⋯
  coefficients[10]    0.0695    0.2726    0.0051   2849.7346   3019.2209  ⋯
                                                          2 columns omitted

Quantiles
        parameters      2.5%     25.0%     50.0%     75.0%     97.5%
            Symbol   Float64   Float64   Float64   Float64   Float64

                σ²    0.0854    0.1452    0.1981    0.2771    0.5991
         intercept   -0.2140   -0.0658   -0.0027    0.0616    0.2075
   coefficients[1]   -0.7623   -0.2437   -0.0183    0.2055    0.7153
   coefficients[2]   -0.5101    0.0646    0.3270    0.5910    1.1587
   coefficients[3]   -1.0708   -0.6556   -0.4639   -0.2776    0.1352
   coefficients[4]   -0.4192   -0.0748    0.0738    0.2314    0.6085
   coefficients[5]   -1.4816   -0.9924   -0.7780   -0.5552   -0.0598
   coefficients[6]   -0.3346    0.0661    0.2495    0.4283    0.8137
   coefficients[7]   -0.5953   -0.2290   -0.0718    0.0810    0.4039
   coefficients[8]   -0.4948   -0.1488    0.0014    0.1611    0.4959
   coefficients[9]   -0.3514   -0.0166    0.1253    0.2790    0.6132
  coefficients[10]   -0.4715   -0.1044    0.0683    0.2379    0.6119
```

We can also check the densities and traces of the parameters visually using the `plot` functionality.

`plot(chain)`
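Beyond visual inspection, one simple way to use the chain is to plug the posterior means of the parameters into the regression equation to produce point predictions on the test set. The sketch below assumes the `chain`, `test`, `test_target`, and `dt_targets` objects defined earlier in this tutorial; the `prediction` and `mse` names are illustrative, not part of any API:

```julia
# Posterior means of the intercept and coefficients.
post_intercept = mean(chain[:intercept])
post_coefficients = [mean(chain[Symbol("coefficients[$i]")]) for i in 1:10]

# Point predictions on the (standardized) test features.
prediction = post_intercept .+ test * post_coefficients

# Undo the target standardization so that errors are on the original
# miles-per-gallon scale.
unstandardized_prediction = StatsBase.reconstruct(dt_targets, prediction)
unstandardized_truth = StatsBase.reconstruct(dt_targets, test_target)

# Mean squared error of the posterior-mean predictions.
mse = mean((unstandardized_prediction .- unstandardized_truth) .^ 2)
```

Using only the posterior means discards the uncertainty the chain carries; for full posterior predictions, each sample's parameters can be plugged in instead.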