# Bayesian Neural Networks

In this tutorial, we demonstrate how one can implement a Bayesian Neural Network using a combination of Turing and Lux, a suite of machine learning tools. We will use Lux to specify the neural network's layers and Turing to implement the probabilistic inference, with the goal of implementing a classification algorithm.

We begin by importing the relevant libraries.

```
using Turing
using FillArrays
using Lux
using Plots
using Tracker
using Functors
using LinearAlgebra
using Random
```

Our goal here is to use a Bayesian neural network to classify points in an artificial dataset. The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we will be working with.

```
# Number of points to generate
N = 80
M = round(Int, N / 4)
rng = Random.default_rng()
Random.seed!(rng, 1234)

# Generate artificial data
x1s = rand(rng, Float32, M) * 4.5f0;
x2s = rand(rng, Float32, M) * 4.5f0;
xt1s = Array([[x1s[i] + 0.5f0; x2s[i] + 0.5f0] for i in 1:M])
x1s = rand(rng, Float32, M) * 4.5f0;
x2s = rand(rng, Float32, M) * 4.5f0;
append!(xt1s, Array([[x1s[i] - 5.0f0; x2s[i] - 5.0f0] for i in 1:M]))

x1s = rand(rng, Float32, M) * 4.5f0;
x2s = rand(rng, Float32, M) * 4.5f0;
xt0s = Array([[x1s[i] + 0.5f0; x2s[i] - 5.0f0] for i in 1:M])
x1s = rand(rng, Float32, M) * 4.5f0;
x2s = rand(rng, Float32, M) * 4.5f0;
append!(xt0s, Array([[x1s[i] - 5.0f0; x2s[i] + 0.5f0] for i in 1:M]))

# Store all the data for later
xs = [xt1s; xt0s]
ts = [ones(2 * M); zeros(2 * M)]

# Plot data points.
function plot_data()
    x1 = map(e -> e[1], xt1s)
    y1 = map(e -> e[2], xt1s)
    x2 = map(e -> e[1], xt0s)
    y2 = map(e -> e[2], xt0s)

    Plots.scatter(x1, y1; color="red", clim=(0, 1))
    return Plots.scatter!(x2, y2; color="blue", clim=(0, 1))
end

plot_data()
```
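The four clusters form an XOR-style pattern: class 1 (red) occupies the quadrants where both coordinates share a sign, and class 0 (blue) the quadrants where they differ, so no single straight line separates the classes. The sketch below uses only the standard library and a hypothetical helper `make_box_data` that regenerates points with the same offsets as the code above, then checks that labeling rule. It is an illustration, not part of the tutorial's pipeline.

```julia
using Random

# Hypothetical helper mirroring the offsets used above: each cluster is a
# 4.5 x 4.5 square shifted by +0.5 or -5.0 along each axis.
function make_box_data(rng::AbstractRNG, M::Int)
    square() = rand(rng, Float32, 2) .* 4.5f0
    # Class 1: (+,+) and (-,-) quadrants.
    pos = vcat([square() .+ [0.5f0, 0.5f0] for _ in 1:M],
               [square() .+ [-5.0f0, -5.0f0] for _ in 1:M])
    # Class 0: (+,-) and (-,+) quadrants.
    neg = vcat([square() .+ [0.5f0, -5.0f0] for _ in 1:M],
               [square() .+ [-5.0f0, 0.5f0] for _ in 1:M])
    xs = [pos; neg]
    ts = [ones(2M); zeros(2M)]
    return xs, ts
end

xs_demo, ts_demo = make_box_data(Random.MersenneTwister(1234), 20)

# Label 1 exactly when both coordinates have the same sign (x1 * x2 > 0).
xor_rule(x) = x[1] * x[2] > 0 ? 1.0 : 0.0
all(xor_rule.(xs_demo) .== ts_demo)  # true
```

Because the classes are not linearly separable, the network below needs hidden layers with nonlinear activations.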

## Building a Neural Network

The next step is to define a feedforward neural network where we express our parameters as distributions, and not single points as with traditional neural networks. For this we will use `Dense` to define linear layers and compose them via `Chain`; both are neural network primitives from Lux. The network `nn_initial` we create has two hidden layers with `tanh` activations and one output layer with sigmoid (`σ`) activation, as shown below.

`nn_initial` acts as a function: given input data (together with its parameters and state, in Lux's explicit-parameter style), it outputs predictions. We will define distributions on the neural network parameters.

```
# Construct a neural network using Lux
nn_initial = Chain(Dense(2 => 3, tanh), Dense(3 => 2, tanh), Dense(2 => 1, σ))

# Initialize the model weights and state
ps, st = Lux.setup(rng, nn_initial)

Lux.parameterlength(nn_initial) # number of parameters in NN
```

`20`
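The count of 20 follows from the layer shapes: a `Dense(in => out)` layer with a bias has an `out × in` weight matrix plus `out` bias entries. A small standard-library sketch of that arithmetic (the helper `dense_chain_params` is hypothetical, not part of Lux):

```julia
# Hypothetical helper: number of trainable parameters in a chain of dense
# layers with biases, given the layer widths (input first, output last).
function dense_chain_params(widths::Vector{Int})
    total = 0
    for (n_in, n_out) in zip(widths[1:end-1], widths[2:end])
        total += n_out * n_in + n_out  # weight matrix + bias vector
    end
    return total
end

dense_chain_params([2, 3, 2, 1])  # (6 + 3) + (6 + 2) + (2 + 1) = 20
```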

The probabilistic model specification below creates a `parameters` variable whose entries are IID normal random variables with mean zero and standard deviation `sigma`. The `parameters` vector represents all parameters of our neural net (weights and biases).

```
# Create a regularization term and the corresponding Gaussian prior standard deviation.
alpha = 0.09
sigma = sqrt(1.0 / alpha)
```

`3.3333333333333335`
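Here `alpha` plays the role of a prior precision: each parameter gets an independent zero-mean normal prior with variance `sigma^2 = 1 / alpha`, so, up to an additive constant, the log prior equals the L2 penalty `-(alpha / 2) * sum(abs2, θ)`. A quick numerical check with the standard library, using an arbitrary test vector `θ_demo` chosen for illustration:

```julia
alpha = 0.09
sigma = sqrt(1.0 / alpha)

# Log density of N(0, sigma^2) evaluated independently at each entry of θ.
log_prior(θ, sigma) = sum(@. -abs2(θ) / (2 * sigma^2) - log(sigma * sqrt(2π)))

# L2 penalty with strength alpha (the "regularization term").
l2_penalty(θ, alpha) = -(alpha / 2) * sum(abs2, θ)

θ_demo = [0.3, -1.2, 2.5]
zero_θ = zeros(3)

# Both differences are measured relative to θ = 0, so the normalizing constant cancels.
Δprior = log_prior(θ_demo, sigma) - log_prior(zero_θ, sigma)
Δl2 = l2_penalty(θ_demo, alpha) - l2_penalty(zero_θ, alpha)
isapprox(Δprior, Δl2)  # true
```

Larger `alpha` therefore means a tighter prior and stronger shrinkage of the weights toward zero.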

The helper below constructs a named tuple from a sampled parameter vector. We could also use ComponentArrays here and simply broadcast to avoid doing this, but we do it by hand to avoid an extra dependency.

```
function vector_to_parameters(ps_new::AbstractVector, ps::NamedTuple)
    @assert length(ps_new) == Lux.parameterlength(ps)
    i = 1
    function get_ps(x)
        z = reshape(view(ps_new, i:(i + length(x) - 1)), size(x))
        i += length(x)
        return z
    end
    return fmap(get_ps, ps)
end
```

`vector_to_parameters (generic function with 1 method)`
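To see what `vector_to_parameters` does, here is a standard-library-only sketch of the same idea on a hand-written named tuple: consecutive slices of the flat vector are reshaped into each array in field order. The helper `unflatten` and the template layout are illustrative assumptions standing in for `fmap` and Lux's parameter structure.

```julia
# Hypothetical stand-in for `fmap`: walk a nested NamedTuple of arrays in
# field order and replace each array with a reshaped slice of `flat`.
function unflatten(flat::AbstractVector, template::NamedTuple)
    offset = Ref(0)  # how many entries of `flat` have been consumed so far
    function rebuild(x)
        if x isa NamedTuple
            return map(rebuild, x)
        else
            n = length(x)
            z = reshape(view(flat, (offset[] + 1):(offset[] + n)), size(x))
            offset[] += n
            return z
        end
    end
    result = rebuild(template)
    @assert offset[] == length(flat) "flat vector has the wrong length"
    return result
end

# Same shapes as the first layer of the tutorial's network: Dense(2 => 3).
template = (layer_1 = (weight = zeros(3, 2), bias = zeros(3)),)
flat = collect(1.0:9.0)
p = unflatten(flat, template)
p.layer_1.weight  # 3×2, filled column-major with 1.0:6.0
p.layer_1.bias    # [7.0, 8.0, 9.0]
```

Because the results are views, no parameter values are copied; the same holds for the `view`-based helper above.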

To interface with external libraries, it is often desirable to use `StatefulLuxLayer`, which automatically handles the neural network states.

```
const nn = StatefulLuxLayer(nn_initial, st)

# Specify the probabilistic model.
@model function bayes_nn(xs, ts; sigma = sigma, ps = ps, nn = nn)
    # Sample the parameters
    nparameters = Lux.parameterlength(nn_initial)
    parameters ~ MvNormal(zeros(nparameters), Diagonal(abs2.(sigma .* ones(nparameters))))

    # Forward NN to make predictions
    preds = Lux.apply(nn, xs, vector_to_parameters(parameters, ps))

    # Observe each prediction.
    for i in eachindex(ts)
        ts[i] ~ Bernoulli(preds[i])
    end
end
```

`bayes_nn (generic function with 2 methods)`
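Up to a normalizing constant, the posterior that the sampler targets is the log prior of `parameters` plus one Bernoulli log-likelihood per observation. The sketch below writes that log joint out by hand for the same 2-3-2-1 architecture using only the standard library. The manual forward pass, the parameter ordering (weights then bias, layer by layer, column-major) and all helper names are assumptions for illustration, not Turing's or Lux's internals.

```julia
sigmoid(z) = 1 / (1 + exp(-z))

# Hand-written forward pass for Dense(2=>3, tanh) -> Dense(3=>2, tanh) -> Dense(2=>1, σ),
# reading the 20 parameters in an assumed order: W1 (3x2), b1, W2 (2x3), b2, W3 (1x2), b3.
function forward(θ::AbstractVector, x::AbstractVector)
    W1 = reshape(θ[1:6], 3, 2);   b1 = θ[7:9]
    W2 = reshape(θ[10:15], 2, 3); b2 = θ[16:17]
    W3 = reshape(θ[18:19], 1, 2); b3 = θ[20]
    h1 = tanh.(W1 * x .+ b1)
    h2 = tanh.(W2 * h1 .+ b2)
    return sigmoid((W3 * h2)[1] + b3)
end

# Log prior: independent N(0, sigma^2) on every parameter.
log_prior(θ, sigma) = sum(@. -abs2(θ) / (2 * sigma^2) - log(sigma * sqrt(2π)))

# Bernoulli log-likelihood of label t ∈ {0, 1} given success probability p
# (not guarded against p being exactly 0 or 1).
bernoulli_loglik(t, p) = t * log(p) + (1 - t) * log(1 - p)

# Unnormalized log posterior: log p(θ) + Σᵢ log p(tᵢ | xᵢ, θ).
function log_joint(θ, xs, ts; sigma = sqrt(1 / 0.09))
    return log_prior(θ, sigma) +
           sum(bernoulli_loglik(t, forward(θ, x)) for (x, t) in zip(xs, ts))
end

xs_toy = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
ts_toy = [1.0, 1.0, 0.0, 0.0]
θ0 = zeros(20)
log_joint(θ0, xs_toy, ts_toy)  # every prediction is 0.5 when all parameters are zero
```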

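Inference can now be performed by calling `sample`. We use the `NUTS` Hamiltonian Monte Carlo sampler here, with Tracker as the automatic differentiation backend. The inputs are passed as a matrix with one column per data point, built with `reduce(hcat, xs)`.

```
# Disable the sampling progress bar.
setprogress!(false)

# Perform inference.
N = 2_000
ch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoTracker()), N);
```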

```
┌ Info: Found initial step size
└ ϵ = 0.4
```

Now we extract the parameter samples from the sampled chain as `θ` (this is of size `2000 x 20`, where `2000` is the number of iterations `N` and `20` is the number of parameters). We'll use these primarily to determine how good our model's classifier is.

```
# Extract all weight and bias parameters.
θ = MCMCChains.group(ch, :parameters).value;
```

## Prediction Visualization

We can approximate MAP estimation by classifying our population with the set of weights from the chain that achieved the highest log posterior.

```
# A helper to run the nn through data `x` using parameters `θ`
nn_forward(x, θ) = nn(x, vector_to_parameters(θ, ps))

# Plot the data we have.
fig = plot_data()

# Find the index that provided the highest log posterior in the chain.
_, i = findmax(ch[:lp])

# Keep only the iteration (row) index from the CartesianIndex.
i = i.I[1]

# Plot the posterior distribution with a contour plot.
# Rows of Z index x2 (the vertical axis) and columns index x1, as Plots expects.
x1_range = collect(range(-6; stop=6, length=25))
x2_range = collect(range(-6; stop=6, length=25))
Z = [nn_forward([x1, x2], θ[i, :])[1] for x2 in x2_range, x1 in x1_range]
contour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)
fig
```

The contour plot above shows that this single highest-posterior sample already classifies our data reasonably well.
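The `i.I[1]` step in the MAP code is needed because `ch[:lp]` is indexed by both iteration and chain, so `findmax` returns a `CartesianIndex` rather than a plain integer. The same behavior on a small matrix standing in for the log-posterior values (the numbers are made up):

```julia
# Stand-in for ch[:lp]: 5 iterations (rows) × 1 chain (column); values are made up.
lp = reshape([-12.0, -9.5, -7.25, -8.0, -10.0], 5, 1)

best_lp, idx = findmax(lp)  # idx is a CartesianIndex{2}
row = idx.I[1]              # iteration (row) with the highest log posterior
(best_lp, row)              # (-7.25, 3)
```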

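Now we can visualize predictions that use the whole posterior rather than a single sample. The posterior predictive distribution averages the network output over the posterior, which we approximate with posterior draws:

\[ p(\tilde{x} \mid X, \alpha) = \int_{\theta} p(\tilde{x} \mid \theta)\, p(\theta \mid X, \alpha)\, d\theta \approx \frac{1}{S} \sum_{s=1}^{S} f_{\theta_s}(\tilde{x}), \quad \theta_s \sim p(\theta \mid X, \alpha) \]

where \(S\) is the number of posterior draws used and \(f_{\theta}\) is the network output for parameters \(\theta\).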
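The Monte Carlo average on the right-hand side can be sketched with a stand-in model. We assume a one-parameter logistic model `f(θ, x) = 1 / (1 + exp(-θ * x))` and a hypothetical vector of posterior draws `θ_samples`, rather than the network and chain from this tutorial; the optional `step` keyword mimics the thinning (`1:10:num`) used in `nn_predict`.

```julia
using Statistics

# Stand-in for f_θ: a one-parameter logistic model (an assumption for illustration).
f(θ, x) = 1 / (1 + exp(-θ * x))

# Monte Carlo estimate of the posterior predictive p(x̃ | X, α):
# average f_θ(x̃) over posterior draws, optionally keeping every `step`-th draw.
predictive_mean(x, θ_samples; step = 1) = mean(f(θ, x) for θ in θ_samples[1:step:end])

θ_samples = [-1.0, 0.0, 1.0, 2.0]           # hypothetical posterior draws
predictive_mean(0.0, θ_samples)             # 0.5, since f(θ, 0) = 0.5 for every θ
predictive_mean(1.0, θ_samples; step = 2)   # averages f(-1.0, 1.0) and f(1.0, 1.0)
```

Thinning trades a noisier estimate for fewer forward passes; with strongly autocorrelated MCMC draws, the loss in accuracy is often small.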

The `nn_predict` function averages the predicted values of networks parameterized by weights drawn from the MCMC chain. To save computation, it uses only every 10th sample among the first `num` samples.

```
# Return the average predicted value across
# multiple weights.
function nn_predict(x, θ, num)
    num = min(num, size(θ, 1)) # make sure num does not exceed the number of samples
    return mean([first(nn_forward(x, view(θ, i, :))) for i in 1:10:num])
end
```

`nn_predict (generic function with 1 method)`

Next, we use the `nn_predict` function to predict the value on a grid of points whose `x1` and `x2` coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data and, more importantly, it is now much easier to see where the neural network is uncertain about its predictions: the regions between cluster boundaries.
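One way to make "uncertain" concrete is the entropy of the averaged Bernoulli prediction, which peaks at p = 0.5: when posterior draws disagree, their average is pulled toward 0.5 even if each individual network is confident. A standard-library sketch with made-up per-sample predictions (the entropy measure is our illustrative addition, not something the tutorial's plots compute):

```julia
using Statistics

# Entropy (in nats) of a Bernoulli distribution with success probability p.
bernoulli_entropy(p) = (p == 0 || p == 1) ? 0.0 : -(p * log(p) + (1 - p) * log(1 - p))

# Made-up predictions from four posterior draws at two input points.
agree    = [0.97, 0.95, 0.98, 0.96]   # draws agree: deep inside a cluster
disagree = [0.95, 0.05, 0.90, 0.10]   # draws disagree: near a cluster boundary

p_agree, p_disagree = mean(agree), mean(disagree)   # ≈ 0.965 and ≈ 0.5
bernoulli_entropy(p_agree) < bernoulli_entropy(p_disagree)  # true
```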

```
# Plot the average prediction.
fig = plot_data()

n_end = 1500
x1_range = collect(range(-6; stop=6, length=25))
x2_range = collect(range(-6; stop=6, length=25))
# Rows of Z index x2 (the vertical axis) and columns index x1, as Plots expects.
Z = [nn_predict([x1, x2], θ, n_end)[1] for x2 in x2_range, x1 in x1_range]
contour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)
fig
```

Suppose we are interested in how the predictive power of our Bayesian neural network evolved between samples. In that case, the following animation shows the contour plot generated from the network weights of each of samples 1 to 500, saving every 5th frame.

```
# Number of iterations to plot.
n_end = 500

anim = @gif for i in 1:n_end
    plot_data()
    # Rows of Z index x2 (the vertical axis) and columns index x1, as Plots expects.
    Z = [nn_forward([x1, x2], θ[i, :])[1] for x2 in x2_range, x1 in x1_range]
    contour!(x1_range, x2_range, Z; title="Iteration $i", clim=(0, 1))
end every 5
```

`[ Info: Saved animation to /tmp/jl_NAT2FUXkVe.gif`