JuliaGPs packages integrate well with Turing.jl because they implement the Distributions.jl interface. You should be able to understand what is going on in this tutorial if you know what a GP is. For a more in-depth understanding of the JuliaGPs functionality used here, please consult the JuliaGPs docs.
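To get a feel for that interface before diving into the model, here is a small illustrative sketch (the inputs x below are arbitrary values chosen purely for demonstration): a GP evaluated at finitely many points yields an object that supports the usual rand and logpdf operations from Distributions.jl.

using AbstractGPs
f = GP(SEKernel())        # zero-mean GP prior with a squared-exponential kernel
x = [0.5, 1.0, 2.0, 3.5]  # arbitrary input locations, purely for illustration
fx = f(x, 1e-6)           # finite-dimensional projection: a multivariate normal over f at x
y = rand(fx)              # sample from it like any other Distributions.jl object
logpdf(fx, y)             # evaluate its log-density

It is this kind of finite-dimensional object that Turing.jl ends up working with when a GP appears in a model, as we will see below.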
In this tutorial, we will model the putting dataset discussed in Chapter 21 of Bayesian Data Analysis. The dataset comprises the result of measuring how often a golfer successfully gets the ball in the hole, depending on how far away from it they are. The goal of inference is to estimate the probability of any given shot being successful at a given distance.
using CSV, DataFrames
df = CSV.read("golf.dat", DataFrame; delim=' ', ignorerepeated=true)
df[1:5, :]
Row | distance | n | y |
---|---|---|---|
 | Int64 | Int64 | Int64 |
1 | 2 | 1443 | 1346 |
2 | 3 | 694 | 577 |
3 | 4 | 455 | 337 |
4 | 5 | 353 | 208 |
5 | 6 | 272 | 149 |
We’ve printed the first 5 rows of the dataset (which comprises only 19 rows in total). Observe that it has three columns:

- distance – how far away from the hole. I’ll refer to distance as d throughout the rest of this tutorial.
- n – how many shots were taken from a given distance.
- y – how many shots were successful from a given distance.

We will use a Binomial model for the data, whose success probability is parametrised by a transformation of a GP. Something along the lines of:
\[
\begin{aligned}
f & \sim \operatorname{GP}(0, k) \\
y_j \mid f(d_j) & \sim \operatorname{Binomial}(n_j, g(f(d_j))) \\
g(x) & := \frac{1}{1 + e^{-x}}
\end{aligned}
\]
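The transformation g above is just the logistic function; LogExpFunctions.jl exports it as logistic, which is what the model below uses. A quick sanity check of its behaviour:

using LogExpFunctions
logistic(0.0)                # 0.5: a latent value of 0 maps to a 50% success probability
logistic.([-4.0, 0.0, 4.0])  # broadcasts over latent values, always landing in (0, 1)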
To do this, let’s define our Turing.jl model:
using AbstractGPs, LogExpFunctions, Turing
@model function putting_model(d, n; jitter=1e-4)
    # Priors over the kernel variance and length-scale.
    v ~ Gamma(2, 1)
    l ~ Gamma(4, 1)
    # GP prior over the latent function, and its values at the observed distances d.
    f = GP(v * with_lengthscale(SEKernel(), l))
    f_latent ~ f(d, jitter)
    # Binomial likelihood; the logistic transform maps f_latent into (0, 1).
    y ~ product_distribution(Binomial.(n, logistic.(f_latent)))
    return (fx=f(d, jitter), f_latent=f_latent, y=y)
end
putting_model (generic function with 2 methods)
We first define an AbstractGPs.GP, which represents a distribution over functions, and is entirely separate from Turing.jl. We place a prior over its variance v and length-scale l. f(d, jitter) constructs the multivariate Gaussian comprising the random variables in f whose indices are in d (plus a bit of independent Gaussian noise with variance jitter – see the docs for more details). f(d, jitter) has the type AbstractMvNormal, and is the bit of AbstractGPs.jl that implements the Distributions.jl interface, so it’s legal to put it on the right-hand side of a ~. From this you should deduce that f_latent is distributed according to a multivariate Gaussian. The remaining lines comprise standard Turing.jl code that is encountered in other tutorials and Turing documentation.
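To convince yourself of this outside of a Turing model, you can construct the same finite-dimensional object by hand for some fixed, purely illustrative values of v and l, and query it like any other multivariate normal:

using AbstractGPs, Distributions
v, l = 2.0, 4.0                              # illustrative fixed hyperparameter values
f = GP(v * with_lengthscale(SEKernel(), l))  # same construction as in putting_model
fx = f(Float64.(df.distance), 1e-4)          # prior over the latent function at the observed distances
fx isa Distributions.AbstractMvNormal        # true
mean(fx), cov(fx)                            # the usual multivariate-normal queries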
Before performing inference, we might want to inspect the prior that our model places over the data, to see whether there is anything obviously wrong. These kinds of prior predictive checks are straightforward to perform using Turing.jl, since it is possible to sample from the prior easily by just calling the model:
m = putting_model(Float64.(df.distance), df.n)
m().y
19-element Vector{Int64}:
343
214
166
133
99
84
62
61
70
106
110
94
84
69
80
79
73
69
65
We make use of this to see what kinds of datasets we simulate from the prior:
using Plots
function plot_data(d, n, y, xticks, yticks)
    ylims = (0, round(maximum(n), RoundUp; sigdigits=2))
    margin = -0.5 * Plots.mm
    plt = plot(; xticks=xticks, yticks=yticks, ylims=ylims, margin=margin, grid=false)
    bar!(plt, d, n; color=:red, label="", alpha=0.5)
    bar!(plt, d, y; label="", color=:blue, alpha=0.7)
    return plt
end
# Construct model and run some prior predictive checks.
m = putting_model(Float64.(df.distance), df.n)
hists = map(1:20) do j
    xticks = j > 15 ? :auto : nothing
    yticks = rem(j, 5) == 1 ? :auto : nothing
    return plot_data(df.distance, df.n, m().y, xticks, yticks)
end
plot(hists...; layout=(4, 5))
In this case, the only prior knowledge I have is that the proportion of successful shots ought to decrease monotonically as the distance from the hole increases. This should show up in the simulated data as the blue bars generally getting shorter as we move from left to right in each graph. Unfortunately, there is no simple way to enforce monotonicity in the samples from a GP, and we can see this in some of the plots above, so we must hope that we have enough data to ensure that this relationship holds approximately under the posterior. In any case, you can judge for yourself whether you think this is the most useful visualisation that we can perform – if you think there is something better to look at, please let us know!
Moving on, we generate samples from the posterior using the default NUTS sampler. We’ll make use of ReverseDiff.jl, as it has better performance than ForwardDiff.jl on this example. See Turing.jl’s docs on Automatic Differentiation for more info.
using Random, ReverseDiff
m_post = m | (y=df.y,)
chn = sample(Xoshiro(123456), m_post, NUTS(; adtype=AutoReverseDiff()), 1_000, progress=false)
┌ Info: Found initial step size
└   ϵ = 0.2
Chains MCMC chain (1000×33×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 148.17 seconds
Compute duration  = 148.17 seconds
parameters        = v, l, f_latent[1], f_latent[2], f_latent[3], f_latent[4], f_latent[5], f_latent[6], f_latent[7], f_latent[8], f_latent[9], f_latent[10], f_latent[11], f_latent[12], f_latent[13], f_latent[14], f_latent[15], f_latent[16], f_latent[17], f_latent[18], f_latent[19]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
   parameters      mean       std      mcse    ess_bulk   ess_tail     rhat   ⋯
       Symbol   Float64   Float64   Float64     Float64    Float64  Float64   ⋯

            v    2.8740    1.2900    0.0418   1067.5214   787.6270   1.0003   ⋯
            l    3.6385    0.8280    0.0519    225.8803   286.8892   1.0033   ⋯
  f_latent[1]    2.5453    0.0979    0.0030   1077.4782   490.0008   1.0005   ⋯
  f_latent[2]    1.7002    0.0683    0.0019   1257.5024   704.7677   1.0013   ⋯
  f_latent[3]    0.9733    0.0794    0.0033    582.2903   566.8537   1.0026   ⋯
  f_latent[4]    0.4780    0.0762    0.0026    876.7129   525.4170   1.0013   ⋯
  f_latent[5]    0.1911    0.0734    0.0024    933.8486   716.6565   0.9998   ⋯
  f_latent[6]   -0.0141    0.0875    0.0039    521.0220   721.1299   0.9992   ⋯
  f_latent[7]   -0.2437    0.0828    0.0029    835.3462   702.7869   1.0007   ⋯
  f_latent[8]   -0.5026    0.0878    0.0028   1025.6213   535.9936   1.0047   ⋯
  f_latent[9]   -0.7226    0.0997    0.0037    732.5870   627.0753   1.0012   ⋯
 f_latent[10]   -0.8661    0.0966    0.0033    865.2909   675.3767   1.0007   ⋯
 f_latent[11]   -0.9529    0.0980    0.0031   1003.0110   611.8971   1.0004   ⋯
 f_latent[12]   -1.0450    0.1074    0.0043    639.0534   682.9616   0.9999   ⋯
 f_latent[13]   -1.1966    0.1136    0.0045    629.2198   743.7114   1.0000   ⋯
 f_latent[14]   -1.4113    0.1118    0.0037    928.4357   656.2373   1.0011   ⋯
 f_latent[15]   -1.6110    0.1204    0.0047    700.0815   491.5819   1.0041   ⋯
            ⋮         ⋮         ⋮         ⋮           ⋮          ⋮        ⋮   ⋱
                                                  1 column and 4 rows omitted

Quantiles
   parameters      2.5%     25.0%     50.0%     75.0%     97.5%
       Symbol   Float64   Float64   Float64   Float64   Float64

            v    1.0905    1.9373    2.6189    3.5243    5.9021
            l    2.4117    3.1094    3.4718    3.9832    5.8400
  f_latent[1]    2.3648    2.4759    2.5432    2.6132    2.7529
  f_latent[2]    1.5706    1.6537    1.6976    1.7457    1.8365
  f_latent[3]    0.8334    0.9190    0.9704    1.0227    1.1449
  f_latent[4]    0.3325    0.4269    0.4772    0.5279    0.6299
  f_latent[5]    0.0512    0.1416    0.1881    0.2420    0.3425
  f_latent[6]   -0.1823   -0.0692   -0.0160    0.0442    0.1433
  f_latent[7]   -0.4100   -0.2984   -0.2433   -0.1898   -0.0828
  f_latent[8]   -0.6755   -0.5618   -0.4982   -0.4444   -0.3421
  f_latent[9]   -0.9189   -0.7881   -0.7216   -0.6587   -0.5317
 f_latent[10]   -1.0635   -0.9291   -0.8627   -0.8002   -0.6758
 f_latent[11]   -1.1413   -1.0151   -0.9532   -0.8866   -0.7573
 f_latent[12]   -1.2495   -1.1169   -1.0456   -0.9743   -0.8328
 f_latent[13]   -1.4153   -1.2762   -1.1978   -1.1191   -0.9755
 f_latent[14]   -1.6249   -1.4912   -1.4057   -1.3325   -1.1948
 f_latent[15]   -1.8592   -1.6882   -1.6062   -1.5254   -1.3887
            ⋮         ⋮         ⋮         ⋮         ⋮         ⋮
                                                 4 rows omitted
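Individual posterior summaries can also be pulled directly out of the chain object; for instance, the posterior means of the kernel variance and length-scale:

mean(chn[:v]), mean(chn[:l])  # posterior means of v and l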
We can use these samples and the posterior function from AbstractGPs to sample from the posterior probability of success at any distance we choose:
d_pred = 1:0.2:21
samples = map(returned(m_post, chn)[1:10:end]) do x
    return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))
end
p = plot()
plot!(d_pred, reduce(hcat, samples); label="", color=:blue, alpha=0.2)
scatter!(df.distance, df.y ./ df.n; label="", color=:red)
We can see that the general trend is indeed downwards as the distance from the hole increases, and that the posterior uncertainty quickly inflates as we move away from the data. This suggests that the model will probably do a reasonable job of interpolating between observed data, but a poorer job of extrapolating to larger distances.