Gaussian Processes

JuliaGPs packages integrate well with Turing.jl because they implement the Distributions.jl interface. You should be able to understand what is going on in this tutorial if you know what a GP is. For a more in-depth understanding of the JuliaGPs functionality used here, please consult the JuliaGPs docs.

In this tutorial, we will model the putting dataset discussed in Chapter 21 of Bayesian Data Analysis. The dataset comprises the result of measuring how often a golfer successfully gets the ball in the hole, depending on how far away from it they are. The goal of inference is to estimate the probability of any given shot being successful at a given distance.

Let’s download the data and take a look at it:

using CSV, DataDeps, DataFrames

ENV["DATADEPS_ALWAYS_ACCEPT"] = true
register(
    DataDep(
        "putting",
        "Putting data from BDA",
        "http://www.stat.columbia.edu/~gelman/book/data/golf.dat",
        "fc28d83896af7094d765789714524d5a389532279b64902866574079c1a977cc",
    ),
)

fname = joinpath(datadep"putting", "golf.dat")
df = CSV.read(fname, DataFrame; delim=' ', ignorerepeated=true)
df[1:5, :]
5×3 DataFrame
Row distance n y
Int64 Int64 Int64
1 2 1443 1346
2 3 694 577
3 4 455 337
4 5 353 208
5 6 272 149

We’ve printed the first 5 rows of the dataset (which comprises only 19 rows in total). Observe it has three columns:

  1. distance – how far away from the hole. I’ll refer to distance as d throughout the rest of this tutorial
  2. n – how many shots were taken from a given distance
  3. y – how many shots were successful from a given distance

We will use a Binomial model for the data, whose success probability is parametrised by a transformation of a GP. Something along the lines of: \[ f \sim \operatorname{GP}(0, k) \\ y_j \mid f(d_j) \sim \operatorname{Binomial}(n_j, g(f(d_j))) \\ g(x) := \frac{1}{1 + e^{-x}} \]

To do this, let’s define our Turing.jl model:

using AbstractGPs, LogExpFunctions, Turing

@model function putting_model(d, n; jitter=1e-4)
    v ~ Gamma(2, 1)
    l ~ Gamma(4, 1)
    f = GP(v * with_lengthscale(SEKernel(), l))
    f_latent ~ f(d, jitter)
    y ~ product_distribution(Binomial.(n, logistic.(f_latent)))
    return (fx=f(d, jitter), f_latent=f_latent, y=y)
end
putting_model (generic function with 2 methods)

We first define an AbstractGPs.GP, which represents a distribution over functions, and is entirely separate from Turing.jl. We place a prior over its variance v and length-scale l. f(d, jitter) constructs the multivariate Gaussian comprising the random variables in f whose indices are in d (+ a bit of independent Gaussian noise with variance jitter – see the docs for more details). f(d, jitter) isa AbstractMvNormal, and is the bit of AbstractGPs.jl that implements the Distributions.jl interface, so it’s legal to put it on the right-hand side of a ~. From this you should deduce that f_latent is distributed according to a multivariate Gaussian. The remaining lines comprise standard Turing.jl code that is encountered in other tutorials and Turing documentation.

Before performing inference, we might want to inspect the prior that our model places over the data, to see whether there is anything obviously wrong. These kinds of prior predictive checks are straightforward to perform using Turing.jl, since it is possible to sample from the prior easily by just calling the model:

m = putting_model(Float64.(df.distance), df.n)
m().y
19-element Vector{Int64}:
 358
 201
 152
 135
 136
 161
 182
 182
 178
 217
 195
 187
 172
 167
 197
 194
 185
 141
 145

We make use of this to see what kinds of datasets we simulate from the prior:

using Plots

function plot_data(d, n, y, xticks, yticks)
    ylims = (0, round(maximum(n), RoundUp; sigdigits=2))
    margin = -0.5 * Plots.mm
    plt = plot(; xticks=xticks, yticks=yticks, ylims=ylims, margin=margin, grid=false)
    bar!(plt, d, n; color=:red, label="", alpha=0.5)
    bar!(plt, d, y; label="", color=:blue, alpha=0.7)
    return plt
end

# Construct model and run some prior predictive checks.
m = putting_model(Float64.(df.distance), df.n)
hists = map(1:20) do j
    xticks = j > 15 ? :auto : nothing
    yticks = rem(j, 5) == 1 ? :auto : nothing
    return plot_data(df.distance, df.n, m().y, xticks, yticks)
end
plot(hists...; layout=(4, 5))

In this case, the only prior knowledge I have is that the proportion of successful shots ought to decrease monotonically as the distance from the hole increases, which should show up in the data as the blue lines generally go down as we move from left to right on each graph. Unfortunately, there is not a simple way to enforce monotonicity in the samples from a GP, and we can see this in some of the plots above, so we must hope that we have enough data to ensure that this relationship holds approximately under the posterior. In any case, you can judge for yourself whether you think this is the most useful visualisation that we can perform – if you think there is something better to look at, please let us know!

Moving on, we generate samples from the posterior using the default NUTS sampler. We’ll make use of ReverseDiff.jl, as it has better performance than ForwardDiff.jl on this example. See Turing.jl’s docs on Automatic Differentiation for more info.

using Random, ReverseDiff

m_post = m | (y=df.y,)
chn = sample(Xoshiro(123456), m_post, NUTS(;adtype=AutoReverseDiff()), 1_000, progress=false)
┌ Info: Found initial step size
└   ϵ = 0.2
Chains MCMC chain (1000×33×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 69.37 seconds
Compute duration  = 69.37 seconds
parameters        = v, l, f_latent[1], f_latent[2], f_latent[3], f_latent[4], f_latent[5], f_latent[6], f_latent[7], f_latent[8], f_latent[9], f_latent[10], f_latent[11], f_latent[12], f_latent[13], f_latent[14], f_latent[15], f_latent[16], f_latent[17], f_latent[18], f_latent[19]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
    parameters      mean       std      mcse    ess_bulk   ess_tail      rhat  ⋯
        Symbol   Float64   Float64   Float64     Float64    Float64   Float64  ⋯

             v    2.8600    1.3028    0.0417    918.2550   717.5566    1.0001  ⋯
             l    3.5798    0.8554    0.0631    177.0742   219.2957    1.0062  ⋯
   f_latent[1]    2.5512    0.1017    0.0029   1213.8603   401.2582    1.0050  ⋯
   f_latent[2]    1.7058    0.0718    0.0020   1318.8268   732.6694    1.0030  ⋯
   f_latent[3]    0.9728    0.0787    0.0034    553.0865   583.8339    1.0019  ⋯
   f_latent[4]    0.4751    0.0776    0.0032    582.2141   654.6469    1.0009  ⋯
   f_latent[5]    0.1921    0.0745    0.0022   1135.7254   716.0834    1.0005  ⋯
   f_latent[6]   -0.0086    0.0895    0.0044    409.7724   405.8487    0.9992  ⋯
   f_latent[7]   -0.2392    0.0880    0.0033    694.7892   481.4659    0.9991  ⋯
   f_latent[8]   -0.5027    0.0905    0.0033    743.6810   589.3439    0.9990  ⋯
   f_latent[9]   -0.7235    0.0984    0.0042    530.8329   580.1962    1.0004  ⋯
  f_latent[10]   -0.8647    0.0958    0.0035    742.7441   624.4842    1.0043  ⋯
  f_latent[11]   -0.9459    0.0939    0.0028   1092.2843   739.5350    0.9994  ⋯
  f_latent[12]   -1.0314    0.1077    0.0039    761.7264   691.7446    1.0020  ⋯
  f_latent[13]   -1.1816    0.1131    0.0049    528.9246   675.1217    1.0030  ⋯
  f_latent[14]   -1.3997    0.1131    0.0037    954.8557   813.1763    1.0015  ⋯
  f_latent[15]   -1.6072    0.1233    0.0048    698.4308   561.3113    0.9997  ⋯
       ⋮            ⋮         ⋮         ⋮          ⋮          ⋮          ⋮     ⋱
                                                     1 column and 4 rows omitted

Quantiles
    parameters      2.5%     25.0%     50.0%     75.0%     97.5%
        Symbol   Float64   Float64   Float64   Float64   Float64

             v    1.0788    1.8992    2.6231    3.5424    5.9576
             l    2.2506    3.0621    3.4208    3.9185    5.8926
   f_latent[1]    2.3614    2.4826    2.5493    2.6219    2.7460
   f_latent[2]    1.5603    1.6552    1.7069    1.7547    1.8390
   f_latent[3]    0.8258    0.9224    0.9721    1.0268    1.1242
   f_latent[4]    0.3207    0.4197    0.4770    0.5263    0.6302
   f_latent[5]    0.0543    0.1406    0.1907    0.2438    0.3468
   f_latent[6]   -0.1785   -0.0730   -0.0085    0.0479    0.1721
   f_latent[7]   -0.3943   -0.3020   -0.2452   -0.1789   -0.0654
   f_latent[8]   -0.6816   -0.5592   -0.5040   -0.4419   -0.3314
   f_latent[9]   -0.9291   -0.7900   -0.7182   -0.6559   -0.5371
  f_latent[10]   -1.0533   -0.9350   -0.8625   -0.7985   -0.6830
  f_latent[11]   -1.1241   -1.0075   -0.9468   -0.8839   -0.7615
  f_latent[12]   -1.2389   -1.1050   -1.0347   -0.9639   -0.8068
  f_latent[13]   -1.3859   -1.2615   -1.1862   -1.1076   -0.9674
  f_latent[14]   -1.6098   -1.4756   -1.4042   -1.3222   -1.1762
  f_latent[15]   -1.8550   -1.6856   -1.6030   -1.5216   -1.3780
       ⋮            ⋮         ⋮         ⋮         ⋮         ⋮
                                                    4 rows omitted

We can use these samples and the posterior function from AbstractGPs to sample from the posterior probability of success at any distance we choose:

d_pred = 1:0.2:21
samples = map(generated_quantities(m_post, chn)[1:10:end]) do x
    return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))
end
p = plot()
plot!(d_pred, reduce(hcat, samples); label="", color=:blue, alpha=0.2)
scatter!(df.distance, df.y ./ df.n; label="", color=:red)

We can see that the general trend is indeed down as the distance from the hole increases, and that if we move away from the data, the posterior uncertainty quickly inflates. This suggests that the model is probably going to do a reasonable job of interpolating between observed data, but less good a job at extrapolating to larger distances.

Back to top