Bayesian Time Series Analysis

In time series analysis we are often interested in understanding how various real-life circumstances impact our quantity of interest. These can be, for instance, season, day of week, or time of day. To analyse this, it is useful to decompose time series into simpler components (corresponding to relevant circumstances) and infer their relevance. In this tutorial we are going to use Turing for time series analysis and learn about useful ways to decompose time series.

Modelling time series

Before we start coding, let us talk about what exactly we mean by time series decomposition. In a nutshell, it is a divide-and-conquer approach where we express a time series as a sum or a product of simpler series. For instance, the time series \(f(t)\) can be decomposed into a sum of \(n\) components

\[f(t) = \sum_{i=1}^n f_i(t),\]

or we can decompose \(g(t)\) into a product of \(m\) components

\[g(t) = \prod_{i=1}^m g_i(t).\]

We refer to these as additive and multiplicative decompositions, respectively. This type of decomposition is great since it lets us reason about individual components, which makes encoding prior information and interpreting model predictions very easy. Two common components are trends, which represent the overall change of the time series (often assumed to be linear), and cyclic effects, which contribute oscillations around the trend. Let us simulate some data with an additive linear trend and oscillating effects.

using Turing
using FillArrays
using StatsPlots

using LinearAlgebra
using Random
using Statistics

Random.seed!(12345)

# True parameters of the synthetic series; the frequencies are in
# cycles over the full time span tmax.
true_sin_freq = 2
true_sin_amp = 5
true_cos_freq = 7
true_cos_amp = 2.5
tmax = 10
β_true = 2
α_true = -1
tt = 0:0.05:tmax
f₁(t) = α_true + β_true * t  # linear trend
f₂(t) = true_sin_amp * sinpi(2 * t * true_sin_freq / tmax)  # cyclic effect
f₃(t) = true_cos_amp * cospi(2 * t * true_cos_freq / tmax)  # cyclic effect
f(t) = f₁(t) + f₂(t) + f₃(t)  # additive decomposition

plot(f, tt; label="f(t)", title="Observed time series", legend=:topleft, linewidth=3)
plot!(
    [f₁, f₂, f₃],
    tt;
    label=["f₁(t)" "f₂(t)" "f₃(t)"],
    style=[:dot :dash :dashdot],
    linewidth=1,
)

Even though we use simple components, combining them can give rise to fairly complex time series. In this time series, the cyclic effects are just added on top of the trend. If we instead multiply the components, the cyclic effects cause the series to oscillate between larger and larger values, since they get scaled by the trend.

g(t) = f₁(t) * f₂(t) * f₃(t)

plot(g, tt; label="g(t)", title="Observed time series", legend=:topleft, linewidth=3)
plot!([f₁, f₂, f₃], tt; label=["f₁(t)" "f₂(t)" "f₃(t)"], linewidth=1)

Unlike \(f\), \(g\) oscillates around \(0\) since it is being multiplied by sines and cosines. To let a multiplicative decomposition oscillate around the trend we could define it as \(\tilde{g}(t) = f_1(t)(1 + f_2(t))(1 + f_3(t))\), but for convenience we will leave it as is. The inference machinery is the same for both cases.
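As a quick illustration of that alternative form, here is a sketch using the components defined above (the name g̃ is ours, not part of the tutorial's model):

# Alternative multiplicative form: the cyclic components modulate the
# trend, so on average the series follows f₁ rather than hovering at zero.
g̃(t) = f₁(t) * (1 + f₂(t)) * (1 + f₃(t))

plot(g̃, tt; label="g̃(t)", title="Oscillating around the trend", linewidth=3)
plot!(f₁, tt; label="f₁(t)", linewidth=1)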

Model fitting

Having discussed time series decomposition, let us fit a model to the time series above and recover the true parameters. Before building our model, we standardise the time axis to \([0, 1]\) and subtract the maximum of the observed series so that it peaks at zero. This helps convergence while maintaining interpretability and the correct scales for the cyclic components.

σ_true = 0.35
t = collect(tt[begin:3:end])  # keep every third time point
t_min, t_max = extrema(t)
x = (t .- t_min) ./ (t_max - t_min)  # standardise time to [0, 1]
yf = f.(t) .+ σ_true .* randn(size(t))  # noisy observations
yf_max = maximum(yf)
yf = yf .- yf_max  # shift so the observed series peaks at 0

scatter(x, yf; title="Standardised data", legend=false)

Let us now build our model. We want to assume a linear trend and cyclic effects. Encoding a linear trend is easy enough, but what about cyclic effects? We will take a scattergun approach and create multiple cyclic features using both sine and cosine functions, letting our inference machinery figure out which to keep. To do this, we define how long one period should be and create features in reference to said period. How long a period should be is problem dependent, but as an example let us say it is \(1\) year. If we then find evidence for a cyclic effect with a frequency of 2, that would mean an effect recurring twice per year; a frequency of 4 would mean a quarterly effect, and so on. Since we are using synthetic data, we are simply going to let the period be 1, which is the entire length of the time series.

freqs = 1:10
num_freqs = length(freqs)
period = 1
cyclic_features = [sinpi.(2 .* freqs' .* x ./ period) cospi.(2 .* freqs' .* x ./ period)]
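A quick sanity check on the layout: the first num_freqs columns hold the sine features and the remaining num_freqs columns the cosine features, an ordering we rely on when inspecting the posterior weights later.

size(cyclic_features)  # (length(x), 2 * num_freqs): sine columns, then cosine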

plot_freqs = [1, 3, 5]
freq_plt = plot(
    cyclic_features[:, plot_freqs];
    label=permutedims(["sin(2π$(f)x)" for f in plot_freqs]),
    title="Cyclical features subset",
)

Having constructed the cyclical features, we can finally build our model. The model we will implement looks like this

\[f(t) = \alpha + \beta_t t + \sum_{i=1}^F \beta_{\sin,i} \sin(2\pi f_i t) + \sum_{i=1}^F \beta_{\cos,i} \cos(2\pi f_i t),\]

with a Gaussian likelihood \(y \sim \mathcal{N}(f(t), \sigma^2)\). For convenience we treat the sine and cosine weights \(\beta_{\sin,i}\) and \(\beta_{\cos,i}\) the same way in code, collecting them in a single vector \(\beta_c\). And just because it is so easy, we parameterise our model with the operation used to apply the cyclic effects; since Julia 1.6, a dotted operator such as .* can be passed around as a function, so the exact same code serves both additive and multiplicative models. Finally, we plot prior predictive samples to make sure our priors make sense.

@model function decomp_model(t, c, op)
    α ~ Normal(0, 10)  # intercept
    βt ~ Normal(0, 2)  # trend slope
    βc ~ MvNormal(Zeros(size(c, 2)), I)  # one weight per cyclic feature
    σ ~ truncated(Normal(0, 0.1); lower=0)  # observation noise

    cyclic = c * βc
    trend = α .+ βt .* t
    μ = op(trend, cyclic)  # `op` is `+` (additive) or `.*` (multiplicative)
    y ~ MvNormal(μ, σ^2 * I)
    return (; trend, cyclic)
end

y_prior_samples = mapreduce(hcat, 1:100) do _
    rand(decomp_model(t, cyclic_features, +)).y
end
plot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label="", title="Prior samples")
scatter!(t, yf; color=2, label="Data")

With the model specified and with a reasonable prior we can now let Turing decompose the time series for us!

function mean_ribbon(samples)
    # Posterior mean and 95% credible interval, as offsets for `ribbon`.
    qs = quantile(samples)
    low = qs[:, Symbol("2.5%")]
    up = qs[:, Symbol("97.5%")]
    m = mean(samples)[:, :mean]
    return m, (m - low, up - m)
end

function get_decomposition(model, x, cyclic_features, chain, op)
    # Re-run the model for each posterior sample to recover the returned
    # (; trend, cyclic) components.
    chain_params = Turing.MCMCChains.get_sections(chain, :parameters)
    return generated_quantities(model(x, cyclic_features, op), chain_params)
end

function plot_fit(x, y, decomp, ymax)
    trend = mapreduce(nt -> nt.trend, hcat, decomp)
    cyclic = mapreduce(nt -> nt.cyclic, hcat, decomp)

    trend_plt = plot(
        x,
        trend .+ ymax;
        color=1,
        label=nothing,
        alpha=0.2,
        title="Trend",
        xlabel="Time",
        ylabel="f₁(t)",
    )
    # Ordinary least squares fit of a straight line, for comparison.
    ls = [ones(length(x)) x] \ y
    α̂, β̂ = ls[1], ls[2:end]
    plot!(
        trend_plt,
        x,
        α̂ .+ x .* β̂ .+ ymax;
        label="Least squares trend",
        color=5,
        linewidth=4,
    )

    scatter!(trend_plt, x, y .+ ymax; label=nothing, color=2, legend=:topleft)
    cyclic_plt = plot(
        x,
        cyclic;
        color=1,
        label=nothing,
        alpha=0.2,
        title="Cyclic effect",
        xlabel="Time",
        ylabel="f₂(t)",
    )
    return trend_plt, cyclic_plt
end

chain = sample(decomp_model(x, cyclic_features, +) | (; y=yf), NUTS(), 2000, progress=false)
yf_samples = predict(decomp_model(x, cyclic_features, +), chain)
m, conf = mean_ribbon(yf_samples)
predictive_plt = plot(
    t,
    m .+ yf_max;
    ribbon=conf,
    label="Posterior density",
    title="Posterior decomposition",
    xlabel="Time",
    ylabel="f(t)",
)
scatter!(predictive_plt, t, yf .+ yf_max; color=2, label="Data", legend=:topleft)

decomp = get_decomposition(decomp_model, x, cyclic_features, chain, +)
decomposed_plt = plot_fit(t, yf, decomp, yf_max)
plot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))

Inference is successful and the posterior beautifully captures the data. We see that the least squares linear fit deviates somewhat from the posterior trend. Since our model takes cyclic effects into account separately, we get a better estimate of the true overall trend than if we had just fitted a line.
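To make that comparison concrete, here is a small sketch (using the chain, x, and yf from above; both slopes are on the standardised scale):

# Sketch: posterior mean slope versus an ordinary least squares slope.
βt_post = mean(chain[:βt])
βt_ls = ([ones(length(x)) x] \ yf)[2]
println("posterior mean slope: ", βt_post)
println("least squares slope:  ", βt_ls)

But what frequency content did the model identify?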

function plot_cyclic_features(βsin, βcos)
    labels = reshape(["freq = $i" for i in freqs], 1, :)
    colors = collect(freqs)'
    style = reshape([i <= 10 ? :solid : :dash for i in 1:length(labels)], 1, :)
    sin_features_plt = density(
        βsin[:, :, 1];
        title="Sine features posterior",
        label=labels,
        ylabel="Density",
        xlabel="Weight",
        color=colors,
        linestyle=style,
        legend=nothing,
    )
    cos_features_plt = density(
        βcos[:, :, 1];
        title="Cosine features posterior",
        ylabel="Density",
        xlabel="Weight",
        label=nothing,
        color=colors,
        linestyle=style,
    )

    return plot(
        sin_features_plt,
        cos_features_plt;
        layout=(2, 1),
        size=(800, 600),
        legend=:outerright,
    )
end

βc = Array(group(chain, :βc))
plot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])

Plotting the posterior over the cyclic features reveals that the model managed to extract the true frequency content.
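As a rough numerical check, here is a sketch that relies on the column ordering of βc (sine weights first, then cosine weights):

# Sketch: pick out the sine and cosine features with the largest
# posterior mean weight (in absolute value). For this dataset we expect
# frequencies 2 (sine) and 7 (cosine), matching the simulation above.
βc_mean = vec(mean(βc; dims=(1, 3)))
sin_strongest = freqs[argmax(abs.(βc_mean[1:num_freqs]))]
cos_strongest = freqs[argmax(abs.(βc_mean[(num_freqs + 1):(2num_freqs)]))]
println("strongest sine frequency: ", sin_strongest)
println("strongest cosine frequency: ", cos_strongest)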

Since we wrote our model to accept a combining operator, we can easily run the same analysis for a multiplicative model.

yg = g.(t) .+ σ_true .* randn(size(t))

y_prior_samples = mapreduce(hcat, 1:100) do _
    rand(decomp_model(t, cyclic_features, .*)).y
end
plot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label="", title="Prior samples")
scatter!(t, yg; color=2, label="Data")
chain = sample(decomp_model(x, cyclic_features, .*) | (; y=yg), NUTS(), 2000, progress=false)
yg_samples = predict(decomp_model(x, cyclic_features, .*), chain)
m, conf = mean_ribbon(yg_samples)
predictive_plt = plot(
    t,
    m;
    ribbon=conf,
    label="Posterior density",
    title="Posterior decomposition",
    xlabel="Time",
    ylabel="g(t)",
)
scatter!(predictive_plt, t, yg; color=2, label="Data", legend=:topleft)

decomp = get_decomposition(decomp_model, x, cyclic_features, chain, .*)
decomposed_plt = plot_fit(t, yg, decomp, 0)
plot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))

The model fits! What about the inferred cyclic components?

βc = Array(group(chain, :βc))
plot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])

While the multiplicative model fits the data, it does not recover the true parameters for this dataset. The reason is that a product of sinusoids is itself a sum of sinusoids at the sum and difference frequencies, so our basis can explain the data with weight at frequencies other than the generating ones, as the identity below shows.
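For the two cyclic components used here (with period \(T = t_{\max}\)),

\[\sin\left(\frac{2\pi \cdot 2t}{T}\right)\cos\left(\frac{2\pi \cdot 7t}{T}\right) = \frac{1}{2}\left[\sin\left(\frac{2\pi \cdot 9t}{T}\right) - \sin\left(\frac{2\pi \cdot 5t}{T}\right)\right],\]

so, up to amplitude scaling and the modulation by the trend \(f_1\), the data can be fitted with weight at frequencies 5 and 9 rather than at the generating frequencies 2 and 7.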

Wrapping up

In this tutorial we have seen how to implement and fit time series models using additive and multiplicative decomposition. We also saw how to visualise the model fit and how to interpret the learned cyclic components.
