Example: Logistic Regression with Random Effects
We will use the Seeds example for demonstration. This example concerns the proportion of seeds that germinated on each of 21 plates. Here, we organize the data into a NamedTuple:
data = (
r = [10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3],
n = [39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7],
x1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
x2 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
N = 21,
)
where `r[i]` is the number of germinated seeds and `n[i]` is the total number of seeds on the $i$-th plate. Let $p_i$ be the probability of germination on the $i$-th plate. Then, the model is defined by:
\[\begin{aligned} b_i &\sim \text{Normal}(0, \tau) \\ \text{logit}(p_i) &= \alpha_0 + \alpha_1 x_{1 i} + \alpha_2 x_{2i} + \alpha_{12} x_{1i} x_{2i} + b_{i} \\ r_i &\sim \text{Binomial}(p_i, n_i) \end{aligned}\]
where $x_{1i}$ and $x_{2i}$ are the seed type and root extract of the $i$-th plate. Note that, following the BUGS convention, $\text{Normal}(0, \tau)$ is parameterized by the precision $\tau$, so the standard deviation is $\sigma = 1/\sqrt{\tau}$. The original BUGS program for the model is:
model
{
for( i in 1 : N ) {
r[i] ~ dbin(p[i],n[i])
b[i] ~ dnorm(0.0,tau)
logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i] +
alpha12 * x1[i] * x2[i] + b[i]
}
alpha0 ~ dnorm(0.0, 1.0E-6)
alpha1 ~ dnorm(0.0, 1.0E-6)
alpha2 ~ dnorm(0.0, 1.0E-6)
alpha12 ~ dnorm(0.0, 1.0E-6)
tau ~ dgamma(0.001, 0.001)
sigma <- 1 / sqrt(tau)
}
Modeling Language
Writing a Model in BUGS
Language references (implementations in C++ and R):
- JAGS and its user manual
- Nimble
Writing a Model in Julia
We provide the `@bugs` macro, which allows users to write model definitions in Julia:
model_def = @bugs begin
for i in 1:N
r[i] ~ dbin(p[i], n[i])
b[i] ~ dnorm(0.0, tau)
p[i] = logistic(alpha0 + alpha1 * x1[i] + alpha2 * x2[i] + alpha12 * x1[i] * x2[i] + b[i])
end
alpha0 ~ dnorm(0.0, 1.0E-6)
alpha1 ~ dnorm(0.0, 1.0E-6)
alpha2 ~ dnorm(0.0, 1.0E-6)
alpha12 ~ dnorm(0.0, 1.0E-6)
tau ~ dgamma(0.001, 0.001)
sigma = 1 / sqrt(tau)
end
BUGS syntax carries over to Julia almost one-to-one, with minor modifications: curly braces are replaced with `begin ... end` blocks, and `for` loops do not require parentheses around the loop specification. In addition, Julia uses `f(x) = ...` as shorthand for function definition, so BUGS' link function syntax (a function call on the LHS) is disallowed. Instead, users can call the inverse of the link function on the RHS expression.
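For example, a BUGS statement of the form `logit(p[i]) <- ...` is written by applying the inverse link, `logistic`, to the RHS. A minimal sketch (the model and its variables here are hypothetical):

```julia
using JuliaBUGS

# BUGS:   logit(p[i]) <- alpha0 + b[i]
# Julia:  apply the inverse link function (logistic) to the RHS
demo_def = @bugs begin
    for i in 1:2
        b[i] ~ dnorm(0.0, 1.0)
        p[i] = logistic(alpha0 + b[i])
        y[i] ~ dbern(p[i])
    end
    alpha0 ~ dnorm(0.0, 1.0)
end
```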
Support for Legacy BUGS Programs
The `@bugs` macro also works with original (R-like) BUGS syntax:
model_def = @bugs("""
model{
for( i in 1 : N ) {
r[i] ~ dbin(p[i],n[i])
b[i] ~ dnorm(0.0,tau)
logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i] +
alpha12 * x1[i] * x2[i] + b[i]
}
alpha0 ~ dnorm(0.0,1.0E-6)
alpha1 ~ dnorm(0.0,1.0E-6)
alpha2 ~ dnorm(0.0,1.0E-6)
alpha12 ~ dnorm(0.0,1.0E-6)
tau ~ dgamma(0.001,0.001)
sigma <- 1 / sqrt(tau)
}
""", true, true)
By default, `@bugs` translates R-style variable names like `a.b.c` to `a_b_c`; users can pass `false` as the second argument to disable this. Users can also pass `true` as the third argument if the `model { }` enclosure is not present in the BUGS program. We still encourage writing new programs in the Julia-native syntax, which offers better debuggability and perks like syntax highlighting.
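For instance, based on the argument description above, the same Seeds program without the `model { }` wrapper would be parsed as follows (a sketch):

```julia
model_def = @bugs("""
for( i in 1 : N ) {
    r[i] ~ dbin(p[i], n[i])
    b[i] ~ dnorm(0.0, tau)
    logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i] + alpha12 * x1[i] * x2[i] + b[i]
}
alpha0 ~ dnorm(0.0, 1.0E-6)
alpha1 ~ dnorm(0.0, 1.0E-6)
alpha2 ~ dnorm(0.0, 1.0E-6)
alpha12 ~ dnorm(0.0, 1.0E-6)
tau ~ dgamma(0.001, 0.001)
sigma <- 1 / sqrt(tau)
""", true, true)
```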
Basic Workflow
Compilation
Model definition and data are the two necessary inputs for compilation, with optional initializations. The `compile` function creates a `BUGSModel` that implements the `LogDensityProblems.jl` interface.
compile(model_def::Expr, data::NamedTuple)
And with initializations:
compile(model_def::Expr, data::NamedTuple, initializations::NamedTuple)
Using the model definition and data we defined earlier, we can compile the model:
model = compile(model_def, data)
BUGSModel (parameters are in transformed (unconstrained) space, with dimension 26):
Model parameters:
alpha2
b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], b[16], b[17], b[18], b[19], b[20], b[21]
tau
alpha12
alpha1
alpha0
Variable sizes and types:
b: size = (21,), type = Vector{Float64}
p: size = (21,), type = Vector{Float64}
n: size = (21,), type = Vector{Int64}
alpha2: type = Float64
sigma: type = Float64
alpha0: type = Float64
alpha12: type = Float64
N: type = Int64
tau: type = Float64
alpha1: type = Float64
r: size = (21,), type = Vector{Int64}
x1: size = (21,), type = Vector{Int64}
x2: size = (21,), type = Vector{Int64}
When no initializations are provided, parameter values are sampled from the prior distributions in the original space.
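Since the compiled `BUGSModel` implements the `LogDensityProblems.jl` interface, we can already evaluate the log joint density at a point in the transformed space (a minimal sketch):

```julia
using LogDensityProblems

D = LogDensityProblems.dimension(model)   # 26 for this model
θ = zeros(D)                              # a point in the transformed space
LogDensityProblems.logdensity(model, θ)   # log joint density at θ
```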
We can provide initializations:
# initial values must refer to parameters of this model
initializations = (alpha0 = 1.0, alpha1 = 1.0, alpha2 = 1.0, alpha12 = 1.0, tau = 1.0)
compile(model_def, data, initializations)
BUGSModel (parameters are in transformed (unconstrained) space, with dimension 26):
Model parameters:
alpha2
b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], b[16], b[17], b[18], b[19], b[20], b[21]
tau
alpha12
alpha1
alpha0
Variable sizes and types:
b: size = (21,), type = Vector{Float64}
p: size = (21,), type = Vector{Float64}
n: size = (21,), type = Vector{Int64}
alpha2: type = Float64
sigma: type = Float64
alpha0: type = Float64
alpha12: type = Float64
N: type = Int64
tau: type = Float64
alpha1: type = Float64
r: size = (21,), type = Vector{Int64}
x1: size = (21,), type = Vector{Int64}
x2: size = (21,), type = Vector{Int64}
We can also initialize parameters after compilation:
initialize!(model, initializations)
BUGSModel (parameters are in transformed (unconstrained) space, with dimension 26):
Model parameters:
alpha2
b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], b[16], b[17], b[18], b[19], b[20], b[21]
tau
alpha12
alpha1
alpha0
Variable sizes and types:
b: size = (21,), type = Vector{Float64}
p: size = (21,), type = Vector{Float64}
n: size = (21,), type = Vector{Int64}
alpha2: type = Float64
sigma: type = Float64
alpha0: type = Float64
alpha12: type = Float64
N: type = Int64
tau: type = Float64
alpha1: type = Float64
r: size = (21,), type = Vector{Int64}
x1: size = (21,), type = Vector{Int64}
x2: size = (21,), type = Vector{Int64}
`initialize!` also accepts a flat vector. In this case, the vector must have the same length as the number of parameters (here, 26), and the values are interpreted in the transformed (unconstrained) space:
initialize!(model, rand(26))
BUGSModel (parameters are in transformed (unconstrained) space, with dimension 26):
Model parameters:
alpha2
b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], b[16], b[17], b[18], b[19], b[20], b[21]
tau
alpha12
alpha1
alpha0
Variable sizes and types:
b: size = (21,), type = Vector{Float64}
p: size = (21,), type = Vector{Float64}
n: size = (21,), type = Vector{Int64}
alpha2: type = Float64
sigma: type = Float64
alpha0: type = Float64
alpha12: type = Float64
N: type = Int64
tau: type = Float64
r: size = (21,), type = Vector{Int64}
alpha1: type = Float64
x1: size = (21,), type = Vector{Int64}
x2: size = (21,), type = Vector{Int64}
Automatic Differentiation
JuliaBUGS integrates with automatic differentiation (AD) through DifferentiationInterface.jl, enabling gradient-based inference methods like Hamiltonian Monte Carlo (HMC) and No-U-Turn Sampler (NUTS).
Specifying an AD Backend
To compile a model with gradient support, pass the `adtype` keyword argument to `compile`:
# Compile with gradient support; the backend is specified via ADTypes.jl
using ADTypes
model = compile(model_def, data; adtype=AutoReverseDiff(compile=true))
Available AD backends include:
- `AutoReverseDiff(compile=true)`: ReverseDiff with tape compilation (recommended for most models)
- `AutoForwardDiff()`: ForwardDiff (efficient for models with few parameters)
- `AutoZygote()`: Zygote (source-to-source AD)
- `AutoEnzyme()`: Enzyme (experimental, high-performance)
- `AutoMooncake()`: Mooncake (high-performance reverse-mode AD)
For fine-grained control, you can configure the AD backend:
# ReverseDiff without compilation
model = compile(model_def, data; adtype=AutoReverseDiff(compile=false))
The compiled model with gradient support implements the `LogDensityProblems.jl` interface, including `logdensity_and_gradient`, which returns both the log density and its gradient.
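For example, assuming `model` was compiled with an `adtype` as above, the gradient can be queried directly (a minimal sketch):

```julia
using LogDensityProblems

θ = rand(LogDensityProblems.dimension(model))  # a random point in the transformed space
logp, grad = LogDensityProblems.logdensity_and_gradient(model, θ)
```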
Inference
For gradient-based inference, we use `AdvancedHMC.jl` with models compiled with an `adtype`:
using AdvancedHMC, AbstractMCMC, LogDensityProblems, MCMCChains, ReverseDiff
# Compile with gradient support
model = compile(model_def, data; adtype=AutoReverseDiff(compile=true))
n_samples, n_adapts = 2000, 1000
D = LogDensityProblems.dimension(model); initial_θ = rand(D)
samples_and_stats = AbstractMCMC.sample(
model,
NUTS(0.8),
n_samples;
chain_type = Chains,
n_adapts = n_adapts,
init_params = initial_θ,
discard_initial = n_adapts
)
describe(samples_and_stats)
This returns an `MCMCChains.Chains` object:
Chains MCMC chain (2000×40×1 Array{Real, 3}):
Iterations = 1001:1:3000
Number of chains = 1
Samples per chain = 2000
parameters = tau, alpha12, alpha2, alpha1, alpha0, b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], b[16], b[17], b[18], b[19], b[20], b[21], sigma
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, is_adapt
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Real Float64 Float64 Missing
tau 73.1490 193.8441 43.2582 56.3430 20.6688 1.0155 missing
alpha12 -0.8052 0.4392 0.0158 761.2180 1049.1664 1.0020 missing
alpha2 1.3428 0.2813 0.0140 422.8810 1013.2570 1.0061 missing
alpha1 0.0845 0.3126 0.0113 773.2202 981.8487 1.0051 missing
alpha0 -0.5480 0.1944 0.0087 537.6212 1156.2083 1.0014 missing
b[1] -0.1905 0.2540 0.0129 374.3372 971.7526 1.0034 missing
b[2] 0.0161 0.2178 0.0056 1505.6353 1002.8787 1.0001 missing
b[3] -0.1986 0.2375 0.0128 367.6766 1287.8215 1.0015 missing
b[4] 0.2792 0.2498 0.0163 201.1558 1168.7538 1.0068 missing
b[5] 0.1170 0.2397 0.0092 659.5422 1484.8584 1.0016 missing
b[6] 0.0667 0.2821 0.0074 1745.5567 902.1014 1.0067 missing
b[7] 0.0597 0.2218 0.0055 1589.5590 1145.6017 1.0065 missing
b[8] 0.1769 0.2316 0.0102 554.5974 1318.8089 1.0001 missing
b[9] -0.1257 0.2233 0.0073 930.0346 1186.4283 1.0031 missing
b[10] -0.2513 0.2392 0.0159 213.6323 1142.4487 1.0096 missing
b[11] 0.0768 0.2783 0.0081 1376.5999 1218.1537 1.0009 missing
b[12] 0.1171 0.2768 0.0079 1354.9409 1130.8217 1.0052 missing
b[13] -0.0688 0.2433 0.0055 1895.0387 1527.7066 1.0010 missing
b[14] -0.1363 0.2558 0.0075 1276.0992 1208.8587 1.0001 missing
b[15] 0.2334 0.2757 0.0135 439.2241 837.3396 1.0036 missing
b[16] -0.1212 0.3024 0.0106 1093.4416 914.9457 0.9997 missing
b[17] -0.2120 0.3142 0.0166 360.6420 702.4098 1.0009 missing
b[18] 0.0346 0.2282 0.0056 1665.0325 1281.7179 1.0011 missing
b[19] -0.0244 0.2400 0.0052 2186.7638 1179.6971 1.0132 missing
b[20] 0.2108 0.2421 0.0131 349.7657 1263.5781 1.0016 missing
b[21] -0.0509 0.2813 0.0061 2200.5614 916.6256 0.9998 missing
sigma 0.2797 0.1362 0.0168 56.3430 21.4971 1.0123 missing
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
tau 3.1280 7.4608 13.0338 28.2289 929.6520
alpha12 -1.6645 -1.0887 -0.7952 -0.5635 0.1162
alpha2 0.8398 1.1494 1.3233 1.5337 1.9177
alpha1 -0.5796 -0.1059 0.1042 0.2883 0.6702
alpha0 -0.9340 -0.6751 -0.5463 -0.4086 -0.1752
b[1] -0.7430 -0.3415 -0.1566 -0.0074 0.2535
b[2] -0.4261 -0.1083 0.0192 0.1420 0.4810
b[3] -0.7394 -0.3377 -0.1687 -0.0242 0.2041
b[4] -0.1108 0.0873 0.2409 0.4375 0.8267
b[5] -0.3141 -0.0458 0.0900 0.2563 0.6489
b[6] -0.4679 -0.0896 0.0291 0.2202 0.7060
b[7] -0.3861 -0.0685 0.0534 0.1847 0.5207
b[8] -0.2326 0.0221 0.1505 0.3162 0.6861
b[9] -0.6007 -0.2482 -0.0984 0.0057 0.2771
b[10] -0.7936 -0.4108 -0.2255 -0.0617 0.1290
b[11] -0.4381 -0.0796 0.0353 0.2178 0.7232
b[12] -0.3806 -0.0451 0.0750 0.2671 0.7625
b[13] -0.5841 -0.2135 -0.0443 0.0652 0.4055
b[14] -0.6854 -0.2872 -0.1015 0.0147 0.3476
b[15] -0.2054 0.0257 0.1898 0.4004 0.8660
b[16] -0.8173 -0.2829 -0.0804 0.0532 0.4094
b[17] -0.9071 -0.3911 -0.1595 0.0099 0.2864
b[18] -0.4526 -0.0919 0.0140 0.1686 0.4985
b[19] -0.5055 -0.1547 -0.0091 0.1134 0.4528
b[20] -0.2120 0.0318 0.1788 0.3673 0.7416
b[21] -0.6482 -0.2044 -0.0263 0.1051 0.5246
sigma 0.0328 0.1882 0.2770 0.3661 0.5654
This is consistent with the results reported in the OpenBUGS Seeds example.
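Beyond `describe`, individual parameters can be inspected directly from the chain using standard MCMCChains indexing (a sketch):

```julia
using Statistics

mean(samples_and_stats[:alpha0])                          # posterior mean of alpha0
quantile(vec(samples_and_stats[:sigma]), [0.025, 0.975])  # 95% credible interval for sigma
```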
Parallel and Distributed Sampling with AbstractMCMC
`AbstractMCMC` and `AdvancedHMC` support both parallel and distributed sampling.
Parallel Sampling
To perform multi-threaded sampling of multiple chains, start the Julia session with the `-t <n_threads>` argument. The model compilation code remains the same, and we can sample multiple chains in parallel as follows:
n_chains = 4
samples_and_stats = AbstractMCMC.sample(
model,
AdvancedHMC.NUTS(0.65),
AbstractMCMC.MCMCThreads(),
n_samples,
n_chains;
chain_type = Chains,
n_adapts = n_adapts,
init_params = [initial_θ for _ = 1:n_chains],
discard_initial = n_adapts,
)
In this case, we pass two additional arguments to `AbstractMCMC.sample`:
- `AbstractMCMC.MCMCThreads()`: the parallel sampling strategy, and
- `n_chains`: the number of chains to sample.
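Multi-threaded sampling only helps if the session actually has enough threads; this can be checked before sampling (a trivial sketch):

```julia
# Requires starting Julia with multiple threads, e.g. `julia -t 4`
Threads.nthreads() ≥ n_chains ||
    @warn "Fewer threads than chains; some chains will share a thread"
```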
Distributed Sampling
To perform distributed sampling of multiple chains, start the Julia session with the `-p <n_processes>` argument. In distributed mode, all functions and modules must be available on every process; use `@everywhere` to load them on all processes.
For example:
@everywhere begin
using JuliaBUGS, LogDensityProblems, AbstractMCMC, AdvancedHMC, MCMCChains, ADTypes, ReverseDiff
# Define the functions to use
# Use `@bugs_primitive` to register the functions to use in the model
# Distributed can handle data dependencies in some cases, for more detail, see https://docs.julialang.org/en/v1/manual/distributed-computing/
end
n_chains = nprocs() - 1 # use all the processes except the parent process
samples_and_stats = AbstractMCMC.sample(
model,
AdvancedHMC.NUTS(0.65),
AbstractMCMC.MCMCDistributed(),
n_samples,
n_chains;
chain_type = Chains,
n_adapts = n_adapts,
init_params = [initial_θ for _ = 1:n_chains], # each chain has its own initial parameters
discard_initial = n_adapts,
progress = false, # Base.TTY creating problems in distributed setting
)
In this case, we pass two additional arguments to `AbstractMCMC.sample`:
- `AbstractMCMC.MCMCDistributed()`: the parallel sampling strategy, and
- `n_chains`: the number of chains to sample.

Note that the `init_params` argument is now a vector containing one set of initial parameters per chain. The progress logger can sometimes cause problems in a distributed setting (via `Base.TTY`), so we disable it by setting `progress = false`.
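As an alternative to the `-p` flag, worker processes can also be added from within the session using the standard library `Distributed` (a sketch):

```julia
using Distributed

addprocs(4)   # add 4 worker processes to the current session
@everywhere using JuliaBUGS, AbstractMCMC, AdvancedHMC, MCMCChains, ADTypes, ReverseDiff
nprocs()      # 5: the parent process plus 4 workers
```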
Choosing an Automatic Differentiation Backend
JuliaBUGS integrates with multiple automatic differentiation (AD) backends through DifferentiationInterface.jl, providing flexibility to choose the most suitable backend for your model.
Available Backends
The following AD backends are supported via ADTypes.jl:
- `AutoReverseDiff(compile=true)` (recommended): tape-based reverse-mode AD, highly efficient for models with many parameters; tape compilation is used for optimal performance.
- `AutoForwardDiff()`: forward-mode AD, efficient for models with few parameters (typically < 20).
- `AutoZygote()`: source-to-source reverse-mode AD, general-purpose but may be slower than ReverseDiff for many models.
- `AutoEnzyme()`: experimental high-performance AD backend with LLVM-level transformations.
- `AutoMooncake()`: high-performance reverse-mode AD with advanced optimizations.
Usage Examples
Basic Usage
Specify an AD backend using ADTypes:
using ADTypes
# ReverseDiff with tape compilation (recommended for most models)
model = compile(model_def, data; adtype=AutoReverseDiff(compile=true))
# ForwardDiff (good for small models with few parameters)
model = compile(model_def, data; adtype=AutoForwardDiff())
# Zygote (source-to-source AD)
model = compile(model_def, data; adtype=AutoZygote())
Advanced Configuration
For fine-grained control, you can configure the AD backend:
using ADTypes
# ReverseDiff without tape compilation
model = compile(model_def, data; adtype=AutoReverseDiff(compile=false))
# ReverseDiff with compilation (default, recommended)
model = compile(model_def, data; adtype=AutoReverseDiff(compile=true))
Performance Considerations
- ReverseDiff with compilation (`AutoReverseDiff(compile=true)`) is recommended for most models, especially those with many parameters. Compilation adds a one-time overhead but significantly speeds up subsequent gradient evaluations.
- ForwardDiff (`AutoForwardDiff()`) is often faster for models with few parameters (< 20), as it avoids tape construction overhead.
- Tape compilation trade-off: while `AutoReverseDiff(compile=true)` has a higher initial compilation time, it provides faster gradient evaluations during sampling. For quick prototyping, or for models that will only be sampled a few times, `AutoReverseDiff(compile=false)` may be preferable.
Compiled ReverseDiff tapes cannot handle value-dependent control flow (e.g., `if x[1] > 0`). If your model has such control flow, use `AutoReverseDiff(compile=false)` or a different backend such as `AutoForwardDiff()` or `AutoMooncake()`. See the ReverseDiff documentation for details.
Compatibility
All models compiled with an `adtype` implement the full `LogDensityProblems.jl` interface, making them compatible with:
- `AdvancedHMC.jl`: NUTS and HMC samplers
- any other sampler that works with the LogDensityProblems interface
The gradient computation is prepared during model compilation for optimal performance during sampling.
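A quick way to choose a backend for a particular model is to time a warm-started gradient evaluation per backend (a sketch; the first call includes tape construction and compilation, so timings are illustrative only):

```julia
using ADTypes, LogDensityProblems, ReverseDiff, ForwardDiff

for adtype in (AutoReverseDiff(compile=true), AutoForwardDiff())
    m = compile(model_def, data; adtype=adtype)
    θ = rand(LogDensityProblems.dimension(m))
    LogDensityProblems.logdensity_and_gradient(m, θ)        # warm-up call
    @time LogDensityProblems.logdensity_and_gradient(m, θ)  # timed call
end
```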
More Examples
We have transcribed all the examples from the first volume of the BUGS Examples (original and transcribed). All programs and data are included and can be compiled using the steps described in the tutorial above.