Design

There are two aspects to the design of VarNamedTuples: property access, and indexing. VarNamedTuple consists of recursively nested NamedTuples and PartialArrays which together support both kinds of access.

Property access

Let's first talk about the NamedTuple part. In a VarNamedTuple each level of a Property optic corresponds to a level of nested NamedTuples, with the Symbols of the lenses as keys. For instance, the VarNamedTuple mapping @varname(x) => 1, @varname(y.z) => 2 would be stored as

VarNamedTuple(; x=1, y=VarNamedTuple(; z=2))

where VarNamedTuple(; x=a, y=b) is just a thin wrapper around the NamedTuple (; x=a, y=b). In fact, if vnt is a VarNamedTuple, then vnt.data is exactly that underlying NamedTuple.

It's often handy to think of this as a tree, with each node being a VarNamedTuple, like so:

   VNT
x /   \ y
 1     VNT
         \ z
          2

By virtue of these being NamedTuple, all variable access is completely type-stable.

If all VarNames consisted of only Propertyes we would be done designing the data structure. Sadly, that isn't the case!

Indexing and `PartialArray`s

Indexing is much more complicated to handle than property access, for a number of reasons.

Let's start by talking about the most obvious issue: only some parts of an array may be set. For example, the user may set x[1] and x[10] but not the ones in the middle. This is accomplished using a PartialArray, which is a wrapper around an AbstractArray that holds the data of interest, together with a mask saying which elements have been set.

The array type and size of the mask is always the same as the data array (this is done using Base.similar).

Info

Currently this is not enforced in an inner constructor, because that sometimes leads to extra allocations. It would be nice to investigate if this invariant can be enforced.

If pa.mask[i] is true, then pa.data[i] has been set; otherwise, pa.data[i] may contain some value, but it is not valid to index into that part of the array.

Here is an example:

using DynamicPPL.VarNamedTuples: PartialArray

data = randn(3)
mask = similar(data, Bool)
fill!(mask, false)
pa = PartialArray(data, mask)

PartialArray size=(3,) data::Vector{Float64}

The main way of interacting with a PartialArray is to use BangBang.setindex!!. This sets one or more elements of the PartialArray's data, and marks the corresponding elements in the mask as true.

using BangBang: setindex!!

setindex!!(pa, 12.0, 2)
pa.data, pa.mask

([-0.11152640043976948, 12.0, -0.8001887832610406], Bool[0, 1, 0])

When printed, pa shows only the elements that are set:

pa

PartialArray size=(3,) data::Vector{Float64}
└─ (2,) => 12.0

It is invalid to index into the unset elements:

julia> pa[1]ERROR: BoundsError: attempt to access DynamicPPL.VarNamedTuples.PartialArray{Float64, 1, Vector{Float64}, Vector{Bool}} at index [1, Base.Pairs{Symbol, Union{}, Nothing, @NamedTuple{}}()]

but you can access the set elements:

pa[2]

12.0

It follows from this, that because the PartialArray is actually backed by a regular Array, it is constructive. The call to getindex will return the values stored in pa.data, as long as all the elements in pa.mask are set. For example:

setindex!!(pa, 11.0, 1)
pa[1:2]

2-element Vector{Float64}:
 11.0
 12.0

You can even get the entirety of the array, once you have set all three elements. In this case, the PartialArray is (morally) equivalent to a single Array.

setindex!!(pa, 13.0, 3)
pa[:]

3-element Vector{Float64}:
 11.0
 12.0
 13.0

All Index optics in VarNames correspond to PartialArrays in VarNamedTuples. For example, let's say we want to store the mappings @varname(x[1].a) => 1.0, and y.b[2,3] => 2.0. The corresponding VarNamedTuple would look like this:

         VNT
        /   \y  
      x/     \    b
      /     VNT------PA[[ _, _, _  ],
     /                  [ _, _, 2.0]]
    PA [VNT]
         |
        a|
         |
        1.0

where _ indicates masked elements in a PartialArray. To demonstrate:

using DynamicPPL

vnt = VarNamedTuple()
vnt = setindex!!(vnt, 1.0, @varname(x[1].a))
vnt = setindex!!(vnt, 2.0, @varname(y.b[2, 3]))

VarNamedTuple
├─ x => PartialArray size=(1,) data::DynamicPPL.VarNamedTuples.GrowableArray{VarNamedTuple{(:a,), Tuple{Float64}}, 1}
│       └─ (1,) => VarNamedTuple
│                  └─ a => 1.0
└─ y => VarNamedTuple
        └─ b => PartialArray size=(2, 3) data::DynamicPPL.VarNamedTuples.GrowableArray{Float64, 2}
                └─ (2, 3) => 2.0

vnt.data.x  # This is a PartialArray

PartialArray size=(1,) data::DynamicPPL.VarNamedTuples.GrowableArray{VarNamedTuple{(:a,), Tuple{Float64}}, 1}
└─ (1,) => VarNamedTuple
           └─ a => 1.0

vnt.data.x.data[1]   # This is a VNT

VarNamedTuple
└─ a => 1.0

vnt.data.x.data[1].data.a  # This is 1.0

1.0

vnt.data.y  # This is a VNT

VarNamedTuple
└─ b => PartialArray size=(2, 3) data::DynamicPPL.VarNamedTuples.GrowableArray{Float64, 2}
        └─ (2, 3) => 2.0

vnt.data.y.data.b  # This is a PartialArray

PartialArray size=(2, 3) data::DynamicPPL.VarNamedTuples.GrowableArray{Float64, 2}
└─ (2, 3) => 2.0

vnt.data.y.data.b.data[2, 3]  # This is 2.0

2.0

This illustrates the fundamental structure of VarNamedTuples. From this example, one can see that getting data from a VarNamedTuple is as type-stable as possible:

All property accesses are NamedTuple accesses, which are type-stable.
All indexing is done into PartialArrays: as long as indexing into the underlying data is type-stable (i.e., the element type of the data array is concrete), indexing into the PartialArray is type-stable as well.

Info

In fact, the underlying data need not all have the same (concrete) type: all that is needed is that the unmasked elements have the same (concrete) type. The function _concretise_eltype!! is an attempt to force this to be the case: if the element type is abstract, but all the set elements have the same concrete type, the entire data array's element type will be changed to that concrete type (with junk in the unset elements).

One immediate question here is: how do we know what kind of array the PartialArray should use for its data and mask?

`GrowableArray`s

It's not obvious in the code above, but in the example above, we are implicitly making an assumption based on the indices that we see in the VarNames. For example, for @varname(x[1].a), based on the index 1 we assume that x should be a vector with a length of at least 1. Similarly, for @varname(y.b[2,3]), we assume that y.b should be a matrix with at least 2 rows and 3 columns.

We can inspect this by looking into the PartialArrays:

vnt.data.x.data

1-element DynamicPPL.VarNamedTuples.GrowableArray{VarNamedTuple{(:a,), Tuple{Float64}}, 1}:
 VarNamedTuple(a = 1.0,)

vnt.data.y.data.b.data

2×3 DynamicPPL.VarNamedTuples.GrowableArray{Float64, 2}:
 5.0e-324  0.0  6.90678e-310
 5.0e-324  0.0  2.0

So, these PartialArrays are backed by something called GrowableArray. A GrowableArray is an array type, defined in DynamicPPL, that can grow in size as needed when setindex!! is called with indices outside of its current bounds (with other arrays that would error). The reason for such an array type is that you may want to do something like

begin
    local vnt = VarNamedTuple()
    for i in 1:5
        vnt = setindex!!(vnt, i, @varname(x[i]))
    end
    vnt
end

VarNamedTuple
└─ x => PartialArray size=(5,) data::DynamicPPL.VarNamedTuples.GrowableArray{Int64, 1}
        ├─ (1,) => 1
        ├─ (2,) => 2
        ├─ (3,) => 3
        ├─ (4,) => 4
        └─ (5,) => 5

and we don't have the ability to know in advance that x will eventually need to be at least of size 5. So, every call to setindex!! here will cause the underlying GrowableArray to grow in size as needed.

The problem with this is that it makes a huge number of implicit assumptions about what kind of array x actually is, and consequently it forbids a huge number of indexing operations in Julia.

These include, for example, linear indexing. In this example, the first call will create a GrowableArray with one dimension (i.e., a vector); and the second call will fail since we can't index a vector with two indices.

julia> vnt = setindex!!(VarNamedTuple(), 10.0, @varname(x[1]))┌ Warning: Creating a growable `Base.Array` of dimension 1 to store values. This may not match the actual type or size of the actual `AbstractArray` that will be used inside the DynamicPPL model.
│ 
│  If this is not the type or size that you expect, please see: https://turinglang.org/docs/uri/growablearray
└ @ DynamicPPL.VarNamedTuples ~/work/DynamicPPL.jl/DynamicPPL.jl/src/varnamedtuple/partial_array.jl:797
VarNamedTuple
└─ x => PartialArray size=(1,) data::DynamicPPL.VarNamedTuples.GrowableArray{Float64, 1}
        └─ (1,) => 10.0
julia> vnt = setindex!!(vnt, 20.0, @varname(x[2, 2]))ERROR: ArgumentError: Cannot expand a GrowableArray with 1 dimensions using an index with 2 dimensions. GrowableArrays are created when no template is provided when setting VarNames with indices, e.g. `@varname(x[1])`, which causes DynamicPPL to have no knowledge of the shape of `x`. To fix this, you should provide a template for `x` when creating the VarNamedTuple. Alternatively, if `x` is an array with `N` dimensions, you should always index into it with `N` indices. This means avoiding using, for example, a mixture of linear and Cartesian indexing for arrays with more than one dimension.

Colons also don't work.

julia> vnt = setindex!!(VarNamedTuple(), randn(2), @varname(x[:]))ERROR: ArgumentError: Attempted to set a value with an index of Colon(), but no template was provided for the array. For such an index, a template is needed to determine the shape and type of the array so that the indexed data can be stored correctly.

Other things like OffsetArrays don't work (because x[11] would create a length-11 GrowableArray, which might not be appropriate; and x[-1] will just straight-up error). DimArrays will also fail if you use an index that isn't an integer.

Finally, we don't know how large the final size of the array should be. If you try to access the entire array after setting some elements, it will work, but it will warn you:

julia> vnt = setindex!!(VarNamedTuple(), 10.0, @varname(x[1]))┌ Warning: Creating a growable `Base.Array` of dimension 1 to store values. This may not match the actual type or size of the actual `AbstractArray` that will be used inside the DynamicPPL model.
│ 
│  If this is not the type or size that you expect, please see: https://turinglang.org/docs/uri/growablearray
└ @ DynamicPPL.VarNamedTuples ~/work/DynamicPPL.jl/DynamicPPL.jl/src/varnamedtuple/partial_array.jl:797
VarNamedTuple
└─ x => PartialArray size=(1,) data::DynamicPPL.VarNamedTuples.GrowableArray{Float64, 1}
        └─ (1,) => 10.0
julia> vnt[@varname(x)]┌ Warning: Returning a `Base.Array` with a presumed size based on the indices used to set values; but this may not be the actual type or size of the actual `AbstractArray` that was inside the DynamicPPL model. You should inspect the returned result to make sure that it has the correct value.
│ 
│ To find out how to avoid this warning, please see: https://turinglang.org/docs/uri/growablearray
└ @ DynamicPPL.VarNamedTuples ~/work/DynamicPPL.jl/DynamicPPL.jl/src/varnamedtuple/partial_array.jl:786
1-element Vector{Float64}:
 10.0

Templated arrays

The general solution to this problem is for the user to provide a template for the array x in advance, so that we know what kind of array to create when we see @varname(x[...]).

At a low level, this is done using the DynamicPPL.templated_setindex!! function, which takes an extra argument that specifies the shape of the top-level symbol in the VarName.

For example, the linear-indexing example above now works if you tell the function that x is a 2-by-2 matrix.

using DynamicPPL: templated_setindex!!

x = zeros(2, 2)
vnt = VarNamedTuple()
vnt = templated_setindex!!(VarNamedTuple(), 10.0, @varname(x[1]), x)
vnt = setindex!!(vnt, 20.0, @varname(x[2, 2]))

VarNamedTuple
└─ x => PartialArray size=(2, 2) data::Matrix{Float64}
        ├─ (1, 1) => 10.0
        └─ (2, 2) => 20.0

(The second call can also be templated_setindex!!, but it isn't necessary since the first call already establishes the shape of x, and indeed in such a case the template will be ignored.) Notice that the PartialArray is now backed by the correct Array:

vnt.data.x.data

2×2 Matrix{Float64}:
 10.0        6.90671e-310
  2.0e-322  20.0

It is no longer growable, so if you try to set an out-of-bounds index, it will error:

julia> vnt = setindex!!(vnt, 30.0, @varname(x[3, 1]))ERROR: BoundsError: attempt to access 2×2 Matrix{Float64} at index [3, 1]

This mechanism makes it far more flexible to work with arrays in DynamicPPL models. Fundamentally, this resolves the inconsistency between indexing semantics in the model, and indexing semantics inside the VarNamedTuple.

For example, you can use DimArrays:

import DimensionalData as DD

x = DD.DimArray(zeros(2, 3), (DD.X, DD.Y))

vnt = VarNamedTuple()
vnt = templated_setindex!!(vnt, 1.0, @varname(x[DD.X(1), DD.Y(2)]), x)

VarNamedTuple
└─ x => PartialArray size=(2, 3) data::DimensionalData.DimMatrix{Float64, Tuple{DimensionalData.Dimensions.X{DimensionalData.Dimensions.Lookups.NoLookup{Base.OneTo{Int64}}}, DimensionalData.Dimensions.Y{DimensionalData.Dimensions.Lookups.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Matrix{Float64}, DimensionalData.NoName, DimensionalData.Dimensions.Lookups.NoMetadata}
        └─ (1, 2) => 1.0

vnt.data.x.data

┌ 2×3 DimArray{Float64, 2} ┐
├──────────────────── dims ┤
  ↓ X, → Y
└──────────────────────────┘
 1.9811e-310  1.0          1.9811e-310
 1.9811e-310  1.9743e-310  4.597e-320

You can access the data back again in any way you like, for example using linear indexing here:

getindex(vnt, @varname(x[3]))

1.0

Info

Note that support for such arrays is contingent on the provider of the array type, as well as BangBang.jl. There may be bugs that prevent some array types from fully working correctly. For example, BangBang.setindex!! does not accept keyword arguments, which precludes the use of keyword indices in DimArrays. However, DynamicPPL itself does not inherently prevent you from using such arrays. We would definitely like to fix upstream issues like these, but we don't always have the time to do so: help is very greatly appreciated!

How do we provide the template?

At this point, it would seem like a major faff for users to have to provide templates any time they wanted to use a VarNamedTuple. The good news is, within a DynamicPPL model, the template always exists. For example, consider:

using Distributions
@model function bad_index()
    return x[1] ~ Normal()
end

If you were to attempt to execute this model, even without any interaction with VarNamedTuples, this would error, because there is no array x to set the first element in. It would be like writing a function that does x[1] = 10 without ever defining y:

julia> function bad_index_not_model()
           return x[1] = 10
       endbad_index_not_model (generic function with 1 method)
julia> bad_index_not_model()10

The model can only work if x is already provided, e.g. as an argument or a variable inside the model:

@model function ok_index1(x::AbstractArray)
    return x[1] ~ Normal()
end

@model function ok_index2()
    x = Vector{Float64}(undef, 10)
    return x[1] ~ Normal()
end

In both cases, we do have access to the template for x. Thus, this is just a matter of plumbing this information through to DynamicPPL.tilde_assume!! so that it can be used when setting values in the VarNamedTuple. In the macro output below, you can see that x is passed as one of the arguments to tilde_assume!!.

@macroexpand @model function bad_index()
    return x[1] ~ Normal()
end

quote
    function bad_index(__model__::Model, __varinfo__::AbstractVarInfo; )
        #= design.md:331 =#
        begin
            #= design.md:331 =#
            #= design.md:332 =#
            begin
                #= /home/runner/work/DynamicPPL.jl/DynamicPPL.jl/src/compiler.jl:614 =#
                var"##retval#477" = begin
                        var"##dist#474" = Normal()
                        var"##vn#471" = (DynamicPPL.resolve_varnames)((VarName){:x}((Index)((1,), (;), (Iden)())), var"##dist#474")
                        var"##isassumption#472" = begin
                                if (DynamicPPL.contextual_isassumption)(__model__.context, (DynamicPPL.prefix)(__model__.context, var"##vn#471"))
                                    if !((DynamicPPL.inargnames)(var"##vn#471", __model__)) || (DynamicPPL.inmissings)(var"##vn#471", __model__)
                                        true
                                    else
                                        (Base.maybeview)(x, 1) === missing
                                    end
                                else
                                    false
                                end
                            end
                        if (DynamicPPL.contextual_isfixed)(__model__.context, (DynamicPPL.prefix)(__model__.context, var"##vn#471")) && !(var"##dist#474" isa DynamicPPL.Submodel)
                            x = (Accessors.set)(x, (AbstractPPL.with_mutation)((AbstractPPL.getoptic)(var"##vn#471")), (DynamicPPL.getfixed_nested)(__model__.context, (DynamicPPL.prefix)(__model__.context, var"##vn#471")))
                        elseif var"##isassumption#472"
                            begin
                                (var"##value#476", __varinfo__) = (DynamicPPL.tilde_assume!!)(__model__.context, (DynamicPPL.unwrap_right_vn)((DynamicPPL.check_tilde_rhs)(var"##dist#474"), var"##vn#471")..., x, __varinfo__)
                                x = (Accessors.set)(x, (AbstractPPL.with_mutation)((AbstractPPL.getoptic)(var"##vn#471")), var"##value#476")
                                var"##value#476"
                            end
                        else
                            var"##supplied_val#475" = if (DynamicPPL.inargnames)(var"##vn#471", __model__)
                                    (Base.maybeview)(x, 1)
                                else
                                    (DynamicPPL.getconditioned_nested)(__model__.context, (DynamicPPL.prefix)(__model__.context, var"##vn#471"))
                                end
                            (var"##value#473", __varinfo__) = (DynamicPPL.tilde_observe!!)(__model__.context, (DynamicPPL.check_tilde_rhs)(var"##dist#474"), var"##supplied_val#475", var"##vn#471", __varinfo__)
                            x = (Accessors.set)(x, (AbstractPPL.with_mutation)((AbstractPPL.getoptic)(var"##vn#471")), var"##value#473")
                            var"##value#473"
                        end
                    end
                #= /home/runner/work/DynamicPPL.jl/DynamicPPL.jl/src/compiler.jl:615 =#
                return (var"##retval#477", __varinfo__)
            end
        end
    end
    begin
        $(Expr(:meta, :doc))
        function bad_index(; )
            #= design.md:331 =#
            return (Model){false}(bad_index, NamedTuple{()}(()); )
        end
    end
end

What this means is that in the core use case of VarNamedTuple (i.e., for storing random variables in DynamicPPL models), templates will always be provided. There are, for the most part, only two places where templates are unavailable, and we have to fall back on the GrowableArray approach:

Loading data from chains.
Providing conditioned or fixed values.

The first of these can be fixed by rerunning the model once to pick up the templates. The second cannot be truly fixed, but the DynamicPPL.@vnt macro allows users to manually provide templates themselves in cases where they are really needed.

Design

Property access

Indexing and PartialArrays

GrowableArrays

Templated arrays

How do we provide the template?

Indexing and `PartialArray`s

`GrowableArray`s