Google Summer of Code/Julia Summer of Code
Cameron Pfiffer · February 12, 2020
Last year, Turing participated in the Google Summer of Code (GSoC) through the Julia language organization. It was a fun time, and the project was better for it. Turing plans to participate in the upcoming GSoC, and we wanted to outline some potential projects and expectations we have for applicants.
If you are not aware of GSoC, it is a program in which Google funds students around the world to spend three months of their summer working on an open source project of their choice.
The Turing development team has prepared a list of possible projects that we consider valuable to the project and feasible to complete within the three-month limit. This list is not exhaustive -- if you have a good idea, you can write it up in your proposal, though we recommend that you first reach out to any of the Turing team on Julia's Slack (you can get an invite here) or Discourse. Messages on Discourse should be posted to the "Probabilistic programming" category -- we'll find you!
Possible project ideas:
- Benchmarking. Turing's performance has been sporadically benchmarked against various other probabilistic programming languages (e.g. Stan, PyMC3, TensorFlow Probability), but a systematic approach to studying where Turing excels and where it falls short would be useful. A GSoC student would implement identical models in many PPLs and build tools to benchmark all PPLs against one another; a minimal single-model timing sketch appears after this list.
- Nested sampling integration. Turing focuses on modularity in inference methods, and the development team would like to see more inference methods, particularly the popular nested sampling method. A Julia package (NestedSamplers.jl) exists, but it is not hooked up to Turing and does not currently have a stable API. A GSoC student would either integrate that package or construct their own nested sampling method and build it into Turing.
- Automated function memoization by model annotation. Function memoization is a way to avoid costly function evaluations by caching the output whenever the same inputs are given; the caching pattern is sketched after this list. Turing's Gibbs sampler often ends up rerunning expensive functions multiple times, and it would be a significant performance improvement to allow Turing's model compiler to automatically memoize functions where appropriate. A student working on this project would become intimately familiar with Turing's model compiler and build in various automated improvements.
- Making Distributions GPU compatible. Julia's GPU tooling is generally quite good, but currently Turing is not able to reliably use GPUs while sampling because Distributions.jl is not GPU compatible. A student on this project would work with the Turing developers and the Distributions developers to allow the use of GPU parallelism where possible in Turing.
- Static distributions. Small, fixed-size vectors and matrices are fairly common in Turing models, so sampling in Turing would likely benefit from using the statically sized vectors and matrices of StaticArrays.jl instead of ordinary dynamically sized Julia arrays. Besides the often superior performance of small static vectors and matrices, static arrays are also automatically compatible with Julia's GPU stack; a small benchmark illustrating the difference appears after this list. Currently, the main obstacle to using StaticArrays.jl is that the distributions in Distributions.jl are not compatible with it. A GSoC student would adapt the multivariate and matrix-variate distributions, as well as the univariate distributions with vector parameters, from Distributions.jl into a spin-off package called StaticDistributions.jl. The student would then benchmark StaticDistributions.jl against Distributions.jl and showcase an example of using StaticDistributions.jl together with CuArrays.jl and/or CUDAnative.jl for GPU acceleration.
- GPnet extensions. One of Turing's satellite packages, GPnet, is designed to provide a comprehensive suite of Gaussian process tools. See this issue for potential tasks -- there's a lot of interesting work happening around GPs, and this project in particular offers some creative freedom.
- Better chains and model diagnostics. One package that Turing (and many others) rely on heavily is MCMCChains.jl, which is designed to format, store, and analyze parameter samples generated during MCMC inference. MCMCChains is currently showing its age a little and carries some design decisions that need to be revisited. Alternatively, a student could construct a far more lightweight chain system.
- Model comparison tools. Turing and its satellite packages do not currently provide a comprehensive suite of model comparison tools, which are critical for the applied statistician. A student who worked on this project would implement various model comparison tools such as LOO and WAIC, among others; a sketch of the WAIC computation appears after this list.
- MLE/MAP tools. Maximum likelihood estimates (MLE) and maximum a posteriori (MAP) estimates can currently be obtained only through a clunky set of workarounds. A streamlined function like `mle(model)` or `map(model)` would be very useful for the many Turing users who want to see what the MLE or MAP estimates look like, and it may also be valuable to allow MCMC sampling to begin from the MLE or MAP estimates. Students working on this project would work with optimization packages such as Optim.jl to make MLE and MAP estimation straightforward for Turing models; a rough sketch of the workflow such functions would wrap appears after this list.
- Particle sampler improvements. Turing's development team has spent a lot of time and energy making inference methods more modular; two packages that resulted from this effort are AdvancedHMC, for Hamiltonian MCMC methods, and AdvancedMH, for Metropolis-Hastings-style inference methods. Turing's particle samplers, however, have not yet received the same treatment and been spun off into a separate package. A student who worked on this project would become very familiar with Turing's inference backend and with particle sampling methods. This is a good project for people who love making things efficient and easily extendable.
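To make a few of these ideas concrete, here are some rough sketches. First, benchmarking: the snippet below times a single toy model in Turing with BenchmarkTools.jl; a GSoC project would wrap this pattern in tooling that runs equivalent models across several PPLs. The model and sampler settings are illustrative assumptions, not part of any existing benchmark suite.

```julia
using Turing, BenchmarkTools

# A toy Gaussian model used purely as a timing target.
@model gdemo(x) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

model = gdemo([1.5, 2.0])

# Time NUTS sampling; the $ interpolation keeps setup cost out of the measurement.
@benchmark sample($model, NUTS(0.65), 1000)
```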
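Next, memoization. The caching pattern itself is simple, as this hand-rolled sketch shows; the challenge in the project is having Turing's model compiler apply it automatically and safely. Nothing below reflects Turing's actual internals.

```julia
# Minimal memoization sketch: cache outputs keyed by (function, arguments)
# and reuse them whenever the same inputs show up again.
const _memo = Dict{Tuple,Any}()

function memoized(f, args...)
    get!(_memo, (f, args...)) do
        f(args...)  # evaluated only on a cache miss
    end
end

expensive(x) = (sleep(1); x^2)  # stand-in for a costly model term

memoized(expensive, 3.0)  # slow the first time: computes and caches
memoized(expensive, 3.0)  # instant: served from the cache
```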
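For the static distributions idea, this small comparison illustrates why statically sized arrays help: a tiny matrix-vector product on StaticArrays.jl types avoids the heap allocation that the ordinary-array version pays on every call.

```julia
using StaticArrays, BenchmarkTools

# Small fixed-size data, as often appears in Turing models.
v_dyn = rand(3)              # ordinary Vector{Float64}
A_dyn = rand(3, 3)           # ordinary Matrix{Float64}
v_sta = @SVector rand(3)     # SVector{3,Float64}, size known at compile time
A_sta = @SMatrix rand(3, 3)  # SMatrix{3,3,Float64,9}

@btime $A_dyn * $v_dyn  # allocates a fresh Vector on every call
@btime $A_sta * $v_sta  # unrolled and stack-allocated
```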
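For model comparison, here is a sketch of the WAIC computation from a matrix of pointwise log-likelihoods, following the standard formula waic = -2(lppd - p_waic). The `waic` function and its input layout are hypothetical, not an existing Turing or MCMCChains API.

```julia
using Statistics

# loglik[s, i] = log p(y_i | theta_s) for posterior draw s and observation i.
function waic(loglik::AbstractMatrix)
    # Log pointwise predictive density: log of the mean likelihood per point.
    lppd = sum(log.(mean(exp.(loglik); dims = 1)))
    # Effective number of parameters: per-point variance of the log-likelihood.
    p_waic = sum(var(loglik; dims = 1))
    return -2 * (lppd - p_waic)
end

fake_loglik = -rand(1000, 50)  # placeholder draws just to exercise the function
waic(fake_loglik)
```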
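Finally, for the MLE/MAP tools, this is roughly the Optim.jl workflow that streamlined `mle(model)` and `map(model)` functions would hide. The toy log-posterior is written out by hand here; obtaining it automatically from a Turing model is exactly what the project would solve.

```julia
using Optim, Distributions

data = [1.5, 2.0, 1.3, 2.2]

# Toy model: x[i] ~ Normal(mu, 1) with prior mu ~ Normal(0, 1).
loglik(mu) = sum(logpdf.(Normal(mu, 1), data))
logpost(mu) = logpdf(Normal(0, 1), mu) + loglik(mu)

# Maximize by minimizing the negative log density.
mle_fit = optimize(v -> -loglik(v[1]), [0.0], BFGS())
map_fit = optimize(v -> -logpost(v[1]), [0.0], BFGS())

Optim.minimizer(mle_fit)  # MLE of mu: the sample mean
Optim.minimizer(map_fit)  # MAP estimate, shrunk toward the prior mean
```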
Other projects are welcome, but we do strongly recommend discussing any potential projects with members of the Turing team, as they will end up mentoring GSoC students for the duration of the project.
We're looking forward to seeing what people are interested in!