Getting started with Julia for reproducible science

Mathieu Besançon
Polytechnique Montréal, INRIA & Centrale Lille

Re-using / citing the materials:
DOI

Logistics

Slides at https://matbesancon.github.io/slides/JuliaNantes/JuliaTools.html
Repository at https://github.com/matbesancon/JuliaNantesWorkshop

Lots of references & pointers, for the trip home.

Twitter for live-complaints: @matbesancon

Content

  • Reproducibility, why?
  • Getting started with Git
  • Working alone
    • Projects & environments
    • Tests
    • Publishing code
    • Working with data
  • Collaborating on code
    • General workflow
    • Demo & homework

Bonus:

  • Unaesthetic diagrams
  • Latest research in linear algebra
  • Homework

Reproducibility: why should I care?

  • Industrial software:
    • Written once, used often, in (almost) all contexts
    • Bugs are found (eventually) and fixed
  • Academic software:
    • Re-written a lot, used one final time
    • Used in a static, long-lasting document (paper)
    • Tested for one application? On how many data sets?
  • First person reproducing results: you, in a couple of months
  • Tools should buy peace of mind, not add a burden
  • Increasing expectations on reproducible software
    • NeurIPS reproducibility checklist for all papers
    • Requirements on reproducibility: not just by you, not just on your machine

5 levels of reproducibility defined in P. Vandewalle, J. Kovacevic and M. Vetterli, "Reproducible research in signal processing"
DOI: 10.1109/MSP.2009.932122

Demo 1: a git project

New project

Add files

Commit

See versions

Working alone

  • One project, isolated from the rest
  • Build a library or produce results

$ tree
.
├── code
├── data
│   ├── data1.csv
│   └── data2.txt
├── results
│   ├── results.csv
│   └── img
└── paper
    └── paper.tex

Why not a script?

  • Upgrading dependencies -> did I break my code?
  • Am I sure this code can be run anywhere?
  • Article review after 6 months -> is it still fine?

Working in environments

Safely depending on libraries

Pkg tools: files

Project.toml

  • This directory is a Julia Project
  • Shows what I need
  • Necessary for all projects

Manifest.toml

  • Generated when dependencies are added or the project is instantiated
  • Shows how it was run
  • Useful for debugging and research

Freeze the Manifest $\Rightarrow$ freeze how it's run

Demo 2: Pkg tools

Generate project IdentityMatrices

Add dependencies

See Project.toml, Manifest.toml

All is in the Pkg documentation, go read it.
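The first demo step can also be scripted through the Pkg API. The following is a minimal sketch, assuming Julia 1.6 or later (for the `TOML` standard library). The temporary directory is only there to keep the example self-contained, and `SomePackage` in the comment is a placeholder, not a package used in the workshop.

```julia
using Pkg, TOML

# Work in a throw-away directory so the example leaves no files behind.
project_dir = joinpath(mktempdir(), "IdentityMatrices")

# Equivalent of `pkg> generate IdentityMatrices`:
# creates Project.toml and src/IdentityMatrices.jl.
Pkg.generate(project_dir)

# Project.toml states what the project is and, later, what it needs.
project = TOML.parsefile(joinpath(project_dir, "Project.toml"))
println("name: ", project["name"])
println("uuid: ", project["uuid"])

# Dependencies would be added with Pkg.activate(project_dir) followed by
# Pkg.add("SomePackage"): this writes a [deps] section to Project.toml and
# records the exact resolved versions in Manifest.toml.
has_manifest = isfile(joinpath(project_dir, "Manifest.toml"))
println("Manifest.toml present right after generate: ", has_manifest)
```

Generating a project only writes `Project.toml` and the source skeleton. The Manifest appears once dependencies are resolved.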

Project isolation

Launch Julia and activate your project:

$ julia --project=@.

Launch, and then activate:

julia> ]
(v1.1) pkg> activate .

Get a project and set up the required environment:

julia> ]
(v1.1) pkg> activate .
(JuliaNantes) pkg> instantiate

If a Manifest.toml is provided $\Rightarrow$ exactly the same configuration as when the code was written.
Otherwise $\Rightarrow$ a configuration compatible with Project.toml is resolved, and a Manifest.toml is created.
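The two cases can be told apart before instantiating. A minimal sketch follows; `environment_status` and `setup_environment` are hypothetical helper names introduced here for illustration, while `Pkg.activate` and `Pkg.instantiate` are the scripted counterparts of the REPL commands shown above.

```julia
using Pkg

# Hypothetical helper: which of the two cases applies to a project directory?
#   :pinned      Manifest.toml present, exact versions will be restored
#   :to_resolve  only Project.toml, compatible versions will be resolved
#   :no_project  not a Julia project
function environment_status(dir::AbstractString)
    isfile(joinpath(dir, "Project.toml")) || return :no_project
    return isfile(joinpath(dir, "Manifest.toml")) ? :pinned : :to_resolve
end

# Hypothetical helper: activate the project in `dir` and install its
# dependencies. Installing may need network access.
function setup_environment(dir::AbstractString)
    status = environment_status(dir)
    status == :no_project && error("no Project.toml found in $dir")
    Pkg.activate(dir)
    Pkg.instantiate()
    return status
end
```

Calling `setup_environment(".")` from the root of a cloned repository would then mirror `activate .` followed by `instantiate`.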

Tests, the easy way

Research software moves fast, and breaks things.

Tests:

  • Specify expected behavior
  • Communicate usage
  • Signal robustness
  • Safeguard against your future self
  • Put yourself in the user's shoes

Demo 3: writing tests

First tests for IdentityMatrices

Write code for IdentityMatrices

Test-specific dependencies
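As an illustration of what the demo might produce, here is a minimal sketch of a type and its first tests. The `IdentityMatrix` definition below is an assumption made for this example, not necessarily the code written during the workshop.

```julia
using LinearAlgebra, Test

# Hypothetical stand-in for the demo package: an n×n identity matrix
# that stores only its size.
struct IdentityMatrix{T} <: AbstractMatrix{T}
    n::Int
    function IdentityMatrix{T}(n::Integer) where {T}
        n >= 0 || throw(ArgumentError("size must be non-negative"))
        return new{T}(n)
    end
end

# Minimal AbstractMatrix interface: size and element access.
Base.size(M::IdentityMatrix) = (M.n, M.n)

function Base.getindex(M::IdentityMatrix{T}, i::Int, j::Int) where {T}
    @boundscheck checkbounds(M, i, j)
    return i == j ? one(T) : zero(T)
end

@testset "IdentityMatrix" begin
    Id = IdentityMatrix{Float64}(3)
    A = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 10.0]

    # Expected behavior, written from the user's point of view.
    @test size(Id) == (3, 3)
    @test Id[1, 1] == 1.0
    @test Id[1, 2] == 0.0
    @test Id * A == A
    @test A * Id == A

    # Corner cases: out-of-range access and invalid size.
    @test_throws BoundsError Id[4, 4]
    @test_throws ArgumentError IdentityMatrix{Float64}(-1)
end
```

In a package, the type would live in `src/` and the `@testset` in `test/runtests.jl`, so that `pkg> test` runs it.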

Personal tips

Cover corner cases:

  • @test_throws with expected error
  • What happens with limit values?

    @test_throws MethodError mean(["hello"])
    @test isnan(mean(Float64[]))
    
  • Avoid overly specific structure in test inputs. Example: inputs that are always integers.

Avoid trivial "comfort" tests.
Example: copying a function implementation to test it:

@test mean(x) == sum(x) / length(x)
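The tips above can be combined into one test set. This is a minimal sketch using `mean` from the `Statistics` standard library; the expected values are worked out by hand rather than by repeating the implementation.

```julia
using Statistics, Test

@testset "mean: corner cases and independent checks" begin
    # Corner cases: invalid input and limit values.
    @test_throws MethodError mean(["hello"])
    @test isnan(mean(Float64[]))
    @test mean([42.0]) == 42.0

    # Vary the structure of the input: not only integers.
    @test mean([1, 2, 3, 4]) == 2.5
    @test mean([0.5, 1.5]) == 1.0
    @test mean([1//2, 3//2]) == 1//1

    # Expected values computed by hand, not by re-running the implementation.
    @test mean([-1.0, 1.0]) == 0.0
    @test mean([2.0 4.0; 6.0 8.0]) == 5.0
end
```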

Unit vs. property tests

  • Unit: test a given evaluation / data point
  • Property tests: test a property of the result for given input

Examples: positivity, idempotency, existence for any input, order conservation, ...

Two steps:

  1. generate random input
  2. test property
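The two steps can be sketched with the standard library alone. The properties below (bounds, shift, order independence) are choices made for this example, and `nearly_equal` is a hypothetical helper that adds an absolute tolerance so that comparisons near zero do not fail spuriously.

```julia
using Random, Statistics, Test

# Fixed seed: a failing run can be replayed exactly.
rng = MersenneTwister(2019)

# Hypothetical helper: approximate equality with an absolute tolerance.
nearly_equal(a, b) = isapprox(a, b; atol = 1e-9)

@testset "mean: properties on random input" begin
    for _ in 1:100
        # Step 1: generate random input (random length, random values).
        n = rand(rng, 1:50)
        x = 10 .* randn(rng, n)
        c = randn(rng)

        # Step 2: test properties of the result.
        m = mean(x)
        @test minimum(x) <= m <= maximum(x)            # bounded by the extremes
        @test nearly_equal(mean(x .+ c), m + c)        # shifting the data shifts the mean
        @test nearly_equal(mean(shuffle(rng, x)), m)   # order does not matter
    end
end
```

No expected value is hard-coded: each property must hold for any input, which is what distinguishes this from a unit test.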

Publishing code

Why?

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

Buckheit and Donoho, 1995, WaveLab and Reproducible Research

Showcase your work, reference it in your paper.

Better to release something now than a "perfect" library some day.

How?

A standard?

Parts of the Julia community are moving towards CITATION.bib

What about data?

  • Do not track massive data sets with git
  • Then how?


Source: Lyndon White, DataDeps.jl, JuliaCon2018 https://doi.org/10.6084/m9.figshare.6949145.v1

DataDeps.jl

  • Describe once how to get, parse, and preprocess the data
  • Data gets cached: no second download if already available
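The caching idea can be sketched without any package. This is not the DataDeps.jl API, only an illustration of "fetch once, reuse afterwards"; `cached_fetch` and `fake_download` are hypothetical names, and the fake download stands in for a real one so that the example runs offline.

```julia
# Hypothetical helper, not part of DataDeps.jl: call `fetcher(path)` only
# when `path` does not exist yet, then return the path.
function cached_fetch(fetcher::Function, path::AbstractString)
    if !isfile(path)
        mkpath(dirname(path))
        fetcher(path)
    end
    return path
end

# Stand-in for a real download, so the example runs without network access.
# With a real data set this could be `path -> Downloads.download(url, path)`.
fetch_count = Ref(0)
function fake_download(path)
    fetch_count[] += 1
    write(path, "x,y\n1,2\n3,4\n")
end

data_path = joinpath(mktempdir(), "data", "data1.csv")
cached_fetch(fake_download, data_path)   # first call: the data is fetched
cached_fetch(fake_download, data_path)   # second call: cache hit, no fetch
println("fetches performed: ", fetch_count[])
```

DataDeps.jl goes further than this sketch: see its repository for how data sources are declared and verified.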

Go check it: https://github.com/oxinabox/DataDeps.jl

Collaborating on code and research

Why?

  1. You use a project and have valuable experience to report
  2. Use-case not covered (yet)
  3. Somebody noticed your work online and wants to help
  4. Great opportunity for unexpected research projects

Demo 4: Contributing somewhere

Find a project to contribute to

  • github.com/JuliaStats/Distributions.jl
  • github.com/JuliaGraphs/LightGraphs.jl
  • github.com/JuliaOpt/MathOptInterface.jl
  • github.com/JuliaOpt/JuMP.jl

Fork

Develop locally

Commit & push to fork

Pull request

Automate the burden: Continuous Integration

Travis (Continuous Integration)
codecov (Code coverage)

Continuous Integration

This works... on my machine.
What if I could check it on a clean computer, without my setup?

Checking for every change (Pull Request)

Code coverage

How much of the package behaviour did I test (at least once)?

Take-away

  • Version control is the foundation for lots of modern tools
  • Tests make your life easier
  • Sharing code increases visibility, creates research opportunities

Homework

  • Reproduce all the demos
  • Use git on your research projects
  • Contribute to a package (demo 4)
  • Publish your first package

Reading more

Mathieu Tanneau's tutorial on coding for research: https://github.com/mtanneau/tutorial_airo
Jane Herriman, How to get started with Julia 1.0's package manager: https://www.youtube.com/watch?v=76KL8aSz0Sg
Read the documentation https://docs.julialang.org/en/v1/stdlib/Pkg/index.html