DataLad is essentially Git for data.

Building on top of Git and git-annex, DataLad allows you to version control arbitrarily large files in datasets, without the need for custom data structures, central infrastructure, or third-party services.

  • Track changes to your data
  • Revert to previous versions
  • Capture full provenance records
  • Ensure complete reproducibility

Installing DataLad

  1. The best way to install DataLad on HPC systems like cubic is with conda. First, make sure Miniforge is installed in your project folder (see instructions here).
  2. Then, create and activate a dedicated conda environment:
    conda create -n dlad python=3.11
    conda activate dlad
    
  3. Then, install DataLad together with Git and git-annex:
    conda install datalad git git-annex
    pip install --upgrade datalad # the conda-forge build of datalad might not be the most up-to-date release
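
After step 3, it can be useful to confirm that the tools are actually visible in the active environment. The following is a minimal sketch, assuming the `dlad` environment from step 2 is active; `check_tool` is a hypothetical helper name introduced here for illustration and is not part of DataLad or conda.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
        return 1
    fi
}

# Check the three tools installed in step 3.
missing=0
for tool in datalad git git-annex; do
    check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
    # All tools are present, so print the installed DataLad version.
    datalad --version
else
    echo "Some tools are missing; check that the conda environment is activated."
fi
```

If any tool is reported as missing, the most likely cause is that the environment was not activated in the current shell.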
    

Note that this page is still under construction; more information may be added later.