DataLad is basically Git and GitHub for data: a tool for version controlling and sharing datasets.
Building on top of Git and git-annex, DataLad allows you to version control arbitrarily large files in datasets, without the need for custom data structures, central infrastructure, or third-party services. With it, you can:
- Track changes to your data
- Revert to previous versions
- Capture full provenance records
- Ensure complete reproducibility
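As a rough illustration of the first two points, the sketch below wraps a tiny track-and-inspect workflow in a shell function. It assumes DataLad is already installed and on your PATH (see the installation steps on this page) and that your Git user name and email are configured. The function name `demo_dataset`, the dataset name `demo-dataset`, and the file `notes.txt` are hypothetical placeholders, not part of any existing project.

```shell
# Hypothetical demo; run it in a scratch directory.
# The body is a subshell, so the cd does not affect your current shell.
demo_dataset() (
    # Create a new, empty dataset (a Git/git-annex repository).
    datalad create demo-dataset || exit 1
    cd demo-dataset || exit 1

    # Add a file and record the first version.
    echo "first draft" > notes.txt
    datalad save -m "Add notes" || exit 1

    # Change the file and record a second version.
    echo "second draft" > notes.txt
    datalad save -m "Revise notes" || exit 1

    # A dataset is a Git repository, so its history is visible with git.
    git log --oneline
)
```

To try it, activate the environment that contains DataLad, move to a scratch directory, and call `demo_dataset`. Each `datalad save` becomes a commit in the dataset's history, which is what makes going back to earlier versions possible.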
Installing DataLad
- The best way to install DataLad on HPC systems like CUBIC is with conda. First, make sure Miniconda is installed in your project folder (see instructions here).
- Then, create a conda environment for DataLad and activate it:
conda create -n dlad python=3.10
conda activate dlad
- Then, with the environment active, install DataLad from conda-forge:
conda install -c conda-forge datalad
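Once the install finishes, a quick check along these lines can confirm that the `datalad` command is available in the active environment. This is only a sketch: the `status` variable and the messages are illustrative, and `dlad` is the environment name used in the steps above.

```shell
# Check whether the datalad command is on PATH in the current shell.
if command -v datalad >/dev/null 2>&1; then
    # Print the installed version.
    datalad --version
    status="found"
else
    echo "datalad not found; is the dlad environment active?" >&2
    status="missing"
fi
echo "datalad: $status"
```

If the check reports `missing`, run `conda activate dlad` and try again.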
Note that this page is still under construction; more information may be added later.