DataLad is essentially Git for data.
Building on top of Git and git-annex, DataLad allows you to version control arbitrarily large files in datasets, without the need for custom data structures, central infrastructure, or third-party services.
- Track changes to your data
- Revert to previous versions
- Capture full provenance records
- Ensure complete reproducibility
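As a rough illustration of the points above, the sketch below creates a throwaway dataset, saves two versions of a file, and records a command with its provenance. The directory and file names (`my-dataset`, `notes.txt`, `count.txt`) are made up for the example, and the `text2git` configuration is an assumption chosen so that small text files stay directly editable; it is not something this page prescribes. The guard simply skips the demo if DataLad is not installed yet.

```shell
# Scratch location for the demo; any writable path works.
demo_dir=$(mktemp -d)

if command -v datalad >/dev/null 2>&1; then
    # text2git keeps text files in Git rather than git-annex (assumed choice).
    datalad create -c text2git "$demo_dir/my-dataset"
    (
        cd "$demo_dir/my-dataset" || exit 1

        # Track changes: each save records a new version.
        echo "first draft" > notes.txt
        datalad save -m "Add notes"
        echo "second draft" > notes.txt
        datalad save -m "Revise notes"

        # Capture provenance: the command itself is stored with its result.
        datalad run -m "Count lines in notes" "wc -l notes.txt > count.txt"

        # The history is ordinary Git history.
        git log --oneline
    )
else
    echo "datalad not found; install it first, then rerun this sketch"
fi
```

Because a dataset is a regular Git repository, earlier versions can be inspected or restored with the usual Git tools.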
Installing DataLad
- The best way to install DataLad on HPC systems like cubic is with conda. First, make sure Miniforge is installed in your project folder (see instructions here).
- Then, create an environment for this:
conda create -n dlad python=3.11
conda activate dlad
- Then, install DataLad:
conda install datalad git git-annex
pip install --upgrade datalad  # the conda-forge build of datalad might not be the most up to date
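After installing, it can help to confirm that the tools are actually visible in the active environment. The following is a minimal sketch; `check_tools` is a hypothetical helper written for this page, not part of DataLad or conda.

```shell
# Report whether each named tool is on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# DataLad needs both Git and git-annex alongside the datalad command itself.
check_tools datalad git git-annex \
    || echo "Something is missing: run 'conda activate dlad' and check the install steps."
```

If everything is found, `datalad --version` prints the installed DataLad version.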
Note that this page is still under construction and more information may be added at a later date.