Benchmarking Reconstruction Methods for Bundle Segmentation in Single-Shell Diffusion MRI

Acquiring high-quality multi-shell neuroimaging diffusion MRI (dMRI) datasets is time- and resource-intensive. In contrast, there is a massive quantity of dMRI data from legacy research studies and healthcare systems that is typically single-shelled, with a lower angular resolution. While they may offer a valuable, large-scale complement to contemporary research datasets, the reliability of white matter bundles from these scans remains unclear. Here, we leverage a large research dataset where each 64-direction dMRI scan was acquired as two independent 32-direction runs per subject. To investigate how recently developed bundle segmentation methods generalize to this data, we evaluated the test-retest reliability of white matter (WM) bundle extraction across three orientation distribution function (ODF) reconstruction methods: generalized q-sampling imaging (GQI), constrained spherical deconvolution (CSD), and single-shell three-tissue CSD (SS3T). We found that the majority of WM bundles could be reliably extracted from dMRI scans that were acquired using the 32-direction, single-shell acquisition scheme. The mean dice coefficient of reconstructed WM bundles was consistently higher within-subject than between-subject for all WM bundles and reconstruction methods, illustrating high reconstruction reliability. Further, when leveraging features of the reconstructed bundles to predict complex reasoning assessed using a computerized cognitive battery, we observed stable prediction accuracies (r: 0.15-0.36). Among the three reconstruction methods, SS3T had a good balance between sensitivity and specificity, with high intra-class correlation of extracted features, more plausible bundles, and strong predictive performance. More broadly, these results demonstrate that bundle segmentation can achieve robust performance even on lower angular resolution, single-shell dMRI, with particular advantages for ODF methods optimized for single-shell data. This highlights the enormous research potential for dMRI collected in healthcare settings.

I. Project Information

Project Team

Project Lead: Amelie Rauland

Faculty Leads: Matthew Cieslak and Theodore D. Satterthwaite

Analytic Replicator: Steven L. Meisler

Collaborators:
Aaron F. Alexander-Bloch, Joëlle Bagautdinova, Erica B. Baller, Raquel E. Gur, Ruben C. Gur, Audrey C. Luo, Tyler M. Moore, Oleksandr V. Popovych, Kathrin Reetz, David R. Roalf, Russell T. Shinohara, Susan Sotardi, Valerie J. Sydnor, Arastoo Vossough, Simon B. Eickhoff

Project Timeline

Project Start Date: 06/2024
Current Project Status: In preparation

Dataset

Philadelphia Neurodevelopmental Cohort (PNC)

Code and Communication

Github repo: https://github.com/PennLINC/clinical_dmri_benchmark/tree/main
Slack Channel: #clinical_dmri_benchmark

II. CUBIC Project Directory Structure

The project directory on CUBIC is /cbica/projects/clinical_dmri_benchmark. The following directories are relative to this home directory if not specified otherwise.

Directory Description
~/clinical_dmri_benchmark GitHub repo
~/clinical_dmri_benchmark/analysis GitHub repo: Code for data processing, analysis and figure plotting divided into sub-folders
~/clinical_dmri_benchmark/data GitHub repo: data used by several scripts in the repo (e.g. bundle names)
~/clinical_dmri_benchmark/figures GitHub repo: Figures produced by code in the analysis folder
~/clinical_dmri_benchmark/software GitHub repo: Requirement files for python environments
~/images Singularity images for qsiprep and qsirecon
~/results Outputs of data processing and analysis (sub-folders for different steps of the analysis: pre-processing, reconstruction, dice scores, etc.)
~/data Additional data needed for analysis that is too large to be stored in the GitHub repo, can’t be shared publicly, or can be easily downloaded and therefore doesn’t need to be shared through the repo (e.g. atlas bundles, QC files, prediction confounds and targets)
/cbica/comp_space/
clinical_dmri_benchmark/PNC/BIDS
Raw PNC data (downloaded and cloned using datalad)
/cbica/comp_space/
clinical_dmri_benchmark/MNI
Reference T1w MNI image

III. Code Documentation

Software and Data

1 Install required software and setup python environment

1.1 Singularity images for qsiprep and qsirecon

QSIPrep: apptainer build qsiprep-0.21.4.sif docker://pennlinc/qsiprep:0.21.4
QSIRecon: apptainer build qsirecon-0.23.2.sif docker://pennlinc/qsirecon:0.23.2

1.2 Setup clinical_dmri_benchmark python environment

This is the main environment used in this analysis.
Micromamba: micromamba create -n clinical_dmri_benchmark --file ~/software/cdbm_environment.yml
For prediction, the analysis was run on another cluster where it was more straight forward to use a python venv:

curl -LsSf https://astral.sh/uv/install.sh | sh 
uv venv --python 3.12.5 .venvs/clinical_dmri_benchmark 
source .venvs/clinical_dmri_benchmark/bin/activate 
uv pip install -r ~/software/cdbm_environment.txt

1.3 Setup mayavi python environment

This environment was only used to plot the the population maps over the atlas bundles. micromamba create -n mayavi --file /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/software/mayavi_environment.yml

2 Get PNC data in BIDS format

The raw PNC data in BIDS format was stored at /cbica/comp_space/clinical_dmri_benchmark/PNC/BIDS .
Clone the dataset from PMACs as described here. When using datalad get , it is only necessary to get the T1w images (*_T1w.json , *_T1w.nii.gz ) and the diffusion data (*_dwi.bval, *_dwi.bvec, *_dwi.nii.gz, *_dwi.json).

The MNI reference image from QSIPrep was stored at /cbica/comp_space/clinical_dmri_benchmark/PNC/MNI/mni_1mm_t1w_lps_brain.nii.gz. It can be downloaded from here.

Data Processing

3 Pre-process the data using QSIPrep

3.1 Get subject list for pre-processing

micromamba activate clinical_dmri_benchmark
cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing/subject_lists
python3 get_subject_list.py

→ This creates a list of all subjects with two dMRI scans and a T1w scan that have not been pre-processed yet

3.2 Run pre-processing

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing
sbatch PNC_qsiprep.sh

4 Reconstruct WM bundles using three different ODF reconstructions with QSIRecon

4.1 GQI

4.1.1 Get subject list for GQIautotrack reconstruction

micromamba activate clinical_dmri_benchmark
cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing/subject_lists
python3 get_preprocessed_subject_list.py --recon_suffix GQIautotrack

This creates a list of subjects that have been pre-processed and not yet reconstructed using GQI.

4.1.2 Run autotrack based on GQI reconstruction (default)

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing
sbatch PNC_qsirecon_gqi_autotrack.sh

4.2 CSD and SS3T

The reconstruction using CSD and SS3T is run in two steps: First we reconstruct the ODFs using CSD / SS3T in QSIRecon and then we run Autotrack in DSIStudio based on the CSD / SS3T ODFs.

4.2.1 Get subject lists for ODF reconstruction

micromamba activate clinical_dmri_benchmark
cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing/subject_lists
python3 get_preprocessed_subject_list.py --recon_suffix CSD
python3 get_preprocessed_subject_list.py --recon_suffix SS3T

This creates a list of subjects that have been pre-processed and not yet reconstructed using CSD / SS3T.

4.2.2 Run ODF reconstruction

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing
sbatch PNC_qsirecon_csd.sh
sbatch PNC_qsirecon_ss3t.sh

4.2.3 Get subject lists for bundle reconstruction

micromamba activate clinical_dmri_benchmark
cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing/subject_lists
python3 get_preprocessed_subject_list.py --recon_suffix CSDautotrack
python3 get_preprocessed_subject_list.py --recon_suffix SS3Tautotrack

This creates a list of subjects that have been pre-processed and not yet reconstructed using CSDautotrack / SS3Tautotrack.

4.2.4 Run autotrack based on CSD / SS3T reconstruction:

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing
sbatch PNC_qsirecon_csd_autotrack.sh
sbatch PNC_qsirecon_ss3t_autotrack.sh

5 Warp reconstructed bundles to MNI space and mask

5.1 Get subject lists

micromamba activate clinical_dmri_benchmark
cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing/subject_lists
python3 get_reconstructed_subject_list.py --recon_suffix GQIautotrack
python3 get_reconstructed_subject_list.py --recon_suffix CSDautotrack
python3 get_reconstructed_subject_list.py --recon_suffix SS3Tautotrack

This creates a subject list of all subjects that have been reconstructed but not yet warped and masked for each of the reconstruction methods.

5.2 Warp bundles to MNI space and create binary mask

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing
sbatch warp_bundles_to_mni_and_mask.sh GQIautotrack
sbatch warp_bundles_to_mni_and_mask.sh CSDautotrack
sbatch warp_bundles_to_mni_and_mask.sh SS3Tautotrack

6 Create list of excluded subjects

micromamba activate clinical_dmri_benchmark
cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/data_processing/subject_lists
python3 qc.py

This list checks for subjects with acquisition variants, subjects that could not be pre-processed or reconstructed and subjects that failed QC based on Roalf et al., 2016.

Bundle Reliability Analysis

7 Reconstruction Fractions

7.1 Determine fractions of reconstructed bundles

micromamba activate clinical_dmri_benchmark
cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/fractions_reconstructed_bundles
python3 get_reconstructed_bundles.py --recon_suffix GQIautotrack
python3 get_reconstructed_bundles.py --recon_suffix CSDautotrack
python3 get_reconstructed_bundles.py --recon_suffix SS3Tautotrack

7.2 Plot the fractions 🎨

This notebook was run locally and the following files had to be moved to the local setup:

Adjust CSV_ROOT at the top of the script to where you saved the required files and run <GIT_REPO_HOME>/analysis/fractions_reconstructed_bundles/plot_recon_fractions.ipynb to create the plot of reconstruction fractions.

8 Dice Scores

8.1 Calculate dice scores

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/dice_scores
sbatch calculate_dice_scores.sh GQIautotrack
sbatch calculate_dice_scores.sh CSDautotrack
sbatch calculate_dice_scores.sh SS3Tautotrack

8.2 Plot full distributions 🎨

This script was run locally and requires the csv files containing the dice scores between any two scans to be moved to local from /cbica/projects/clinical_dmri_benchmark/results/dices. Adjust DICE_ROOT at the top of the script to where you saved the required files on the local setup. This is also where the high quality versions of the files are saved to as they are too large to be saved in the GitHub repo.

micromamba activate clinical_dmri_benchmark
python3 <GIT_REPO_HOME>/analysis/dice_scores/plot_full_dice_distributions.py

The reconstruction method needs to be set at the top of the script as GQI , CSD or SS3T .

8.3 Plot median distributions 🎨

As above, this notebook was also run locally and requires the files from /cbica/projects/clinical_dmri_benchmark/results/dices. The DICE_ROOT at the top of the notbeook needs to be adjusted to where the files were saved locally. Run <GIT_REPO_HOME>/analysis/dice_scores/plot_median_dice_scores.ipynb to plot the median (median within and between dice score per bundle) distributions for all reconstruction methods.

9 Discriminability

9.1 Calculate discriminability

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/discriminability
sbatch discrim_two_sample_filtered.sh GQIautotrack CSDautotrack SS3Tautotrack
sbatch discrim_two_sample_filtered.sh GQIautotrack SS3Tautotrack CSDautotrack
sbatch discrim_two_sample_filtered.sh CSDautotrack SS3Tautotrack GQIautotrack

Calculation of discriminability and determination of significant differences will be perfromed for the first two arguments, the third argument is passed to perform the filtering and only include scans for which a given bundle was reconstructed for all three methods.

9.2 Plot discriminability 🎨

The notebook was run locally and the following files had to be moved to local to run the code:

Adjust DISCRIM_CSV_ROOT at the top of the script with where you saved the required files on the local setup. This is also where a merged csv will be saved.
Run <GIT_REPO_HOME>/analysis/discriminability/plot_discrim_two_sample.ipynb to create the plot comparing discriminability between reconstruction methods for all reconstructed WM bundles.

10 Bundle Completeness

10.1 Calculate population maps

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/overlay_maps
sbatch calculate_overlay_maps.sh GQIautotrack
sbatch calculate_overlay_maps.sh CSDautotrack
sbatch calculate_overlay_maps.sh SS3Tautotrack

10.2 Extract atlas bundles from DSIStudio

10.3 Mask atlas bundles and warp to MNIc space

First, get T1w images from MNIb and MNIc space from template flow using datalad. These should be saved here:

# Calculate transform using the two T1w images by running
cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/overlap
sbatch calculate_transform_mnib2c.sh # This code is largely based on code by Steven Meisler.
# Mask and transform all atlas bundles
bash mask_and_warp_atlas_bundles.sh

10.4 Plot population maps over atlas bundles 🎨

The code for plotting is largely based on code by Matthew Cieslak.

The script requires it’s own python environment and needs to be run somewhere with visual output! Here, the script was run on a local laptop. The following files are required:

To run the code:

10.5 Calculate sensitivity and specificity of reconstructed bundles with atlas bundles

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/overlap
sbatch sensitivity_specificity.sh GQIautotrack
sbatch sensitivity_specificity.sh CSDautotrack
sbatch sensitivity_specificity.sh SS3Tautotrack

This code is largely based on code by Valerie Sydnor.

10.6 Plot sensitivity and specificity 🎨

The notebook was run locally. The following files need to be copied to local for the code to run:

Adjust OVERLAP_ROOT at the top of the notebook to where you saved the required files locally and run <GIT_REPO_HOME>/analysis/overlap/plot_sensitivity_specificity.ipynb to create the plots of sensitivity and specificity. The overall median plot will be saved in the GitHub repo, the separate plots for each of the bundle will be saved at OVERLAP_ROOT.

Prediction Analysis

CSVs for confounds and targets can be found at /cbica/projects/clinical_dmri_benchmark/data/prediction on CUBIC.

11 Prepare Data

11.1 Prepare feature csvs

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/prediction/prep_prediction_files
micromamba activate clinical_dmri_benchmark
# Move bundle from separate subject folders to common space for easier processing
bash move_bundle_stats.sh GQIautotrack
bash move_bundle_stats.sh CSDautotrack
bash move_bundle_stats.sh SS3Tautotrack
# Create feature CSVs
python3 create_feature_csvs.py

11.2 Calculate and plot feature ICCs 🎨

This notebook was run locally. The following files are required and need to be moved to the local setup:
/cbica/projects/clinical_dmri_benchmark/results/bundle_stats/<reconstruction>autotrack_<run>.csv, with <reconstruction> in [GQI, CSD, SS3T] and <run> in [run-01, run-02].

Adjust the BUNDLE_STATS_ROOT at the top of the notebook to where the files were saved locally and run <GIT_REPO_HOME>/analysis/prediction/plot_feature_icc.ipynb.
Adjust the FEATURE_OF_INTEREST variable to total_volume_mm3 , dti_fa and md to create the plots for all three features.

11.3 Prepare confound csv

cd /cbica/projects/clinical_dmri_benchmark/clinical_dmri_benchmark/analysis/prediction/prep_prediction_files
micromamba activate clinical_dmri_benchmark
python3 prepare_confounds_csv.py

The script extracts the head movement for all scans and creates one confound csv containing all confounds of interest for all subjects.

12 Run Prediction

Prediction was run on a different system. Following files need to be moved to the other cluster to run the prediction:

The prediction performed in the main analysis can be run by submitting <GIT_REPO_HOME>/analysis/prediction/predict_cognition.submit to the cluster.

The results were copied to CUBIC and can be found at /cbica/projects/clinical_dmri_benchmark/results/prediction/remove_confounds_features

13 Compare Prediction Model Performances

This script was run locally and requires the prediction result csvs from /cbica/projects/clinical_dmri_benchmark/results/prediction/remove_confounds_features.
Adjust the RESULT_ROOT at the top of the notebook to where you saved the prediction results on the local setup and run <GIT_REPO_HOME>/analysis/prediction/compare_model_performances.py to obtain a csv with p-values that imply if there is a significant difference between two considered models.
The resulting output csv with the corrected p-values will be saved under the RESULT_ROOT.

14 Plot Prediction Results

14.1 Plot prediction accuracy 🎨

This notbook was run locally and requires the prediction result csvs from /cbica/projects/clinical_dmri_benchmark/results/prediction/remove_confounds_features.
Adjust the RESULT_ROOT at the top of the notebook to where you saved the prediction results on the local setup and run <GIT_REPO_HOME>/analysis/prediction/plot_prediction_results.ipynb to plot the prediction accuracy for the main analysis.

14.2 Plot prediction similarity 🎨

This notbook was run locally and requires the prediction inspector csvs from /cbica/projects/clinical_dmri_benchmark/results/prediction/remove_confounds_features.
Adjust the RESULT_ROOT at the top of the notebook to where you saved the prediction results on the local setup and run <GIT_REPO_HOME>/analysis/prediction/plot_prediction_reliability.ipynb to plot the similarity between prediction from different scans for the main analysis.