Reproducibility Guide

I. Project Information
II. CUBIC Project Directory Structure
III. Code Documentation

I. Project Information

Abstract

Despite decades of neuroimaging research, how white matter develops along the length of major tracts in humans remains unknown. Here, we identify fundamental patterns of white matter maturation by examining developmental variation along major, long-range cortico-cortical tracts in youth ages 5-23 years using diffusion MRI from three large-scale, cross-sectional datasets (total N = 2,716). Across datasets, we delineate two replicable axes of human white matter development. First, we find a deep-to-superficial axis, in which superficial tract regions near the cortical surface exhibit greater age-related change than deep tract regions. Second, we demonstrate that the development of superficial tract regions aligns with the cortical hierarchy defined by the sensorimotor-association axis, with tract ends adjacent to sensorimotor cortices maturing earlier than those adjacent to association cortices. These results reveal developmental variation along tracts that conventional tract-average analyses have previously obscured, challenging the implicit assumption that white matter tracts mature uniformly along their length. Such developmental variation along tracts may have functional implications, including mitigating ephaptic coupling in densely packed deep tract regions and tuning neural synchrony through hierarchical development in superficial tract regions – ultimately refining neural transmission in youth.

Team

Role	Name
Project Lead	Audrey C. Luo
Faculty Lead	Theodore D. Satterthwaite
Analytic Replicator	Steven L. Meisler
Collaborators	Valerie J. Sydnor, Aaron Alexander-Bloch, Joëlle Bagautdinova, Deanna M. Barch, Dani S. Bassett, Christos Davatzikos, Alexandre R. Franco, Jeff Goldsmith, Raquel E. Gur, Ruben C. Gur, Fengling Hu, Marc Jaskir, Gregory Kiar, Arielle S. Keller, Bart Larsen, Allyson P. Mackey, Michael P. Milham, David Roalf, Golia Shafiei, Russell T. Shinohara, Leah H. Somerville, Sarah M. Weinstein, Jason Yeatman, Matthew Cieslak, Ariel Rokem

Project Timeline

Project Start Date	February 2024
Current Project Status	In prep

Code and Communication

Github Repository	https://github.com/PennLINC/luo_wm_dev/tree/main
Slack Channel	#luo_wm_dev

Datasets

RBC PNC (health exclude) as discovery dataset, HCP-D (replication), and HBN (replication)

Conference Presentations

Poster presented at The Organization for Human Brain Mapping Annual Meeting, June 2024
Poster presented at Flux Congress, September 2024
Poster to be presented at The Society of Biological Psychiatry Annual Meeting, April 2025 as a Predoctoral Travel Awardee
Talk entitled Axes of Hierarchical Brain Development to be given at the Gradients of Brain Organization Workshop, Brisbane, Australia, June 2025

II. CUBIC Project Directory Structure

The project directory on CUBIC is /cbica/projects/luo_wm_dev.

Code for the final manuscript is in /cbica/projects/luo_wm_dev/two_axes/code/
The directories on Github are ~/two_axes/code/ and ~/two_axes/software/

Directory	Description
`~/two_axes/code`	where manuscript code lives
`~/two_axes/input`	for each dataset: has sample selection files, babs projects, and tract profiles for test subjects
`~/two_axes/output`	for each dataset: tract profiles, GAM outputs, NEST, spin test results
`~/two_axes/software`	singularity images for various softwares and code for installing them and setting up the project folder

`~/input/`	This directory is OUTSIDE of the manuscript directory. For each dataset: includes “raw” data (datalad clones for qsiprep and freesurfer), ‘derivatives’ (pyAFQ output from BABS), sample selection files (demographics csv’s), tract profiles for ALL subjects.

III. Code Documentation

Overview of Analytic Workflow

Step	Task	Notes
0	Install required software and packages
1	Construct initial sample	This multi-step task creates the subject list for BABS
2	Download QSIPrep and Freesurfer data
3	Use BABS to run `qsirecon` and `pyAFQ`
4	Get tract profiles from `babs`	`datalad` get the merged babs output and copy over the tract profiles csv to `input/<dataset>/derivatives/tract_profiles_dti`
5	Prepare data for final sample selection	`prep_data`: finds subjects that are missing data and saves out a list of them for exclusion. This step needs to be done before final sample selection
6	Construct final sample	after `prep_data` , I did my final sample construction and excluded subjects that failed babs, etc.
7	Harmonize multi-site data using CovBat	Only HCP-D and HBN need to be harmonized within each dataset
8	Fit GAMs	Compute magnitude of the age effect and age of maturation, as well as GAM fits for each age
9	Tract-to-cortex mapping	Lots of steps here! Goal is to get tract-to-cortex probability maps for each dataset and parcellate to HCP-MMP atlas
10	Significance testing with NEST: deep-to-superficial	Are age effects enriched in superficial compared to deep tract regions? (For Figures 1, 2 and 3)
11	Significance testing with NEST: tract-to-cortex	Age age effects different on each end of callosum motor and IFOF? (For Figures 4 and 5)
12	Significance testing with spin tests: delta-delta analysis and parcel-level age of maturation vs. S-A rank	Are the ages of maturation different between tracts with large vs. small differences in S-A rank? (For Figure 6) Collapsing across tracts, does the age of maturation of superficial tract regions correlate with S-A axis rank? (For Figure 7)
13	Visualize results!

0. Install required software and packages

~/two_axes/software/installation contains code for:
- setting up the project directory
- building the required singularity images for fMRIPrep and QSIPrep
- pulling the Docker container for R packages used on CUBIC
- information on how to install BABS
- creating the Python environment used in this project (luo_wm_dev)
- and cloning the version of pyAFQ used in this project

1. Construct initial sample

Identify non-variant diffusion MRI data in all 3 datasets
1. Get QC files from each dataset’s QSIPrep clone
  - ~/two_axes/code/datalad/qc/get_qcfiles.sh gets the qc csv’s and copies them to ~/input/${dataset}/raw/qc_files
  - ./submit_get_qcfiles.sh submits the above script as jobs for PNC, HCP-D, and HBN
2. Make list of subjects with variant and non-variant dMRI data
```
 cd ~/two_axes/code/sample_construction/identify_variants/ 
 ./run_identify_variants.sh # this scripts runs identify_variants.py 
 # for all 3 datasets and saves out variant and non-variant lists 
 # in the two_axes input folder for each dataset. 
```
Construct initial sample
1. ${dataset}_InitialSampleSelection.Rmd saves out temporary subject lists, which will be used for BABS
```
cd ~/two_axes/code/sample_construction/construct_initial_sample/
```

2. Download QSIPrep and Freesurfer data

Get QSIPrep data for each dataset

 cd ~/two_axes/code/datalad/qsiprep/
 sbatch ./datalad_get_qsiprep_PNC.sh # will get 3 subjects per datasets
 sbatch ./datalad_get_qsiprep_HCPD.sh
 sbatch ./datalad_get_qsiprep_HBN.sh

Get Freesurfer data, required for anatomically constrained tractography (ACT/HSVS) in QSIRecon
1. Note that in the main analyses, only PNC and HCP-D use ACT. HBN loses too many subjects with ACT (noisy dataset), but we ran HBN with ACT as a sensitivity analysis.
```
 cd ~/two_axes/code/datalad/freesurfer/
 sbatch ./datalad_get_freesurfer_PNC.sh  
 sbatch ./datalad_get_freesurfer_HCPD.sh
```

3. Use BABS to run `qsirecon` and `pyAFQ`

Initialize babs project

 cd ~/two_axes/code/run_babs_qsirecon/
 ./babs_init.sh # initializes babs for PNC, HCP-D, and HBN for 3 test subjects 
 # in each dataset 

This table shows the custom BABS YAML files and custom QSIPrep json files used for each dataset (all files located in ~/two_axes/code/run_babs_qsirecon/babs_yaml_files/ and ~/two_axes/code/run_babs_qsirecon/qsirecon_json_files)

Dataset	QSIRecon	YAML file	JSON file	Notes
PNC	MRTrix: ss3t, ACT + HSVS and pyAFQ	babs_qsiprep-0-22-0_qsirecon_mrtrix_ACT-hsvs_pyafq_singleshell_dti.yaml	mrtrix_singleshell_ss3t_ACT-hsvs_pyafq_dti.json
HCP-D	MRTrix: msmt, ACT + HSVS and pyAFQ	babs_qsiprep-0-22-0_qsirecon_mrtrix_ACT-hsvs_pyafq_dti.yaml	mrtrix_multishell_msmt_ACT-hsvs_pyafq_dti.json	JSON file contains modification: `"max_bval": 1500`. HCP-D is multi-shell. We use ALL shells for CSD in MRTrix. But we exclude the highest b-value for DTI in pyAFQ.
HBN	MRTrix: msmt, and pyAFQ. (No ACT+HSVS)	babs_qsiprep-0-22-0_qsirecon_mrtrix_msmt_pyafq_dti.yaml	mrtrix_multishell_msmt_pyafq_tractometry_dti.json	Same `"max_bval": 1500` modification as HCP-D

Copy custom QSIRecon JSON file for each dataset to the corresponding BABS project (described in table above):

 cd ~/two_axes/code/run_babs_qsirecon/
 ./copy_json_files.sh # run code to copy json files to respective BABS project

Datalad save all these modifications to each BABS project

 cd ~/two_axes/code/run_babs_qsirecon/
 ./datalad_save_babs_modifications.sh # runs datalad save and push for each dataset's BABS project

Do a test job for each dataset’s BABS project (per BABS instructions)

 cd ~/two_axes/code/run_babs_qsirecon/
 ./babs_check_setup.sh # runs babs-check-setup

babs-submit subjects for each dataset after the test jobs above finish successfully for all datasets.
```
 cd ~/two_axes/code/run_babs_qsirecon/
 ./babs_submit.sh # runs babs-submit  
```
- you can check the status of the jobs by going to the BABS project folder (babs_qsirecon_pyafq_act) and running the following (remember to conda activate babs)
```
  babs-status --project-root $PWD
```
- to resubmit failed subjects, go to BABS project folder and do:
```
  babs-status \
      --project-root $PWD \
      --resubmit failed 
```

4. Get tract profiles from `babs` project

Consume results!

 cd ~/two_axes/code/run_babs_qsirecon/
 ./babs_merge_consume_results.sh # run code to babs-merge and datalad clone the results for 
 # each dataset

Datalad get the tract profiles for each test subject and copy them to a tract_profiles directory in ~/two_axes/input/${dataset}/derivatives
```
 cd ~/two_axes/code/datalad/pyafq
 sbatch datalad_babs_pyafq.sh
```

5. Create tract profiles from multi-shell diffusion models

Reconstruction for NODDI and MAP-MRI were done using QSIPrep and BABS, with the same steps as described above. All code is analogous to the steps run in step 3 and can be found in
```
  ~/two_axes/code/run_babs_qsirecon/babs_noddi_and_mapmri/
```

The custom BABS YAML files and custom QSIPrep json files used for each multi-shell dataset, HCP-D and HBN (all files are also located in ~/two_axes/code/run_babs_qsirecon/babs_yaml_files/ and ~/two_axes/code/run_babs_qsirecon/qsirecon_json_files).

Map NODDI and MAP-MRI measures to tract profiles along pyAFQ-segmented tracts:

  cd ~/two_axes/code/scalars_to_tractprofiles/noddi/
  ./submit_wrapper_noddi_tractprofiles.sh
  cd ~/two_axes/code/scalars_to_tractprofiles/mapmri/
  ./submit_wrapper_mapmri_tractprofiles.sh

6. Prepare data for final sample selection

From the QSIRecon/pyAFQ output, find subjects that are missing data and save out a list of them for exclusion. This step needs to be done before final sample selection.

cd ~/two_axes/code/prep_data/
python prep_tract_profiles_data.py

All the following steps are done for NODDI and MAP-MRI in the same way but using code from their respective subfolders, for example:

cd ~/two_axes/code/prep_data/noddi
python noddi_prep_tract_profiles_data.py

7. Construct final sample

~/two_axes/code/sample_construction/construct_final_sample/${dataset}_FinalSampleSelection.Rmd creates the final sample and tract profiles for each dataset

8. Harmonize multi-site data using CovBat

cd ~/two_axes/code/covbat_harmonization/
./submit_covbat.sh # submit covbat for HCPD and HBN

9. Fit GAMs

cd ~/two_axes/code/fit_GAMs/
./submit_fit_GAMs_development.sh  # fit GAMs for all 3 datasets

10. Tract-to-cortex mapping

Code for tract-to-cortex mapping is in ~/two_axes/code/tract_to_cortex.
The scripts in the subfolders of ~/two_axes/code/tract_to_cortex/ need to be run sequentially.

The first set of scripts makes the tract density images: ~/two_axes/code/tract_to_cortex/a_make_tdi

The wrapper will run all scripts a01 to a05 as jobs with dependencies. Each script will automatically run as soon as the one before it finishes. It basically runs job arrays that are dependent on each other (each element of array = 1 subject)
```
 cd ~/two_axes/code/tract_to_cortex/a_make_tdi
 ./a00_wrapper_make_tdi.sh
```
- a01_datalad_get_trks.sh : datalad gets the required files: QSIPrep T1w (from each qsiprep clone) and each .trk file for each person in each dataset (from each dataset’s babs project)
- a02_trk_to_tck.py and a02_trk_tck.sh: converts the trk files to tck and saves the newly converted tck files to a temporary directory
- a03_delete_trks.sh: cleans up trk’s to save space
- a04_pyafq-bundles_to_tdi.sh: converts tck files to tract density images for each subject and binarizes the TDI image
- a05_cleanup.sh: deletes temporary directories and unneeded files to save space
Next, we need to make sure each subject’s Freesurfer surfaces are aligned to their QSIPrep T1w’s and tracts. Scripts found here:

~/two_axes/code/tract_to_cortex/b_transforms

As above, the wrapper in this directory runs all the scripts from b01 to b02.
```
 cd ~/two_axes/code/tract_to_cortex/b_transforms
 ./b00_wrapper_transforms.sh
```
- Huge thanks to Marc Jaskir for his code, from which my code is heavily adapted: https://github.com/marcjaskir/tracts/tree/main
- b01_determine_freesurfer-to-native_acpc_volume_xfm.sh : datalad gets Freesurfer files for each subject. Harmonizes file types and orientations of Freesurfer and QSIPrep images with voxelized tracts.
- b02_align_surfaces_with_tract_volumes.py and b02_align_surfaces_with_tract_volumes.sh : Pial and white surfaces get transformed to native ACPC (QSIPrep space)
Now we will use nilearn’s vol_to_surf to sample the binarized TDI map for each tract to the cortical surface. The non-zero regions on the surface correspond to the cortical endpoints of that tract.

Scripts can be found at ~/two_axes/code/tract_to_cortex/c_vol_to_surf

The wrapper runs c01 and c02.
```
 cd ~/two_axes/code/tract_to_cortex/c_vol_to_surf
 ./c00_wrapper_vol_to_surf.sh
```
- c01_vol_to_surf_individual.py and c01_vol_to_surf_individual.sh does the vol_to_surf mapping for each subject and outputs binary maps of cortical endpoints for each tract for each subject. For main analyses, a depth of 1.5mm is used (determined to be optimal depth in discovery dataset for sampling white matter and avoiding gray matter).
- c02_native_to_fsLR.sh : warps the vol_to_surf outputs to fsLR
Lastly, subject-level binarized tract-to-cortex maps are averaged to get a population probability map for each tract. Then we parcellate each map (1 map per tract per dataset) with Glasser.

Scripts can be found at ~/two_axes/code/tract_to_cortex/d_group_avg_parcellate
```
 cd ~/two_axes/code/tract_to_cortex/d_group_avg_parcellate
 ./d00_wrapper_group_avg_parcellate.sh
```
- d01_group_avg_map_fslr.py : This script takes the binary, subject-level tract-to-cortex maps (in fsLR 32k) for each tract, and averages them across subjects. This produces a group-level tract-to-cortex map of the proportion of subjects that have a tract connecting to each cortical region. Then saves the maps out as giftis (no threshold applied).
- d02_parcellate_maps.py: This script takes the population probability tract-to-cortex maps and parcellates them to Glasser. Saves out csv’s of the maps (no threshold applied).

11. Significance testing with NEST: deep-to-superficial

This step checks — are age effects enriched in superficial compared to deep tract regions? (For Figures 1, 2 and 3). Here, we use network enrichment significance testing (NEST) (https://github.com/smweinst/NEST/tree/main)
1. First, we need to format the tract profiles data for NEST. ~/two_axes/code/significance_testing/NEST/prep_data_for_NEST.Rmd prepares the data for all 3 datasets. Run through this first. This saves out tract_profiles_for_NEST.RData for each dataset in ~/two_axes/output/${dataset}/tract_profiles/
2. Run NEST for each tract in each dataset:

	cd ~/two_axes/code/significance_testing/NEST/deep_to_superficial/
	./submit_NEST_wrapper_clipEnds_clip5.sh

This wrapper submits a singularity call that uses my docker image for R-packages and the R script for NEST in the same directory (NEST_wrapper_clipEnds_clip5.R). This submits a job array where each element = 1 tract
This may take several hours since NEST does 10,000 permutations

12. Significance testing with NEST: tract-to-cortex

This step checks — Age age effects different on each end of callosum motor and IFOF? (For Figures 4 and 5)
code is set up similarly to step 11
1. Run NEST for callosum motor and inferior fronto-occipital fasc

cd ~/two_axes/code/significance_testing/NEST/tract_to_cortex/
./submit_NEST_wrapper_end_compare_clip5.sh

13. Significance testing with spin tests: delta-delta analysis and parcel-level age of maturation vs. S-A rank

This step checks two things:
- 1) Are the ages of maturation different between tracts with large vs. small differences in S-A rank? (For Figure 6)
- 2) Collapsing across tracts, does the age of maturation of superficial tract regions correlate with S-A axis rank? (For Figure 7)
- The scripts compute a spun t-test (t-value, dof, p-value) and Pearson’s correlations (p-value, correlation) for each dataset and averaged across datasets

cd /cbica/projects/luo_wm_dev/two_axes/code/significance_testing/spin_tests
./submit_main_figures_spintests.sh

14. Visualize results!

Main figures: ~/two_axes/code/results/main_figures.Rmd
- this Rmd uses functions from ~/two_axes/code/results/main_figures_functions.R
Supp figures: ~/two_axes/code/results/supp_figures.Rmd
- this Rmd uses functions from ~/two_axes/code/results/supp_figures_functions.R
Plotting glass brains:
- ~/two_axes/code/results/fig1and3_glass_brain.py
- ~/two_axes/code/results/fig6_glass_brain.py
- ~/two_axes/code/results/glass_brain_xfm.sh — necessary transforms for a freesurfer brain to match qsiprep headers for glass brain plotting

Table of Contents