Table of Contents

I. Project Information

Abstract

Despite decades of neuroimaging research, how white matter develops along the length of major tracts in humans remains unknown. Here, we identify fundamental patterns of white matter maturation by examining developmental variation along major, long-range cortico-cortical tracts in youth ages 5-23 years using diffusion MRI from three large-scale, cross-sectional datasets (total N = 2,716). Across datasets, we delineate two replicable axes of human white matter development. First, we find a deep-to-superficial axis, in which superficial tract regions near the cortical surface exhibit greater age-related change than deep tract regions. Second, we demonstrate that the development of superficial tract regions aligns with the cortical hierarchy defined by the sensorimotor-association axis, with tract ends adjacent to sensorimotor cortices maturing earlier than those adjacent to association cortices. These results reveal developmental variation along tracts that conventional tract-average analyses have previously obscured, challenging the implicit assumption that white matter tracts mature uniformly along their length. Such developmental variation along tracts may have functional implications, including mitigating ephaptic coupling in densely packed deep tract regions and tuning neural synchrony through hierarchical development in superficial tract regions – ultimately refining neural transmission in youth.

Team

| Role | Name |
|---|---|
| Project Lead | Audrey C. Luo |
| Faculty Lead | Theodore D. Satterthwaite |
| Analytic Replicator | Steven L. Meisler |
| Collaborators | Valerie J. Sydnor, Aaron Alexander-Bloch, Joëlle Bagautdinova, Deanna M. Barch, Dani S. Bassett, Christos Davatzikos, Alexandre R. Franco, Jeff Goldsmith, Raquel E. Gur, Ruben C. Gur, Fengling Hu, Marc Jaskir, Gregory Kiar, Arielle S. Keller, Bart Larsen, Allyson P. Mackey, Michael P. Milham, David Roalf, Golia Shafiei, Russell T. Shinohara, Leah H. Somerville, Sarah M. Weinstein, Jason Yeatman, Matthew Cieslak, Ariel Rokem |

Project Timeline

Project Start Date: February 2024
Current Project Status: In prep

Code and Communication

Github Repository: https://github.com/PennLINC/luo_wm_dev/tree/main
Slack Channel: #luo_wm_dev

Datasets

  • RBC PNC (health exclude) as the discovery dataset; HCP-D and HBN as replication datasets

Conference Presentations

  • Poster presented at The Organization for Human Brain Mapping Annual Meeting, June 2024
  • Poster presented at Flux Congress, September 2024
  • Poster to be presented at The Society of Biological Psychiatry Annual Meeting, April 2025 as a Predoctoral Travel Awardee
  • Talk entitled Axes of Hierarchical Brain Development to be given at the Gradients of Brain Organization Workshop, Brisbane, Australia, June 2025

II. CUBIC Project Directory Structure

The project directory on CUBIC is /cbica/projects/luo_wm_dev.

  • Code for the final manuscript is in /cbica/projects/luo_wm_dev/two_axes/code/
  • The directories on Github are ~/two_axes/code/ and ~/two_axes/software/
| Directory | Description |
|---|---|
| ~/two_axes/code | Where manuscript code lives |
| ~/two_axes/input | For each dataset: sample selection files, BABS projects, and tract profiles for test subjects |
| ~/two_axes/output | For each dataset: tract profiles, GAM outputs, NEST and spin test results |
| ~/two_axes/software | Singularity images for various software packages, plus code for installing them and setting up the project folder |

Note that ~/input/ (at the project root) is OUTSIDE of the manuscript directory. For each dataset, it includes "raw" data (datalad clones for qsiprep and freesurfer), derivatives (pyAFQ output from BABS), sample selection files (demographics CSVs), and tract profiles for ALL subjects.

III. Code Documentation

Overview of Analytic Workflow

| Step | Task | Notes |
|---|---|---|
| 0 | Install required software and packages | |
| 1 | Construct initial sample | This multi-step task creates the subject list for BABS |
| 2 | Download QSIPrep and Freesurfer data | |
| 3 | Use BABS to run QSIRecon and pyAFQ | |
| 4 | Get tract profiles from BABS | datalad get the merged BABS output and copy the tract profiles CSV to input/<dataset>/derivatives/tract_profiles_dti |
| 5 | Create tract profiles from multi-shell diffusion models | NODDI and MAP-MRI, for the multi-shell datasets (HCP-D and HBN) |
| 6 | Prepare data for final sample selection | prep_data finds subjects that are missing data and saves out a list of them for exclusion; this must run before final sample selection |
| 7 | Construct final sample | After prep_data, construct the final sample and exclude subjects that failed BABS, etc. |
| 8 | Harmonize multi-site data using CovBat | Only HCP-D and HBN need to be harmonized (within each dataset) |
| 9 | Fit GAMs | Compute the magnitude of the age effect and the age of maturation, as well as GAM fits at each age |
| 10 | Tract-to-cortex mapping | Lots of steps here! The goal is to get tract-to-cortex probability maps for each dataset and parcellate them with the HCP-MMP (Glasser) atlas |
| 11 | Significance testing with NEST: deep-to-superficial | Are age effects enriched in superficial compared to deep tract regions? (For Figures 1, 2 and 3) |
| 12 | Significance testing with NEST: tract-to-cortex | Are age effects different on each end of callosum motor and IFOF? (For Figures 4 and 5) |
| 13 | Significance testing with spin tests: delta-delta analysis and parcel-level age of maturation vs. S-A rank | Are the ages of maturation different between tracts with large vs. small differences in S-A rank? (For Figure 6) Collapsing across tracts, does the age of maturation of superficial tract regions correlate with S-A axis rank? (For Figure 7) |
| 14 | Visualize results! | |

0. Install required software and packages

  • ~/two_axes/software/installation contains code for:
    • setting up the project directory
    • building the required singularity images for fMRIPrep and QSIPrep
    • pulling the Docker container for R packages used on CUBIC
    • information on how to install BABS
    • creating the Python environment used in this project (luo_wm_dev)
    • and cloning the version of pyAFQ used in this project

1. Construct initial sample

  1. Identify non-variant diffusion MRI data in all 3 datasets
    1. Get QC files from each dataset’s QSIPrep clone
      • ~/two_axes/code/datalad/qc/get_qcfiles.sh gets the qc csv’s and copies them to ~/input/${dataset}/raw/qc_files
      • ./submit_get_qcfiles.sh submits the above script as jobs for PNC, HCP-D, and HBN
    2. Make list of subjects with variant and non-variant dMRI data

       cd ~/two_axes/code/sample_construction/identify_variants/ 
       ./run_identify_variants.sh # this script runs identify_variants.py 
       # for all 3 datasets and saves out variant and non-variant lists 
       # in the two_axes input folder for each dataset. 
      
  2. Construct initial sample
    1. ${dataset}_InitialSampleSelection.Rmd saves out temporary subject lists, which will be used for BABS

      cd ~/two_axes/code/sample_construction/construct_initial_sample/
      
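The logic of the variant/non-variant split can be sketched as follows. This is a minimal, hypothetical example: the column names and values are made up and do not reflect the actual QSIPrep QC CSV schema, which identify_variants.py parses for real.

```python
import csv
import io

# Hypothetical QC table -- the real identify_variants.py reads each
# dataset's QSIPrep QC CSVs, whose columns differ from this toy example.
qc_csv = """subject_id,acquisition
sub-001,non-variant
sub-002,variant
sub-003,non-variant
"""

rows = list(csv.DictReader(io.StringIO(qc_csv)))

# Save out separate lists of subjects with variant and non-variant dMRI data
variants = [r["subject_id"] for r in rows if r["acquisition"] == "variant"]
non_variants = [r["subject_id"] for r in rows if r["acquisition"] == "non-variant"]
```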

2. Download QSIPrep and Freesurfer data

  1. Get QSIPrep data for each dataset

     cd ~/two_axes/code/datalad/qsiprep/
     sbatch ./datalad_get_qsiprep_PNC.sh # will get 3 subjects per dataset
     sbatch ./datalad_get_qsiprep_HCPD.sh
     sbatch ./datalad_get_qsiprep_HBN.sh
    
  2. Get Freesurfer data, required for anatomically constrained tractography (ACT/HSVS) in QSIRecon

    • Note that in the main analyses, only PNC and HCP-D use ACT. HBN loses too many subjects with ACT (it is a noisy dataset), but we ran HBN with ACT as a sensitivity analysis.
     cd ~/two_axes/code/datalad/freesurfer/
     sbatch ./datalad_get_freesurfer_PNC.sh  
     sbatch ./datalad_get_freesurfer_HCPD.sh
    

3. Use BABS to run qsirecon and pyAFQ

  1. Initialize babs project

     cd ~/two_axes/code/run_babs_qsirecon/
     ./babs_init.sh # initializes babs for PNC, HCP-D, and HBN for 3 test subjects 
     # in each dataset 
    

This table shows the custom BABS YAML files and custom QSIRecon JSON files used for each dataset (all files are located in ~/two_axes/code/run_babs_qsirecon/babs_yaml_files/ and ~/two_axes/code/run_babs_qsirecon/qsirecon_json_files/).

| Dataset | QSIRecon | YAML file | JSON file | Notes |
|---|---|---|---|---|
| PNC | MRTrix: ss3t, ACT + HSVS, and pyAFQ | babs_qsiprep-0-22-0_qsirecon_mrtrix_ACT-hsvs_pyafq_singleshell_dti.yaml | mrtrix_singleshell_ss3t_ACT-hsvs_pyafq_dti.json | |
| HCP-D | MRTrix: msmt, ACT + HSVS, and pyAFQ | babs_qsiprep-0-22-0_qsirecon_mrtrix_ACT-hsvs_pyafq_dti.yaml | mrtrix_multishell_msmt_ACT-hsvs_pyafq_dti.json | JSON file contains the modification "max_bval": 1500. HCP-D is multi-shell: we use ALL shells for CSD in MRTrix but exclude the highest b-value for DTI in pyAFQ. |
| HBN | MRTrix: msmt and pyAFQ (no ACT + HSVS) | babs_qsiprep-0-22-0_qsirecon_mrtrix_msmt_pyafq_dti.yaml | mrtrix_multishell_msmt_pyafq_tractometry_dti.json | Same "max_bval": 1500 modification as HCP-D |
  2. Copy the custom QSIRecon JSON file for each dataset to the corresponding BABS project (described in the table above):

     cd ~/two_axes/code/run_babs_qsirecon/
     ./copy_json_files.sh # copy json files to the respective BABS project
    
  3. Datalad save all these modifications to each BABS project:

     cd ~/two_axes/code/run_babs_qsirecon/
     ./datalad_save_babs_modifications.sh # runs datalad save and push for each dataset's BABS project
    
  4. Do a test job for each dataset's BABS project (per BABS instructions):

     cd ~/two_axes/code/run_babs_qsirecon/
     ./babs_check_setup.sh # runs babs-check-setup
    
  5. babs-submit subjects for each dataset after the test jobs above finish successfully for all datasets:

     cd ~/two_axes/code/run_babs_qsirecon/
     ./babs_submit.sh # runs babs-submit  
    
    • you can check the status of the jobs by going to the BABS project folder (babs_qsirecon_pyafq_act) and running the following (remember to conda activate babs)

        babs-status --project-root $PWD
      
    • to resubmit failed subjects, go to BABS project folder and do:

        babs-status \
            --project-root $PWD \
            --resubmit failed 
      

4. Get tract profiles from babs project

  1. Consume results!

     cd ~/two_axes/code/run_babs_qsirecon/
     ./babs_merge_consume_results.sh # run code to babs-merge and datalad clone the results for 
     # each dataset
    
  2. Datalad get the tract profiles for each test subject and copy them to a tract_profiles directory in ~/two_axes/input/${dataset}/derivatives

     cd ~/two_axes/code/datalad/pyafq
     sbatch datalad_babs_pyafq.sh
    
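Conceptually, each tract profile is a scalar measure (e.g., DTI FA) sampled at a fixed number of nodes along a tract. The sketch below uses synthetic values; note that pyAFQ actually uses a distance-weighted average across streamlines, and a plain mean is shown only for simplicity.

```python
import numpy as np

# Synthetic per-streamline FA values at 100 nodes along a toy tract
rng = np.random.default_rng(0)
n_streamlines, n_nodes = 5, 100
fa_per_streamline = rng.uniform(0.3, 0.6, size=(n_streamlines, n_nodes))

# Summarize across streamlines to get one profile per tract per subject
# (pyAFQ weights streamlines by distance from the tract core)
tract_profile = fa_per_streamline.mean(axis=0)
```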

5. Create tract profiles from multi-shell diffusion models

  • Reconstruction for NODDI and MAP-MRI was done using QSIPrep and BABS, with the same steps as described above. All code is analogous to the steps run in step 3 and can be found in
      ~/two_axes/code/run_babs_qsirecon/babs_noddi_and_mapmri/
    

The custom BABS YAML files and custom QSIRecon JSON files used for each multi-shell dataset (HCP-D and HBN) are also located in ~/two_axes/code/run_babs_qsirecon/babs_yaml_files/ and ~/two_axes/code/run_babs_qsirecon/qsirecon_json_files/.

  • Map NODDI and MAP-MRI measures to tract profiles along pyAFQ-segmented tracts:

      cd ~/two_axes/code/scalars_to_tractprofiles/noddi/
      ./submit_wrapper_noddi_tractprofiles.sh
      cd ~/two_axes/code/scalars_to_tractprofiles/mapmri/
      ./submit_wrapper_mapmri_tractprofiles.sh
    

6. Prepare data for final sample selection

From the QSIRecon/pyAFQ output, find subjects that are missing data and save out a list of them for exclusion. This step needs to be done before final sample selection.

cd ~/two_axes/code/prep_data/
python prep_tract_profiles_data.py

All the following steps are done for NODDI and MAP-MRI in the same way but using code from their respective subfolders, for example:

cd ~/two_axes/code/prep_data/noddi
python noddi_prep_tract_profiles_data.py
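The core of this step is simple set logic: compare the subjects expected in the sample against those with usable output, and save the difference as the exclusion list. A hypothetical sketch (the subject IDs are made up):

```python
# Hypothetical sketch of the missing-data check in prep_tract_profiles_data.py
expected = {"sub-001", "sub-002", "sub-003"}      # initial sample
have_profiles = {"sub-001", "sub-003"}            # subjects with pyAFQ output

# Subjects missing tract profiles get written to an exclusion list
to_exclude = sorted(expected - have_profiles)
```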

7. Construct final sample

  1. ~/two_axes/code/sample_construction/construct_final_sample/${dataset}_FinalSampleSelection.Rmd creates the final sample and tract profiles for each dataset

8. Harmonize multi-site data using CovBat

cd ~/two_axes/code/covbat_harmonization/
./submit_covbat.sh # submit covbat for HCPD and HBN
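As a rough intuition for harmonization (not CovBat itself, which additionally harmonizes covariance across sites and protects covariates such as age and sex), a location-scale adjustment aligns each site's mean and variance to pooled values:

```python
import numpy as np

# Greatly simplified location-scale sketch (ComBat-style). Real CovBat
# also harmonizes covariance and adjusts for covariates; this toy example
# only aligns one feature's site means and variances to pooled values.
rng = np.random.default_rng(1)
site_a = rng.normal(0.50, 0.05, 200)   # synthetic FA values, site A
site_b = rng.normal(0.55, 0.10, 200)   # synthetic FA values, site B

pooled = np.concatenate([site_a, site_b])
grand_mean, pooled_sd = pooled.mean(), pooled.std()

# Standardize within site, then restore the pooled location and scale
harmonized = [(x - x.mean()) / x.std() * pooled_sd + grand_mean
              for x in (site_a, site_b)]
```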

9. Fit GAMs

cd ~/two_axes/code/fit_GAMs/
./submit_fit_GAMs_development.sh  # fit GAMs for all 3 datasets
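A conceptual stand-in for what the fitted models yield: the real analysis fits penalized-spline GAMs in R (mgcv); the cubic polynomial, threshold, and variable names below are purely illustrative assumptions, not the actual method.

```python
import numpy as np

# Synthetic "FA vs. age" data following a saturating developmental curve
rng = np.random.default_rng(2)
age = np.sort(rng.uniform(5, 23, 300))
fa = 0.55 - 0.25 * np.exp(-0.3 * (age - 5)) + rng.normal(0, 0.01, 300)

# A cubic polynomial stands in for the GAM smooth (illustration only)
coefs = np.polyfit(age, fa, deg=3)
grid = np.linspace(5, 23, 500)
fitted = np.polyval(coefs, grid)
deriv = np.polyval(np.polyder(coefs), grid)

# Magnitude of the age effect: total change across the fitted range
effect_magnitude = fitted.max() - fitted.min()

# Age of maturation: first age at which the fitted slope becomes
# negligibly small (the real analysis uses the GAM smooth's derivative
# and its confidence interval, not this arbitrary threshold)
slowing = np.abs(deriv) < 0.005
age_of_maturation = grid[np.argmax(slowing)] if slowing.any() else grid[-1]
```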

10. Tract-to-cortex mapping

  • Code for tract-to-cortex mapping is in ~/two_axes/code/tract_to_cortex.
  • The scripts in the subfolders of ~/two_axes/code/tract_to_cortex/ need to be run sequentially.
  1. The first set of scripts makes the tract density images: ~/two_axes/code/tract_to_cortex/a_make_tdi

    The wrapper runs scripts a01 through a05 as jobs with dependencies: each script runs as a job array (one array element = 1 subject), and each array starts automatically as soon as the previous one finishes.

     cd ~/two_axes/code/tract_to_cortex/a_make_tdi
     ./a00_wrapper_make_tdi.sh
    
    • a01_datalad_get_trks.sh : datalad gets the required files: QSIPrep T1w (from each qsiprep clone) and each .trk file for each person in each dataset (from each dataset’s babs project)
    • a02_trk_to_tck.py and a02_trk_tck.sh: converts the trk files to tck and saves the newly converted tck files to a temporary directory
    • a03_delete_trks.sh: deletes the .trk files to save space
    • a04_pyafq-bundles_to_tdi.sh: converts tck files to tract density images for each subject and binarizes the TDI image
    • a05_cleanup.sh: deletes temporary directories and unneeded files to save space
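The binarization in a04 reduces each tract density image (streamline counts per voxel) to a mask of the voxels the tract passes through, e.g.:

```python
import numpy as np

# Toy 2-D stand-in for a 3-D tract density image: values are streamline
# counts per voxel, so any visited voxel enters the binary tract mask.
tdi = np.array([[0, 2, 5],
                [0, 0, 1],
                [3, 0, 0]])
binary_mask = (tdi > 0).astype(np.uint8)
```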
  2. Next, we need to make sure each subject’s Freesurfer surfaces are aligned to their QSIPrep T1w images and tracts. Scripts found here:

    ~/two_axes/code/tract_to_cortex/b_transforms

    As above, the wrapper in this directory runs all the scripts from b01 to b02.

     cd ~/two_axes/code/tract_to_cortex/b_transforms
     ./b00_wrapper_transforms.sh
    
    • Huge thanks to Marc Jaskir for his code, from which my code is heavily adapted: https://github.com/marcjaskir/tracts/tree/main
    • b01_determine_freesurfer-to-native_acpc_volume_xfm.sh : datalad gets Freesurfer files for each subject. Harmonizes file types and orientations of Freesurfer and QSIPrep images with voxelized tracts.
    • b02_align_surfaces_with_tract_volumes.py and b02_align_surfaces_with_tract_volumes.sh : Pial and white surfaces get transformed to native ACPC (QSIPrep space)
  3. Now we will use nilearn’s vol_to_surf to sample the binarized TDI map for each tract to the cortical surface. The non-zero regions on the surface correspond to the cortical endpoints of that tract.

    Scripts can be found at ~/two_axes/code/tract_to_cortex/c_vol_to_surf

    The wrapper runs c01 and c02.

     cd ~/two_axes/code/tract_to_cortex/c_vol_to_surf
     ./c00_wrapper_vol_to_surf.sh
    
    • c01_vol_to_surf_individual.py and c01_vol_to_surf_individual.sh perform the vol_to_surf mapping for each subject and output binary maps of cortical endpoints for each tract. For the main analyses, a depth of 1.5 mm is used (determined in the discovery dataset to be the optimal depth for sampling white matter while avoiding gray matter).
    • c02_native_to_fsLR.sh : warps the vol_to_surf outputs to fsLR
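A highly simplified picture of what vol_to_surf does here (nilearn handles affines, surface normals, and interpolation properly; in this sketch mm coordinates are assumed to coincide with voxel indices, purely for illustration):

```python
import numpy as np

# Toy binary TDI volume with a small "tract" in the middle
volume = np.zeros((10, 10, 10), dtype=np.uint8)
volume[4:6, 4:6, 4:6] = 1

# Two surface vertices with outward-pointing unit normals; stepping
# depth mm inward along the normal samples below the cortical surface.
# Here mm coordinates equal voxel indices (they do not in real data).
vertices = np.array([[5.0, 5.0, 6.5],
                     [1.0, 1.0, 1.0]])
normals = np.array([[0.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0]])
depth = 1.5  # sample 1.5 mm below the surface, as in the main analyses

sample_points = np.round(vertices - depth * normals).astype(int)
endpoint_map = volume[tuple(sample_points.T)]  # 1 where the tract reaches cortex
```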
  4. Lastly, subject-level binarized tract-to-cortex maps are averaged to obtain a population probability map for each tract. Then we parcellate each map (one map per tract per dataset) with the Glasser (HCP-MMP) atlas.

    Scripts can be found at ~/two_axes/code/tract_to_cortex/d_group_avg_parcellate

     cd ~/two_axes/code/tract_to_cortex/d_group_avg_parcellate
     ./d00_wrapper_group_avg_parcellate.sh
    
    • d01_group_avg_map_fslr.py: takes the binary, subject-level tract-to-cortex maps (in fsLR 32k) for each tract and averages them across subjects, producing a group-level tract-to-cortex map of the proportion of subjects that have a tract connecting to each cortical region. The maps are then saved out as GIFTIs (no threshold applied).
    • d02_parcellate_maps.py: takes the population probability tract-to-cortex maps and parcellates them with the Glasser atlas. Saves out CSVs of the maps (no threshold applied).
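The averaging and parcellation logic can be sketched with a toy 6-vertex "surface" and a 2-parcel labeling standing in for fsLR 32k and the Glasser atlas:

```python
import numpy as np

# Binary subject-level endpoint maps (rows = 3 subjects, cols = 6 vertices)
subject_maps = np.array([[1, 1, 0, 0, 1, 0],
                         [1, 0, 0, 0, 1, 1],
                         [1, 1, 0, 0, 0, 0]])

# d01: proportion of subjects whose tract reaches each vertex
prob_map = subject_maps.mean(axis=0)

# d02: average the probability map within each parcel
parcel_labels = np.array([1, 1, 1, 2, 2, 2])
parcel_means = np.array([prob_map[parcel_labels == p].mean()
                         for p in (1, 2)])
```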

11. Significance testing with NEST: deep-to-superficial

  • This step checks: are age effects enriched in superficial compared to deep tract regions? (For Figures 1, 2 and 3.) Here, we use network enrichment significance testing (NEST) (https://github.com/smweinst/NEST/tree/main)
    1. First, format the tract profiles data for NEST. ~/two_axes/code/significance_testing/NEST/prep_data_for_NEST.Rmd prepares the data for all 3 datasets; run through this first. It saves out tract_profiles_for_NEST.RData for each dataset in ~/two_axes/output/${dataset}/tract_profiles/
    2. Run NEST for each tract in each dataset:

       cd ~/two_axes/code/significance_testing/NEST/deep_to_superficial/
       ./submit_NEST_wrapper_clipEnds_clip5.sh
      
  • This wrapper submits a singularity call that uses my Docker image for R packages and the NEST R script in the same directory (NEST_wrapper_clipEnds_clip5.R). It submits a job array where each element = 1 tract
  • This may take several hours since NEST runs 10,000 permutations
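To convey only the shape of the inference (NEST's actual procedure operates on subject-level data with a different statistic and 10,000 permutations), here is a toy permutation test on made-up node-level age effects:

```python
import numpy as np

# Made-up node-level age effects: 30 superficial and 30 deep tract nodes.
# This sketch only shows the permutation-p-value logic, not NEST itself.
rng = np.random.default_rng(3)
age_effects = np.concatenate([rng.normal(0.8, 0.1, 30),   # superficial
                              rng.normal(0.4, 0.1, 30)])  # deep
labels = np.array([1] * 30 + [0] * 30)   # 1 = superficial, 0 = deep

def mean_diff(lbl):
    return age_effects[lbl == 1].mean() - age_effects[lbl == 0].mean()

observed = mean_diff(labels)
null = np.array([mean_diff(rng.permutation(labels)) for _ in range(2000)])

# One-sided p-value: are superficial age effects enriched relative to deep?
p_value = (np.sum(null >= observed) + 1) / (2000 + 1)
```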

12. Significance testing with NEST: tract-to-cortex

  • This step checks: are age effects different on each end of callosum motor and IFOF? (For Figures 4 and 5)
  • Code is set up similarly to step 11
    1. Run NEST for callosum motor and the inferior fronto-occipital fasciculus (IFOF):

       cd ~/two_axes/code/significance_testing/NEST/tract_to_cortex/
       ./submit_NEST_wrapper_end_compare_clip5.sh

13. Significance testing with spin tests: delta-delta analysis and parcel-level age of maturation vs. S-A rank

  • This step checks two things:
    • 1) Are the ages of maturation different between tracts with large vs. small differences in S-A rank? (For Figure 6)
    • 2) Collapsing across tracts, does the age of maturation of superficial tract regions correlate with S-A axis rank? (For Figure 7)
    • The scripts compute a spun t-test (t-value, dof, p-value) and Pearson’s correlations (correlation, p-value) for each dataset and averaged across datasets:

       cd /cbica/projects/luo_wm_dev/two_axes/code/significance_testing/spin_tests
       ./submit_main_figures_spintests.sh
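For intuition only: a naive permutation null for the parcel-level correlation. A real spin test instead rotates the cortical sphere so that permuted maps preserve spatial autocorrelation; the simple shuffling shown here ignores that and is not a valid substitute:

```python
import numpy as np

# Synthetic parcel-level maps: S-A axis rank and age of maturation for
# 100 parcels, constructed to correlate.
rng = np.random.default_rng(4)
sa_rank = np.arange(100, dtype=float)
age_of_maturation = sa_rank + rng.normal(0, 20, 100)

r_obs = np.corrcoef(sa_rank, age_of_maturation)[0, 1]

# Naive shuffle null (a real spin test would rotate the sphere instead)
null_r = np.array([
    np.corrcoef(sa_rank, rng.permutation(age_of_maturation))[0, 1]
    for _ in range(1000)
])
p_naive = (np.sum(np.abs(null_r) >= abs(r_obs)) + 1) / (1000 + 1)
```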

14. Visualize results!

  • Main figures: ~/two_axes/code/results/main_figures.Rmd
    • this Rmd uses functions from ~/two_axes/code/results/main_figures_functions.R
  • Supp figures: ~/two_axes/code/results/supp_figures.Rmd
    • this Rmd uses functions from ~/two_axes/code/results/supp_figures_functions.R
  • Plotting glass brains:
    • ~/two_axes/code/results/fig1and3_glass_brain.py
    • ~/two_axes/code/results/fig6_glass_brain.py
    • ~/two_axes/code/results/glass_brain_xfm.sh: necessary transforms for a FreeSurfer brain to match QSIPrep headers for glass brain plotting