# Reproducibility Guide
This guide describes the entire analytic workflow implemented in this project, covering data preparation and all analyses included in the manuscript. Scripts should be run in the order outlined below.
## Table of contents
- I. Project information
- II. Directory structure
- III. Code documentation
  - Overview of the analytic workflow
  - Project software
  - Build the environment
  - Publicly available population-level data
  - Data preparation
  - Analysis
    - Data structure (Figure 1)
    - Spatial embedding (Figure 2)
    - Terms-tracts partial least squares (Figure 3)
    - Functional decoding (Figure 4)
    - Tract functional diversity (Figure 5)
    - Biological cortical similarity (Figure 6)
    - Individual-level analysis — age effects (Figure 7)
    - Individual-level analysis — cognition effects (Figure 8)
- A note on utilities
## Project information
### Summary
While long-range white matter (WM) tracts support communication between distant regions, it remains unknown how WM tracts are positioned within the cortical hierarchy to support cognition. Here, we leveraged population-level tract-to-region mapping, task-based functional activations, and measures of cortical biology to delineate how tracts are situated within the cortical hierarchy to support cognition. We found that tracts are differentially positioned along the cortical hierarchy to support specific cognitive functions and serve as bridges connecting distinct biological environments. Their hierarchical placement also reflects developmental variation in tract microstructure and individual differences in cognition. Together, these findings provide a framework that moves beyond conventional tract categories and emphasizes the essential role that WM tracts play in the cortical hierarchy.
### Team
| Role | Name |
|---|---|
| Project lead | Joëlle Bagautdinova |
| Faculty lead | Theodore D. Satterthwaite |
| Analytic replicator | Golia Shafiei |
| Collaborators | Audrey C. Luo, Margaret K. Pecsok, Taylor Salo, Aaron F. Alexander-Bloch, Dani S. Bassett, Margaret E. Gardner, Raquel E. Gur, Ruben C. Gur, Allyson P. Mackey, Bratislav Misic, Tyler M. Moore, David R. Roalf, Russell T. Shinohara, Valerie J. Sydnor, Tien T. Tong, Fang-Cheng Yeh, Russell A. Poldrack, Matthew Cieslak |
### Timeline
| Project start date | January 2023 |
|---|---|
| Current project status | In prep |
### Code and communication
| Github repository | https://github.com/PennLINC/tractmaps |
|---|---|
| Slack channel | #cortical-tractometry |
### Datasets
- Population-level tract-to-region mappings from Yeh, 2022
- Population-level meta-analytic task activations from Neurosynth
- Population-level biological cortical properties from neuromaps and BigBrain
- Individual-level data from the Philadelphia Neurodevelopmental Cohort (PNC)
## Directory structure
Most data preparation and analyses are run locally in the tractmaps directory. The GitHub repository contains everything except the data/raw folder. The directory is organized as follows:
| Directory | Description |
|---|---|
| ~/code | where manuscript code lives |
| ~/code/python_env | has code to build the Python environment |
| ~/code/data_prep | has code that uses the raw data to generate the derivatives used in the analyses |
| ~/code/get_data | has code to get PNC data |
| ~/code/utils | has utility functions used across analyses |
| ~/code/analysis | has code to generate all manuscript result figures |
| ~/data/ | where manuscript data lives |
| ~/data/raw | has input data required to conduct analyses |
| ~/data/derivatives | has data derivatives that are used in analyses |
Data for individual-level PNC analyses are stored on CUBIC. The project directory on CUBIC is /cbica/projects/tractmaps. The directory is organized as follows:
| Directory | Description |
|---|---|
| ~/tractmaps/code/get_data | code to fetch the PNC dMRI data and store it in the tractmaps CUBIC project |
| ~/data/PNC/ | has the PNC dMRI scalar, QC, and demographics data required in analyses |
## Code documentation
### Overview of the analytic workflow
| Step | Task |
|---|---|
| 1 | Install the required software and build the environment |
| 2 | Prepare data derivatives; these will be the inputs for analyses |
| 3 | Build the data structure figure (Figure 1) |
| 4 | Examine the relationship between tract spatial positioning and S-A range (Figure 2) |
| 5 | Assess the spatial relationship between tracts and cognitive terms using Partial Least Squares (Figure 3) |
| 6 | Perform functional decoding of WM tracts to get the contribution of each cognitive term in each tract (Figure 4) |
| 7 | Quantify the functional diversity of tracts using the Gini coefficient and evaluate the association with tract S-A range (Figure 5) |
| 8 | Quantify the biological cortical similarity of WM tracts and evaluate the relationship with S-A range and cognitive diversity (Figure 6) |
| 9 | Prepare PNC data for age and cognition analyses |
| 10 | Perform GAMs on PNC data assessing the effects of age and cognition on WM tract microstructure (fractional anisotropy) |
### Project software
The following external software was used in this project:
- Connectome Workbench, version 2.0.1 (see details below)
- DSI Studio, version Hou “侯” May 27 2025 (see details below)
- Python, version 3.8
- R, version 4.4.2
- QSIPrep, version 1.0.0rc1
- QSIRecon, version 1.1.0
#### Connectome Workbench
Connectome Workbench can be downloaded at https://www.humanconnectome.org/software/get-connectome-workbench. This software is needed for generating region labels and for the neuromaps Python package. This project used version 2.0.1.
Once that’s downloaded, add the path to the bash profile:

```bash
echo 'export PATH=$PATH:/Applications/workbench/bin_macosx64' >> ~/.bash_profile
# note that for newer Macs, the path is added differently:
echo 'export PATH=$PATH:/Applications/workbench/bin_macosxsub' >> ~/.zshrc
```
#### DSI Studio
DSI Studio can be downloaded at https://dsi-studio.labsolver.org/download.html.
The following version was used for tract visualizations on a MacBook Pro M4 (macOS 15.7): DSI Studio version Hou "侯", May 27 2025.
### Build the environment
The Python environment used for this project was built using the `build_environment.sh` script.
Note: make sure conda and Homebrew are installed prior to doing this!

```bash
cd ~/code/python_env
bash build_environment.sh
```
### Publicly available population-level data
This project uses the following publicly available resources:
- The original HCP-MMP parcellation `dlabel.nii` from https://balsa.wustl.edu/78X3
- HCP-MMP region labels and coordinates (full description based on supplemental files of the Glasser et al., 2016 paper) made available at https://neuroimaging-core-docs.readthedocs.io/en/latest/pages/atlases.html#how-to-download-and-install-additional-fsl-atlases
- Tract-to-region probability matrix from Yeh 2022, Nature Methods
- Tract trk files (for tract visualization purposes) from https://github.com/frankyeh/data-atlas/releases/download/hcp1065/hcp1065_avg_tracts_trk.zip listed on this page https://brain.labsolver.org/hcp_trk_atlas.html.
- The HCP-YA 1065 1-mm population-averaged FIB file in the ICBM152 space (also for tract visualization): https://pitt-my.sharepoint.com/:u:/g/personal/yehfc_pitt_edu/EVqdp-lm0_NCgsFkSNDZeP0Bb3PpiSJRGpOKFhvVgZihVQ?e=GhUuOc provided on this page: https://brain.labsolver.org/hcp_template.html
- Tract names and abbreviations: data/derivatives/tract_names/abbreviations.xlsx, taken and adjusted from https://github.com/data-others/atlas/releases/download/hcp1065/abbreviation2.xlsx
- S-A axis ranks, Glasser-parcellated, from https://github.com/PennLINC/S-A_ArchetypalAxis/tree/main/Glasser360_MMP
These data can be downloaded and placed into the relevant subfolders in data/raw. Cognitive terms and biological properties are pulled directly via Python packages.
### Data preparation
First, raw data is used to prepare data derivatives that will be used in the analyses.
➡️ Code path: code/data_prep
- Generate Glasser labels (`label.gii` and `dlabel.nii`): Glasser GIFTI labels are used for parcellation and visualization. The region labels in the original file were ordered right hemisphere first (1-180), then left (181-360). They are reindexed in `prep_glasser_labels.sh` to be consistent with the other data used in analyses, where left hemisphere regions are indices 1-180 and right hemisphere regions are indices 181-360. This script calls `remap_labels.sh`. Run it as:

  ```bash
  cd /Users/joelleba/tractmaps/code/data_prep
  bash prep_glasser_labels.sh
  ```

- Create the tract-to-region probabilities dataframe: the tract-to-region probabilities and region names are reformatted for analysis in `prep_tract_probabilities.py`.
- Regional coordinates: the x-y-z coordinates of Glasser regions are generated in `prep_glasser_coords.py`.
- Cognitive terms dataframe: cognitive maps from Neurosynth are obtained using the NiMARE package and parcellated into Glasser regions in `parcellate_neurosynth.py`. Note: for this script to run, first create a YAML file containing your email address:

  ```yaml
  # Configuration file for neurosynth data preparation
  # Email address for downloading abstracts from neurosynth
  email_address: 'email.address@here.com'
  ```

- Biological cortical properties dataframe: the cortical maps used in the project are obtained through neuromaps and BigBrain, then parcellated in `parcellate_neuromaps.py`.
- Euclidean distances: regional pairwise Euclidean distances, as well as tract mean Euclidean distances, are generated in `tract_euc_distance.py`.
- Geodesic distances (sensitivity analysis): regional geodesic distances, as well as tract mean geodesic distances, are generated in `tract_geodesic_distance.py`. Note that this script takes a while to run.
- Sensorimotor-to-association axis: tract S-A axis ranges are generated in `prep_sa_axis.py`.
- Tract biological cortical similarity: the cortical similarity of tracts based on neurobiological (neuromaps) properties is generated in `tract_cortical_similarity.py`.
- Regional nulls (spin-based): spin-based nulls (for the partial least squares analysis) are generated in `compute_nulls.py`. Note: this takes a while to run and outputs 10,000 null indices for Glasser regions.
- Network rewiring nulls (sensitivity analysis): tract rewiring nulls are generated in `tract_rewiring_nulls.py`. Note: this takes a while to run and outputs 10,000 dataframes containing tract probabilities for rewired null tracts, saved as a pkl.
These scripts generate all of the data contained in the data/derivatives folder. In addition, this folder contains an Excel file with tract names and abbreviations called abbreviations.xlsx, which was adapted from https://github.com/data-others/atlas/releases/download/hcp1065/abbreviation2.xlsx (linked on the DSI Studio website at https://brain.labsolver.org/hcp_trk_atlas.html). The data in the data/derivatives folder can be used to run the analyses and generate the manuscript figures below.
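Several of the derivatives above reduce an inter-regional matrix to one value per tract. As a reference for how the Euclidean-distance derivative is structured, here is a minimal sketch; function and variable names are illustrative, and `tract_euc_distance.py` remains the authoritative implementation:

```python
import numpy as np

def pairwise_euclidean(coords):
    """coords: (n_regions, 3) array of x-y-z coordinates -> (n, n) distance matrix."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def tract_mean_distance(dist, region_idx):
    """Mean distance over unique pairs of regions connected by a tract."""
    sub = dist[np.ix_(region_idx, region_idx)]
    iu = np.triu_indices(len(region_idx), k=1)  # upper triangle: each pair once
    return sub[iu].mean()

# toy coordinates for three regions
coords = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 12.0]])
dist = pairwise_euclidean(coords)
print(dist[0, 1])                          # 5.0
print(tract_mean_distance(dist, [0, 1, 2]))  # 10.0 (mean of 5, 12, 13)
```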
### Analysis
#### Data structure (Figure 1)
Plots for Figure 1 showing the input data structure for tract probabilities, cognitive terms, and cortical properties are generated in:
➡️ Path: code/analysis/1_data_structure/
- `data_structure_plotting.py` - plots the tract-to-region, cognitive terms, and biological properties matrices along with a few example maps shown on the cortical surface.
- `tracts_vis_table_1.py` - generates tract visualizations in glass brains for Table 1.
#### Spatial embedding (Figure 2)
The association between the mean Euclidean distance and S-A range of tracts is examined in:
➡️ Path: code/2_spatial_embedding/
- `plot_tract_distances.py` - creates heatmaps of region coordinates and Euclidean distances with tract overlays.
- `plot_example_tract_sa_ranks.py` - plots the full S-A axis and the S-A ranks of example tracts on the brain surface.
- `spatial_embedding.py` - significance testing of the association between tract mean Euclidean distance and S-A range; also generates the correlation figure.
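The significance testing here compares an observed correlation against a null distribution. The sketch below illustrates that logic with naive random permutations; note that the project's actual scripts use the precomputed spin-based and tract-rewiring nulls (which respect spatial structure), and the data below are simulated stand-ins:

```python
import numpy as np

def perm_corr_pvalue(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(rng.permutation(x), y)[0, 1]
                     for _ in range(n_perm)])
    p = (np.abs(null) >= np.abs(r_obs)).mean()  # fraction of nulls at least as extreme
    return r_obs, p

rng = np.random.default_rng(1)
dist = np.arange(50, dtype=float)            # stand-in for tract mean distances
sa_range = 2 * dist + rng.normal(0, 5, 50)   # stand-in for tract S-A ranges
r, p = perm_corr_pvalue(dist, sa_range)
```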
##### Sensitivity analyses
- `spatial_embedding_geodesic.py` - uses geodesic distance instead of Euclidean distance.
- `spatial_embedding_tract_subsets.py` - analyzes association tracts and projection tracts separately.
- `spatial_embedding_rewiring_null.py` - significance testing using tract rewiring (this takes some time).
#### Terms-tracts partial least squares (Figure 3)
Partial least squares analysis to identify dominant patterns of covariance between cognitive terms and tracts is performed in:
➡️ Path: /code/3_pls/
- `pls_diagram.py` - generates plots with simulated data for the explanatory diagram.
- `pls_terms_tracts.py` - performs the PLS analysis. Note that the significance testing and cross-validation performed here take a while to run.
- `pls_plotting.py` - plots the PLS results. The bootstrapping of the term and tract loadings here takes a little while.
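The core of this kind of behavioural PLS is an SVD of the z-scored cross-covariance between two region-indexed matrices. The sketch below shows that core step with simulated data and illustrative shapes; `pls_terms_tracts.py` adds permutation testing, cross-validation, and bootstrapping on top:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((360, 123))  # regions x cognitive terms (simulated)
Y = rng.standard_normal((360, 60))   # regions x tract probabilities (simulated)

# z-score each column across regions
Xz = (X - X.mean(0)) / X.std(0, ddof=1)
Yz = (Y - Y.mean(0)) / Y.std(0, ddof=1)

# tracts x terms cross-covariance, then SVD into latent variables
R = Yz.T @ Xz / (Xz.shape[0] - 1)
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# covariance explained by each latent variable
explained = s ** 2 / (s ** 2).sum()
```

Columns of `U` are the tract saliences and rows of `Vt` the term saliences for each latent variable, ordered by the covariance they explain.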
#### Functional decoding (Figure 4)
The cognitive term contributions are generated for each tract in:
➡️ Path: /code/4_functional_decoding/
Code:
- `tract_term_contributions.py` - calculates the cognitive term contributions for each tract as a terms x tracts matrix containing the mean of normalized z-scores (across connected regions) per tract. Term z-scores are normalized using `scaled_robust_sigmoid` (to handle extreme values in a subset of terms) and thresholded at > 1.64.
- `tract_cog_functions_plotting.py` - visualizes the tract-term association results (barplots for terms and categories, as well as word clouds).
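The `scaled_robust_sigmoid` normalization named above has a common form in the feature-normalization literature (e.g. hctsa): a sigmoid centred on the median with spread set by the IQR, rescaled to the unit interval. The sketch below shows that assumed form, not the project's exact implementation:

```python
import numpy as np

def scaled_robust_sigmoid(x):
    """Outlier-robust sigmoid rescaled to [0, 1] (assumed form)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    iqr = np.percentile(x, 75) - np.percentile(x, 25)
    # sigmoid centred on the median; IQR/1.35 approximates the SD for normal data
    y = 1.0 / (1.0 + np.exp(-(x - med) / (iqr / 1.35)))
    return (y - y.min()) / (y.max() - y.min())

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 50.0])  # one extreme value
out = scaled_robust_sigmoid(z)
```

Unlike a plain min-max rescaling, the extreme value saturates the sigmoid instead of compressing the rest of the distribution.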
#### Tract functional diversity (Figure 5)
Tract Gini coefficients of functional diversity, and their association with S-A range, are computed in:
➡️ Path: /code/5_functional_diversity
Code:
- `tract_gini_coefficients.py` - generates Gini coefficients of diversity for each tract.
- `tract_gini_plotting.py` - plots the Lorenz curves and Gini coefficients as lollipops and in a glass brain.
- `tract_diversity_sa_axis.py` - significance testing of the association between tract Gini coefficients and S-A ranges.
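For reference, a Gini coefficient over a tract's term contributions can be sketched with the standard sorted-values formula (0 means contributions are spread evenly across terms, i.e. functionally diverse; values approaching 1 mean contributions are concentrated in a single term). This is illustrative only; `tract_gini_coefficients.py` is the authoritative implementation:

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative values via the sorted-values formula."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    i = np.arange(1, n + 1)  # 1-based ranks of the sorted values
    return (2 * (i * v).sum() - (n + 1) * v.sum()) / (n * v.sum())

print(gini([1.0, 1.0, 1.0, 1.0]))  # 0.0  - perfectly even
print(gini([0.0, 0.0, 0.0, 1.0]))  # 0.75 - highly concentrated
```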
##### Sensitivity analyses
- `tract_diversity_sa_axis_tract_types.py` - analyzes association tracts and projection tracts separately.
- `tract_gini_sa_correlation_rewiring_null.py` - significance testing using tract rewiring (takes some time).
#### Biological cortical similarity (Figure 6)
The associations between tract mean cortical similarity (based on neurobiological cortical features from neuromaps and BigBrain), S-A range, and Gini coefficient of diversity are examined in:
➡️ Path: analysis/6_cortical_similarity/
Code:
- `similarity_diagram.py` - generates plots for the explanatory diagram.
- `cortical_similarity_sa_axis_func_diversity.py` - significance testing of the correlations between tract mean cortical similarity, S-A range, and Gini coefficients of diversity.
##### Sensitivity analyses
- `cortical_similarity_tract_types.py` - analyzes association tracts and projection tracts separately.
- `gini_cortical_similarity_rewiring_null.py` - significance testing for the correlation between tract mean cortical similarity and Gini coefficients using tract rewiring (takes some time).
- `sa_range_cortical_similarity_rewiring_null.py` - significance testing for the correlation between tract mean cortical similarity and S-A range using tract rewiring (takes some time).
#### Individual-level analysis — age effects (Figure 7)
This section runs individual-level age GAMs, then tests associations between tract-wise age partial R² and tract functional properties (Gini coefficient, S–A range). This is done:
- for two datasets: the Philadelphia Neurodevelopmental Cohort (PNC) and the Healthy Brain Network (HBN)
- for two DWI metrics: Fractional Anisotropy (FA) and Mean Diffusivity (MD)
- A third microstructural measure, Intracellular Volume Fraction (ICVF), is additionally evaluated in HBN
##### Step 1: pull data on CUBIC
This first step happens on CUBIC:
➡️ CUBIC path: /cbica/projects/tractmaps/code
This code is also available in the repository, under code/get_data. First, pulling the data is done with scripts in the CUBIC project directory. Run these in order for each dataset (PNC, HBN):
- `get_subjects_list_<dataset>.sh` - lists all the subjects with qsirecon data. This saves a text file in the same directory, `<dataset>_subject_list.txt`, which is needed by the next scripts.
- `run_unzip_<dataset>_cubic.sh` - contains the file pattern and subject list for file extraction. This script calls `unzip_files.sh` to actually extract the files. Thanks to Tien Tong for providing this script! Heads up: this takes a little while. Run it with:

  ```bash
  bash run_unzip_<dataset>_cubic.sh
  ```

- `check_subjects_<dataset>.sh` - finds which subjects were in the original `<dataset>_subject_list.txt` (i.e., they have a qsirecon zip file) but don't have a TSV output file. This should be 0 (all were unzipped correctly).
##### Step 2: create sample and run age GAMs
➡️ Local path: analysis/7_individual_level/<dataset>
- `create_group_level_<dataset>.R` - saves a group csv for each microstructure measure in `data/<dataset>/derivatives/cleaned`. These will be used for final sample selection below. Note that it takes a while to load all subjects' files.
- `sample_creation_<dataset>.py` - applies data exclusion and outputs a final sample csv in `data/<dataset>/derivatives/final_sample`, which is used in downstream analyses.
- ONLY for HBN: `harmonize_hbn.R` - applies harmonization; this is needed for HBN because it is a multisite study.
- `run_gams_<dataset>.R` - runs GAMs on each tract to determine the relationship between tract microstructure and age. This outputs partial R² and FDR-corrected p-values in `results/individual_level/<dataset>`.
- `func_GAM_tractmaps.R` - called by `run_gams_<dataset>.R` to fit the GAMs.
##### Step 3: age partial R² vs tract properties
➡️ analysis/7_individual_level
- `run_partial_r2_age_effects.py` - driver for relating the combined PNC and HBN age-effect tables to tract properties (writes combined tables and figures under `results/individual_level/age_effects/`).
- `test_partial_r2_tract_properties.py` - called by `run_partial_r2_age_effects.py` to run correlations and permutation tests; also contains plotting utilities.
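For readers unfamiliar with the partial R² being correlated here: it compares a full model containing the predictor of interest to a reduced model without it, as (SS_reduced − SS_full) / SS_reduced. The project fits GAM smooths (in R), but the quantity can be sketched with a plain linear model in numpy; all variable names and the simulated data below are invented for illustration:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid ** 2).sum()

def partial_r2(y, covars, predictor):
    """Partial R-squared of `predictor` after adjusting for `covars`."""
    n = len(y)
    X_red = np.column_stack([np.ones(n), covars])     # reduced model
    X_full = np.column_stack([X_red, predictor])       # full model
    ss_red, ss_full = rss(X_red, y), rss(X_full, y)
    return (ss_red - ss_full) / ss_red

# simulated example: FA increases with age, adjusting for sex
rng = np.random.default_rng(0)
age = rng.uniform(8, 22, 500)
sex = rng.integers(0, 2, 500).astype(float)
fa = 0.4 + 0.01 * age + 0.02 * sex + rng.normal(0, 0.05, 500)
pr2 = partial_r2(fa, sex, age)
print(round(pr2, 3))
```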
#### Individual-level analysis — cognition effects (Figure 8)
This section runs individual-level cognition GAMs, then tests associations between tract-wise cognition partial R² and tract functional properties (Gini coefficient, S–A range). This is done:
- for two datasets: the Philadelphia Neurodevelopmental Cohort (PNC) and the Human Connectome Project Young Adult sample (HCP-YA)
- for two DWI metrics: Fractional Anisotropy (FA) and Mean Diffusivity (MD)
##### Step 1: pull data on CUBIC
This first step happens on CUBIC:
➡️ CUBIC path: /cbica/projects/tractmaps/code
This code is also available in the repository, under code/get_data. First, pulling the data is done with scripts in the CUBIC project directory. Run these in order, selecting the scripts corresponding to each dataset (PNC, HCPYA). Steps for pulling data are the same as above (see age analysis).
##### Step 2: create sample and run cognition GAMs
➡️ Local path: analysis/7_individual_level/<dataset>
- `create_group_level_<dataset>.R` - saves a group csv for each microstructure measure in `data/<dataset>/derivatives/cleaned`. These will be used for final sample selection below. Note that it takes a while to load all subjects' files.
- `sample_creation_<dataset>.py` - applies data exclusion and outputs a final sample csv in `data/<dataset>/derivatives/final_sample`, which is used in downstream analyses.
- `run_gams_<dataset>.R` - runs GAMs on each tract to determine the relationship between tract microstructure and cognition. This outputs partial R² and FDR-corrected p-values in `results/individual_level/<dataset>`.
- `func_GAM_tractmaps.R` - called by `run_gams_<dataset>.R` to fit the GAMs.
##### Step 3: cognition partial R² vs tract properties
➡️ analysis/7_individual_level
- `run_partial_r2_cognition_effects.py` - driver for relating the combined PNC and HCP-YA cognition-effect tables to tract properties (writes combined tables and figures under `results/individual_level/cognition_effects/`).
- `test_partial_r2_tract_properties.py` - called by `run_partial_r2_cognition_effects.py` to run correlations and permutation tests; also contains plotting utilities.
Et voilà! 😊
## A note on utilities
The repository also contains some utility functions in the code/utils folder. These get called throughout the data preparation and analyses steps. Here is a brief description:
- `figure_formatting.py` - takes care of formatting all figures generated in the manuscript, e.g. ensuring consistent font size, font type, etc.
- `matrix_to_tracts.py` - extracts pairwise values for regions connected by tracts from any inter-regional matrix and computes tract means.
- `tm_utils.py` - contains functions for loading and saving data; plotting brain maps on brain surfaces; and plotting correlations along with permutation testing.
- `tract_visualizer.py` - code to plot tracts in glass brains using DSI Studio (through the command line). It can plot tracts together or separately in a glass brain, can take specific color inputs or color gradients based on a continuous tract metric (e.g. tract S-A range), and can save tracts in different layouts (medial, lateral, grid of tracts, etc.). Note that it requires access to tract trk files for plotting (for instance, this project used the publicly available trk files at https://github.com/data-others/atlas/releases/download/hcp1065/hcp1065_avg_tracts_trk.zip). It also requires `abbreviations.xlsx` for tract name mapping.
- `tract_visualizer_quickstart.py` - contains a few examples illustrating how to use `tract_visualizer.py` to plot tracts.
- `tract_visualizer_readme.md` - a brief description of the tract visualization scripts, their key features, and requirements.