# Reproducibility Guide
This guide describes the entire analytic workflow implemented in this project, covering data preparation and all analyses included in the manuscript. Scripts should be run in the order outlined below.
## Table of contents
- I. Project information
- II. Directory structure
- III. Code documentation
  - Overview of the analytic workflow
  - Project software
  - Build the environment
  - Publicly available population-level data
  - Data preparation
  - Analysis
    - Data structure (Figure 1)
    - Spatial embedding (Figure 2)
    - Terms-tracts partial least squares (Figure 3)
    - Functional decoding (Figure 4)
    - Tract functional diversity (Figure 5)
    - Biological cortical similarity (Figure 6)
    - Individual-level analysis — age effects (Figure 7)
    - Individual-level analysis — cognition effects (Figure 8)
- A note on utilities
## Project information
### Summary
While long-range white matter (WM) tracts support communication between distant regions, it remains unknown how WM tracts are positioned within the cortical hierarchy to support cognition. Here, we leveraged population-level tract-to-region mapping, task-based functional activations, and measures of cortical biology to delineate how tracts are situated within the cortical hierarchy to support cognition. We found that tracts are differentially positioned along the cortical hierarchy to support specific cognitive functions and serve as bridges connecting distinct biological environments. Their hierarchical placement also reflects developmental variation in tract microstructure and individual differences in cognition. Together, these findings provide a framework that moves beyond conventional tract categories and emphasizes the essential role that WM tracts play in the cortical hierarchy.
### Team
| Role | Name |
|---|---|
| Project lead | Joëlle Bagautdinova |
| Faculty lead | Theodore D. Satterthwaite |
| Analytic replicator | Golia Shafiei |
| Collaborators | Audrey C. Luo, Margaret K. Pecsok, Taylor Salo, Aaron F. Alexander-Bloch, Dani S. Bassett, Margaret E. Gardner, Raquel E. Gur, Ruben C. Gur, Allyson P. Mackey, Bratislav Misic, Tyler M. Moore, David R. Roalf, Russell T. Shinohara, Valerie J. Sydnor, Tien T. Tong, Fang-Cheng Yeh, Russell A. Poldrack, Matthew Cieslak |
### Timeline
| Project start date | January 2023 |
|---|---|
| Current project status | In prep |
### Code and communication
| Github repository | https://github.com/PennLINC/tractmaps |
|---|---|
| Slack channel | #cortical-tractometry |
### Datasets
- Population-level tract-to-region mappings from Yeh, 2022
- Population-level meta-analytic task activations from Neurosynth
- Population-level biological cortical properties from neuromaps and BigBrain
- Individual-level data from the Philadelphia Neurodevelopmental Cohort (PNC)
## Directory structure
Most data preparation and analyses are run locally in the tractmaps directory. The GitHub repository contains everything except the data/raw folder. The directory is organized as follows:
| Directory | Description |
|---|---|
| ~/code | where manuscript code lives |
| ~/code/python_env | has code to build the Python environment |
| ~/code/data_prep | has code that uses the raw data to generate the derivatives used in the analyses |
| ~/code/get_data | has code to get PNC data |
| ~/code/utils | has utility functions used across analyses |
| ~/code/analysis | has code to generate all manuscript result figures |
| ~/data/ | where manuscript data lives |
| ~/data/raw | has input data required to conduct analyses |
| ~/data/derivatives | has data derivatives that are used in analyses |
Data for individual-level PNC analyses are stored on CUBIC. The project directory on CUBIC is /cbica/projects/tractmaps. The directory is organized as follows:
| Directory | Description |
|---|---|
| ~/tractmaps/code/get_data | code to fetch the PNC dMRI data and store it in the tractmaps CUBIC project |
| ~/data/PNC/ | has the PNC dMRI scalar, QC, and demographics data required in analyses |
## Code documentation
### Overview of the analytic workflow
| Step | Task |
|---|---|
| 1 | Install the required software and build the environment |
| 2 | Prepare data derivatives; these will be the inputs for analyses |
| 3 | Build the data structure figure (Figure 1) |
| 4 | Examine the relationship between tract spatial positioning and S-A range (Figure 2) |
| 5 | Assess the spatial relationship between tracts and cognitive terms using Partial Least Squares (Figure 3) |
| 6 | Perform functional decoding of WM tracts to get the contribution of each cognitive term in each tract (Figure 4) |
| 7 | Quantify the functional diversity of tracts using the Gini coefficient and evaluate the association with tract S-A range (Figure 5) |
| 8 | Quantify the biological cortical similarity of WM tracts and evaluate the relationship with S-A range and cognitive diversity (Figure 6) |
| 9 | Prepare PNC data for age and cognition analyses |
| 10 | Perform GAMs on PNC data assessing the effects of age and cognition on WM tract microstructure (fractional anisotropy) |
### Project software
The following external software was used in this project:
- Connectome Workbench, version 2.0.1 (see details below)
- DSI Studio, version Hou “侯” May 27 2025 (see details below)
- Python, version 3.8
- R, version 4.4.2
- QSIPrep, version 1.0.0rc1
- QSIRecon, version 1.1.0
#### Connectome Workbench
Connectome Workbench can be downloaded at https://www.humanconnectome.org/software/get-connectome-workbench. This software is needed for generating region labels and for the neuromaps Python package. This project used version 2.0.1.
Once that’s downloaded, add the path to the bash profile:

```bash
echo 'export PATH=$PATH:/Applications/workbench/bin_macosx64' >> ~/.bash_profile
# note that for newer Macs, the path is added differently:
echo 'export PATH=$PATH:/Applications/workbench/bin_macosxsub' >> ~/.zshrc
```
#### DSI Studio
DSI Studio can be downloaded at https://dsi-studio.labsolver.org/download.html.
The following version was used for tract visualizations on a MacBook Pro M4 (macOS 15.7): DSI Studio version Hou "侯", May 27 2025.
### Build the environment
The Python environment used for this project was built using the `build_environment.sh` script.
Note: make sure conda and Homebrew are installed prior to doing this!

```bash
cd ~/code/python_env
bash build_environment.sh
```
### Publicly available population-level data
This project uses the following publicly available resources:
- The original HCP-MMP parcellation `dlabel.nii` from https://balsa.wustl.edu/78X3
- HCP-MMP region labels and coordinates (full description based on supplemental files of the Glasser et al., 2016 paper) made available at https://neuroimaging-core-docs.readthedocs.io/en/latest/pages/atlases.html#how-to-download-and-install-additional-fsl-atlases
- Tract-to-region probability matrix from Yeh 2022, Nature Methods
- Tract trk files (for tract visualization purposes) from https://github.com/frankyeh/data-atlas/releases/download/hcp1065/hcp1065_avg_tracts_trk.zip listed on this page https://brain.labsolver.org/hcp_trk_atlas.html.
- The HCP-YA 1065 1-mm population-averaged FIB file in the ICBM152 space (also for tract visualization): https://pitt-my.sharepoint.com/:u:/g/personal/yehfc_pitt_edu/EVqdp-lm0_NCgsFkSNDZeP0Bb3PpiSJRGpOKFhvVgZihVQ?e=GhUuOc provided on this page: https://brain.labsolver.org/hcp_template.html
- Tract names and abbreviations: data/derivatives/tract_names/abbreviations.xlsx, taken and adjusted from https://github.com/data-others/atlas/releases/download/hcp1065/abbreviation2.xlsx
- S-A axis ranks, Glasser-parcellated, from https://github.com/PennLINC/S-A_ArchetypalAxis/tree/main/Glasser360_MMP
These data can be downloaded and placed into the relevant subfolders in data/raw. Cognitive terms and biological properties are pulled directly via Python packages.
### Data preparation
First, raw data is used to prepare data derivatives that will be used in the analyses.
➡️ Code path: code/data_prep
- Generate Glasser labels (`label.gii` and `dlabel.nii`): Glasser GIFTI labels are used for parcellation and visualization. The region labels in the original file were ordered right hemisphere first (1-180), then left (181-360). They are reindexed in `prep_glasser_labels.sh` to be consistent with the other data used in analyses, where left hemisphere regions are indices 1-180 and right hemisphere regions are indices 181-360. This script calls `remap_labels.sh`. Run it as:

  ```bash
  cd /Users/joelleba/tractmaps/code/data_prep
  bash prep_glasser_labels.sh
  ```

- Create the tract-to-region probabilities dataframe: the tract-to-region probabilities and region names are reformatted for analysis in `prep_tract_probabilities.py`.
- Regional coordinates: the x-y-z coordinates of Glasser regions are generated in `prep_glasser_coords.py`.
- Cognitive terms dataframe: cognitive maps from Neurosynth are obtained using the NiMARE package and parcellated into Glasser regions in `parcellate_neurosynth.py`. Note: for this script to run, first create a YAML file containing your email address:

  ```yaml
  # Configuration file for neurosynth data preparation
  # Email address for downloading abstracts from neurosynth
  email_address: 'email.address@here.com'
  ```

- Biological cortical properties dataframe: the cortical maps used in the project are obtained through neuromaps and BigBrain, then parcellated in `parcellate_neuromaps.py`.
- Euclidean distances: regional pairwise Euclidean distances, as well as tract mean Euclidean distances, are generated in `tract_euc_distance.py`.
- Geodesic distances (sensitivity analysis): regional geodesic distances, as well as tract mean geodesic distances, are generated in `tract_geodesic_distance.py`. Note that this script takes a while to run.
- Sensorimotor-to-association axis: tract S-A axis ranges are generated in `prep_sa_axis.py`.
- Tract biological cortical similarity: the cortical similarity of tracts based on neurobiological (neuromaps) properties is generated in `tract_cortical_similarity.py`.
- Regional nulls (spin-based): spin-based nulls (for the partial least squares analysis) are generated in `compute_nulls.py`. Note: this takes a while to run and outputs 10,000 null indices for Glasser regions.
- Network rewiring nulls (sensitivity analysis): tract rewiring nulls are generated in `tract_rewiring_nulls.py`. Note: this takes a while to run and outputs 10,000 dataframes containing tract probabilities for rewired null tracts, saved as a pkl.
These scripts generate all of the data contained in the data/derivatives folder. In addition, this folder contains an Excel file with tract names and abbreviations called abbreviations.xlsx, which was adapted from https://github.com/data-others/atlas/releases/download/hcp1065/abbreviation2.xlsx (linked on the DSI Studio website at https://brain.labsolver.org/hcp_trk_atlas.html). The data in the data/derivatives folder can be used to run the analyses and generate the manuscript figures below.
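Several of the derivatives above reduce an inter-regional matrix to one value per tract. As a reference for how the Euclidean-distance derivative is structured, here is a minimal sketch; function and variable names are illustrative, and `tract_euc_distance.py` remains the authoritative implementation:

```python
import numpy as np

def pairwise_euclidean(coords):
    """coords: (n_regions, 3) array of x-y-z coordinates -> (n, n) distance matrix."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def tract_mean_distance(dist, region_idx):
    """Mean distance over unique pairs of regions connected by a tract."""
    sub = dist[np.ix_(region_idx, region_idx)]
    iu = np.triu_indices(len(region_idx), k=1)  # upper triangle: each pair once
    return sub[iu].mean()

# toy coordinates for three regions
coords = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 12.0]])
dist = pairwise_euclidean(coords)
print(dist[0, 1])                          # 5.0
print(tract_mean_distance(dist, [0, 1, 2]))  # 10.0 (mean of 5, 12, 13)
```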
### Analysis
#### Data structure (Figure 1)
Plots for Figure 1 showing the input data structure for tract probabilities, cognitive terms, and cortical properties are generated in:
➡️ Path: code/analysis/1_data_structure/
- `data_structure_plotting.py` - plots the tract-to-region, cognitive terms, and biological properties matrices along with a few example maps shown on the cortical surface.
- `tracts_vis_table_1.py` - generates tract visualizations in glass brains for Table 1.
#### Spatial embedding (Figure 2)
The association between the mean Euclidean distance and S-A range of tracts is examined in:
➡️ Path: code/2_spatial_embedding/
- `plot_tract_distances.py` - creates heatmaps of region coordinates and Euclidean distances with tract overlays.
- `plot_example_tract_sa_ranks.py` - plots the full S-A axis and the S-A ranks of example tracts on the brain surface.
- `spatial_embedding.py` - significance testing of the association between tract mean Euclidean distance and S-A range; also generates the correlation figure.
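The significance testing here compares an observed correlation against a null distribution. The sketch below illustrates that logic with naive random permutations; note that the project's actual scripts use the precomputed spin-based and tract-rewiring nulls (which respect spatial structure), and the data below are simulated stand-ins:

```python
import numpy as np

def perm_corr_pvalue(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(rng.permutation(x), y)[0, 1]
                     for _ in range(n_perm)])
    p = (np.abs(null) >= np.abs(r_obs)).mean()  # fraction of nulls at least as extreme
    return r_obs, p

rng = np.random.default_rng(1)
dist = np.arange(50, dtype=float)            # stand-in for tract mean distances
sa_range = 2 * dist + rng.normal(0, 5, 50)   # stand-in for tract S-A ranges
r, p = perm_corr_pvalue(dist, sa_range)
```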
##### Sensitivity analyses
- `spatial_embedding_geodesic.py` - uses geodesic distance instead of Euclidean distance.
- `spatial_embedding_tract_subsets.py` - analyzes association tracts and projection tracts separately.
- `spatial_embedding_rewiring_null.py` - significance testing using tract rewiring (this takes some time).
#### Terms-tracts partial least squares (Figure 3)
Partial least squares analysis to identify dominant patterns of covariance between cognitive terms and tracts is performed in:
➡️ Path: /code/3_pls/
- `pls_diagram.py` - generates plots with simulated data for the explanatory diagram.
- `pls_terms_tracts.py` - performs the PLS analysis. Note that the significance testing and cross-validation performed here take a while to run.
- `pls_plotting.py` - plots the PLS results. The bootstrapping of the term and tract loadings here takes a little while.
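The core of this kind of behavioural PLS is an SVD of the z-scored cross-covariance between two region-indexed matrices. The sketch below shows that core step with simulated data and illustrative shapes; `pls_terms_tracts.py` adds permutation testing, cross-validation, and bootstrapping on top:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((360, 123))  # regions x cognitive terms (simulated)
Y = rng.standard_normal((360, 60))   # regions x tract probabilities (simulated)

# z-score each column across regions
Xz = (X - X.mean(0)) / X.std(0, ddof=1)
Yz = (Y - Y.mean(0)) / Y.std(0, ddof=1)

# tracts x terms cross-covariance, then SVD into latent variables
R = Yz.T @ Xz / (Xz.shape[0] - 1)
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# covariance explained by each latent variable
explained = s ** 2 / (s ** 2).sum()
```

Columns of `U` are the tract saliences and rows of `Vt` the term saliences for each latent variable, ordered by the covariance they explain.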
#### Functional decoding (Figure 4)
The cognitive term contributions are generated for each tract in:
➡️ Path: /code/4_functional_decoding/
Code:
- `tract_term_contributions.py` - calculates the cognitive term contributions for each tract as a terms x tracts matrix containing the mean of normalized z-scores (across connected regions) per tract. Term z-scores are normalized using `scaled_robust_sigmoid` (to handle extreme values in a subset of terms) and thresholded at > 1.64.
- `tract_cog_functions_plotting.py` - visualizes the tract-term association results (barplots for terms and categories, as well as word clouds).
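The `scaled_robust_sigmoid` normalization named above has a common form in the feature-normalization literature (e.g. hctsa): a sigmoid centred on the median with spread set by the IQR, rescaled to the unit interval. The sketch below shows that assumed form, not the project's exact implementation:

```python
import numpy as np

def scaled_robust_sigmoid(x):
    """Outlier-robust sigmoid rescaled to [0, 1] (assumed form)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    iqr = np.percentile(x, 75) - np.percentile(x, 25)
    # sigmoid centred on the median; IQR/1.35 approximates the SD for normal data
    y = 1.0 / (1.0 + np.exp(-(x - med) / (iqr / 1.35)))
    return (y - y.min()) / (y.max() - y.min())

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 50.0])  # one extreme value
out = scaled_robust_sigmoid(z)
```

Unlike a plain min-max rescaling, the extreme value saturates the sigmoid instead of compressing the rest of the distribution.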
#### Tract functional diversity (Figure 5)
Tract Gini coefficients of functional diversity, and their association with S-A range, are computed in:
➡️ Path: /code/5_functional_diversity
Code:
- `tract_gini_coefficients.py` - generates Gini coefficients of diversity for each tract.
- `tract_gini_plotting.py` - plots the Lorenz curves and Gini coefficients as lollipops and in a glass brain.
- `tract_diversity_sa_axis.py` - significance testing of the association between tract Gini coefficients and S-A ranges.
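For reference, a Gini coefficient over a tract's term contributions can be sketched with the standard sorted-values formula (0 means contributions are spread evenly across terms, i.e. functionally diverse; values approaching 1 mean contributions are concentrated in a single term). This is illustrative only; `tract_gini_coefficients.py` is the authoritative implementation:

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative values via the sorted-values formula."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    i = np.arange(1, n + 1)  # 1-based ranks of the sorted values
    return (2 * (i * v).sum() - (n + 1) * v.sum()) / (n * v.sum())

print(gini([1.0, 1.0, 1.0, 1.0]))  # 0.0  - perfectly even
print(gini([0.0, 0.0, 0.0, 1.0]))  # 0.75 - highly concentrated
```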
##### Sensitivity analyses
- `tract_diversity_sa_axis_tract_types.py` - analyzes association tracts and projection tracts separately.
- `tract_gini_sa_correlation_rewiring_null.py` - significance testing using tract rewiring (takes some time).
#### Biological cortical similarity (Figure 6)
The associations between tract mean cortical similarity (based on neurobiological cortical features from neuromaps and BigBrain), S-A range, and Gini coefficient of diversity are examined in:
➡️ Path: analysis/6_cortical_similarity/
Code:
- `similarity_diagram.py` - generates plots for the explanatory diagram.
- `cortical_similarity_sa_axis_func_diversity.py` - significance testing of the correlations between tract mean cortical similarity, S-A range, and Gini coefficients of diversity.
##### Sensitivity analyses
- `cortical_similarity_tract_types.py` - analyzes association tracts and projection tracts separately.
- `gini_cortical_similarity_rewiring_null.py` - significance testing for the correlation between tract mean cortical similarity and Gini coefficients using tract rewiring (takes some time).
- `sa_range_cortical_similarity_rewiring_null.py` - significance testing for the correlation between tract mean cortical similarity and S-A range using tract rewiring (takes some time).
#### Individual-level analysis — age effects (Figure 7)
This section runs individual-level age GAMs, then tests associations between tract-wise age partial R² and tract functional properties (Gini coefficient, S–A range). This is done:
- for two datasets: the Philadelphia Neurodevelopmental Cohort (PNC) and the Healthy Brain Network (HBN)
- for two DWI metrics: Fractional Anisotropy (FA) and Mean Diffusivity (MD)
- A third microstructural measure, Intracellular Volume Fraction (ICVF), is additionally evaluated in HBN
##### Step 1: pull data on CUBIC
This first step happens on CUBIC:
➡️ CUBIC path: /cbica/projects/tractmaps/code
This code is also available in the repository, under code/get_data. First, pulling the data is done with scripts in the CUBIC project directory. Run these in order for each dataset (PNC, HBN):
- `get_subjects_list_<dataset>.sh` - lists all the subjects with qsirecon data. This saves a text file in the same directory, `<dataset>_subject_list.txt`, which is needed by the next scripts.
- `run_unzip_<dataset>_cubic.sh` - contains the file pattern and subject list for file extraction. This script calls `unzip_files.sh` to actually extract the files. Thanks to Tien Tong for providing this script! Heads up: this takes a little while. Run it with:

  ```bash
  bash run_unzip_<dataset>_cubic.sh
  ```

- `check_subjects_<dataset>.sh` - finds which subjects were in the original `<dataset>_subject_list.txt` (i.e., they have a qsirecon zip file) but don't have a TSV output file. This should be 0 (all were unzipped correctly).
##### Step 2: create sample and run age GAMs
➡️ Local path: analysis/7_individual_level/<dataset>
- `create_group_level_<dataset>.R` - saves a group csv for each microstructure measure in `data/<dataset>/derivatives/cleaned`. These will be used for final sample selection below. Note that it takes a while to load all subjects' files.
- `sample_creation_<dataset>.py` - applies data exclusion and outputs a final sample csv in `data/<dataset>/derivatives/final_sample`, which is used in downstream analyses.
- ONLY for HBN: `harmonize_hbn.R` - applies harmonization; this is needed for HBN because it is a multisite study.
- `run_gams_<dataset>.R` - runs GAMs on each tract to determine the relationship between tract microstructure and age. This outputs partial R² and FDR-corrected p-values in `results/individual_level/<dataset>`.
- `func_GAM_tractmaps.R` - called by `run_gams_<dataset>.R` to fit the GAMs.
##### Step 3: age partial R² vs tract properties
➡️ analysis/7_individual_level
- `run_partial_r2_age_effects.py` - driver for relating the combined PNC and HBN age-effect tables to tract properties (writes combined tables and figures under `results/individual_level/age_effects/`).
- `test_partial_r2_tract_properties.py` - called by `run_partial_r2_age_effects.py` to run correlations and permutation tests; also contains plotting utilities.
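For readers unfamiliar with the partial R² being correlated here: it compares a full model containing the predictor of interest to a reduced model without it, as (SS_reduced − SS_full) / SS_reduced. The project fits GAM smooths (in R), but the quantity can be sketched with a plain linear model in numpy; all variable names and the simulated data below are invented for illustration:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid ** 2).sum()

def partial_r2(y, covars, predictor):
    """Partial R-squared of `predictor` after adjusting for `covars`."""
    n = len(y)
    X_red = np.column_stack([np.ones(n), covars])     # reduced model
    X_full = np.column_stack([X_red, predictor])       # full model
    ss_red, ss_full = rss(X_red, y), rss(X_full, y)
    return (ss_red - ss_full) / ss_red

# simulated example: FA increases with age, adjusting for sex
rng = np.random.default_rng(0)
age = rng.uniform(8, 22, 500)
sex = rng.integers(0, 2, 500).astype(float)
fa = 0.4 + 0.01 * age + 0.02 * sex + rng.normal(0, 0.05, 500)
pr2 = partial_r2(fa, sex, age)
print(round(pr2, 3))
```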
#### Individual-level analysis — cognition effects (Figure 8)
This section runs individual-level cognition GAMs, then tests associations between tract-wise cognition partial R² and tract functional properties (Gini coefficient, S–A range). This is done:
- for two datasets: the Philadelphia Neurodevelopmental Cohort (PNC) and the Human Connectome Project Young Adult sample (HCP-YA)
- for two DWI metrics: Fractional Anisotropy (FA) and Mean Diffusivity (MD)
##### Step 1: pull data on CUBIC
This first step happens on CUBIC:
➡️ CUBIC path: /cbica/projects/tractmaps/code
This code is also available in the repository, under code/get_data. First, pulling the data is done with scripts in the CUBIC project directory. Run these in order, selecting the scripts corresponding to each dataset (PNC, HCPYA). Steps for pulling data are the same as above (see age analysis).
##### Step 2: create sample and run cognition GAMs
➡️ Local path: analysis/7_individual_level/<dataset>
- `create_group_level_<dataset>.R` - saves a group csv for each microstructure measure in `data/<dataset>/derivatives/cleaned`. These will be used for final sample selection below. Note that it takes a while to load all subjects' files.
- `sample_creation_<dataset>.py` - applies data exclusion and outputs a final sample csv in `data/<dataset>/derivatives/final_sample`, which is used in downstream analyses.
- `run_gams_<dataset>.R` - runs GAMs on each tract to determine the relationship between tract microstructure and cognition. This outputs partial R² and FDR-corrected p-values in `results/individual_level/<dataset>`.
- `func_GAM_tractmaps.R` - called by `run_gams_<dataset>.R` to fit the GAMs.
##### Step 3: cognition partial R² vs tract properties
➡️ analysis/7_individual_level
- `run_partial_r2_cognition_effects.py` - driver for relating the combined PNC and HCP-YA cognition-effect tables to tract properties (writes combined tables and figures under `results/individual_level/cognition_effects/`).
- `test_partial_r2_tract_properties.py` - called by `run_partial_r2_cognition_effects.py` to run correlations and permutation tests; also contains plotting utilities.
Et voilà! 😊
## A note on utilities
The repository also contains some utility functions in the code/utils folder. These get called throughout the data preparation and analyses steps. Here is a brief description:
- `figure_formatting.py` - takes care of formatting all figures generated in the manuscript, e.g. ensuring consistent font size, font type, etc.
- `matrix_to_tracts.py` - extracts pairwise values for regions connected by tracts from any inter-regional matrix and computes tract means.
- `tm_utils.py` - contains functions for loading and saving data; plotting brain maps on brain surfaces; and plotting correlations along with permutation testing.
- `tract_visualizer.py` - code to plot tracts in glass brains using DSI Studio (through the command line). It can plot tracts together or separately in a glass brain, can take specific color inputs or color gradients based on a continuous tract metric (e.g. tract S-A range), and can save tracts in different layouts (medial, lateral, grid of tracts, etc.). Note that it requires access to tract trk files for plotting (for instance, this project used the publicly available trk files at https://github.com/data-others/atlas/releases/download/hcp1065/hcp1065_avg_tracts_trk.zip). It also requires `abbreviations.xlsx` for tract name mapping.
- `tract_visualizer_quickstart.py` - contains a few examples illustrating how to use `tract_visualizer.py` to plot tracts.
- `tract_visualizer_readme.md` - a brief description of the tract visualization scripts, their key features, and requirements.