Example walkthrough • ModelArray

Introduction

In this example walkthrough, we will use some example fixel-wise data to demonstrate the steps of using ModelArray and its companion python package, ConFixel. Analysis for voxel-wise data follows similar steps, but there are several differences when execution; for more details regarding application to voxel-wise data, please refer to vignette("voxel-wise_data") page. We also provide hints for voxel-wise data throughout current walkthrough.

By following the vignette("installations") page, to perform statistical analysis for fixel-wise data, you should have successfully installed ModelArray, ConFixel, and MRtrix. We expect that ConFixel has been installed in a conda environment called modelarray.

We will first prepare the data and convert it into the format that ModelArray requires (Step 1), then we’ll use ModelArray to perform the statistical analysis (Step 2). Finally we will convert the results into original file format and view them (Step 3).

Step 1. Prepare your data

We first create a folder called “myProject” on the Desktop. In a terminal console:

$ cd ~/Desktop
$ mkdir myProject
$ cd myProject
$ pwd       # print the full path of this folder

On a linux machine, you’ll see the printed full path of this folder is /home/<username>/Desktop/myProject

Here, we provide some demo fixel-wise data. This demo data includes 100 subjects aged 8-22 years from the Philadelphia Neurodevelopmental Cohort (PNC) Satterthwaite et al., 2014. This data was generated by following fixel-based analysis Raffelt et al., 2017 and are ready for fixel-wise statistical analysis. You can get the data by running the following:

$ wget -cO - https://osf.io/tce9d/download > download.tar.gz
$ tar -xzf download.tar.gz
$ rm download.tar.gz

Hint for voxel-wise data for Step 1: Data preparation and conversion steps are different for voxel-wise data. Please refer to here for how to do so.

Step 1.1. Overview of the data organization

The data is organized in the following way:

~/Desktop/myProject
├── cohort_FDC_n100.csv
├── FDC
│   ├── directions.mifs
│   ├── index.mif
│   ├── sub-010b693.mif
│   ├── sub-0133f31.mif
│   ├── sub-063fd82.mif
│   ├── ...
└── ...

In this demo data, the fixel-wise metric is FDC. As you can see here, in folder FDC, besides subject-level fixel-wise data in template space (e.g. sub-010b693.mif), there are also the files index.mif and directions.mif. These two files provide important information of the fixel locations - you can learn about their definitions here.

Step 1.2. Prepare a CSV file of cohort phenotypes

In addition to fixel-wise data, we also need a CSV file of cohort phenotypes. This file will be used by both ConFixel and ModelArray. Here’s an example: cohort_FDC_n100.csv:

subject_id	Age	sex	dti64MeanRelRMS	*scalar_name*	*source_file*
sub-6fee490	8.83	2	0.863669	FDC	FDC/sub-6fee490.mif
sub-647f86c	8.50	2	1.775610	FDC	FDC/sub-647f86c.mif
sub-4b3ffe0	8.42	1	1.911350	FDC	FDC/sub-4b3ffe0.mif
…	…	…	…	…	…

The table in this CSV file includes these columns:

Required: scalar_name, which tells us what metric is being analyzed, and source_file, which tells us which mif file will be used for this subject
Covariates: the covariates we want to include in the statistical model. Here dti64MeanRelRMS is a measure of in-scanner motion, which we want to control for.

Step 1.3. Convert data into an HDF5 file using ConFixel

One of the reasons why ModelArray is memory efficient is it takes advantages of Hierarchical Data Format 5 (HDF5) file format. The extension of this file format is .h5. In short, an HDF5 file stores large datasets hierarchically. Let’s use ConFixel to convert the list of mif files into an HDF5 file. In the terminal console:

$ conda activate modelarray
$ confixel \
    --index-file FDC/index.mif \
    --directions-file FDC/directions.mif \
    --cohort-file cohort_FDC_n100.csv \
    --relative-root /home/<username>/Desktop/myProject \
    --output-hdf5 demo_FDC_n100.h5

# remember to use your specific path in --relative-root;
# we recommend it's a full path too!

As mentioned before, here we take /home/<username>/Desktop/myProject as the main folder; you won’t need to repeat paths once you’ve used the --relative-root flag.

When running confixel, you will see a moving progress bar. When it finishes, it looks like this:

Extracting .mif data...
100%|█████████████████████████████████████████| 100/100 [00:03<00:00, 30.33it/s]

Now you got the converted HDF5 file demo_FDC_n100.h5 in folder ~/Desktop/myProject.

Step 2. Use ModelArray to perform statistical analysis

The next step is to use this HDF5 file and the CSV file we prepared to perform statistical analysis in R. If you installed RStudio (which we recommend), then you can simply launch RStudio. All the commands in this Step 2 section will be run in R.

Hint for voxel-wise data for Step 2: You can follow the same steps here. Note that each “element” is a voxel now, instead of a fixel. Make sure you replace the scalar name, covariates, file paths etc with yours. In addition, as voxel-wise data may have subject-specific masks, you can also tailor the lower threshold of number of subjects when applying ModelArray.lm() and ModelArray.gam(). See vignette("voxel-wise_data") page for more.

Step 2.1. Load ModelArray package in R

After you’ve installed ModelArray, here’s how we load the library:

library(ModelArray)

Step 2.2. Create a ModelArray-class object

To create a ModelArray-class object that represents the HDF5 file of fixel-wise data, we need the HDF5 filename and the scalar’s name:

# filename of example fixel-wise data (.h5 file):
h5_path <- "~/Desktop/myProject/demo_FDC_n100.h5"
# create a ModelArray-class object:
modelarray <- ModelArray(h5_path, scalar_types = c("FDC"))
# let's check what's in it:
modelarray

You’ll see:

ModelArray located at ~/Desktop/myProject/demo_FDC_n100.h5

  Source files:     100
  Scalars:          FDC
  Analyses:

This shows that there are 100 source FDC files in this modelarray you created.

You may take a look at what the scalar matrix looks like by using scalars():

# scalar FDC data:
scalars(modelarray)[["FDC"]]

You’ll see:

<602229 x 100> matrix of class DelayedMatrix and type "double":
          FDC/sub-6fee490.mif FDC/sub-647f86c.mif ... FDC/sub-fb15b55.mif FDC/sub-063fd82.mif
     [1,]          0.24264026          0.15679701   .          0.00454893          0.16528498
     [2,]          0.04573315          0.30895054   .          0.25159708          0.33469012
     [3,]          0.18638037          0.26985332   .          0.26864439          0.26453218
     [4,]          0.13169773          0.36824343   .          0.19793315          0.27003124
     [5,]          0.22650713          0.16020685   .          0.10235775          0.41848758
      ...                   .                   .   .                   .                   .
[602225,]           0.1058939           0.1262244   .           0.2559175           0.2618564
[602226,]           0.2273949           0.8616257   .           0.4713631           0.5220432
[602227,]           0.1899130           1.2083211   .           0.2561999           0.1202894
[602228,]           0.0000000           1.3483975   .           0.0000000           0.0000000
[602229,]           0.1275258           0.5184639   .           0.2065310           0.1833882

Rows are fixels (n = 602229), and column names are the source filenames (n = 100). Each value is a specific fixel’s FDC from a mif file.

Step 2.3. Load cohort phenotypes CSV file

We then load the accompanying CSV file:

# filename of example fixel-wise data (.h5 file):
csv_path <- "~/Desktop/myProject/cohort_FDC_n100.csv"
# load the CSV file:
phenotypes <- read.csv(csv_path)

As this CSV file already provides sufficient covariates ready for analysis below, we don’t need to do extra work on it.

Step 2.4. Perform statistical analysis

With both modelarray and data frame phenotypes set up, we can now perform the statistical analysis. At present, ModelArray supports linear models as well as generalized additive models (GAM) with and without penalized splines, which are particularly useful for studying nonlinear effects in lifespan data.

Let’s start with linear model ModelArray.lm(). We first need a formula defining the model. We mainly want to test how fixel FDC changes with age in adolescence, assuming it’s a linear relationship. We also add sex and in-scanner motion quantification dti64MeanRelRMS as covariates. Intercept will be automatically added. We first try it out on first 100 fixels before we running on all fixels:

# formula:
formula.lm <- FDC ~ Age + sex + dti64MeanRelRMS
# run linear model fitting with ModelArray.lm() on the first 100 fixels:
mylm.try <- ModelArray.lm(formula.lm, modelarray, phenotypes, "FDC",
  element.subset = 1:100
)

You’ll see:

subset: default
weights: default
na.action: default
method: default
model: default
x: default
y: default
qr: default
singular.ok: default
contrasts: default
offset: default
Fitting element-wise linear models for FDC
initiating....
looping across elements....
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=06s

It first printed how several options for lm were set, then worked on the model fitting. Let’s check the first 6 rows of the results, i.e. results of the first 6 fixels:

head(mylm.try)

You’ll see:

element_id	Intercept.estimate	Age.estimate	sex.estimate	dti64MeanRelRMS.estimate	Intercept.statistic	Age.statistic	sex.statistic	dti64MeanRelRMS.statistic	Intercept.p.value	Intercept.p.value.fdr	Age.p.value	Age.p.value.fdr	sex.p.value	sex.p.value.fdr	dti64MeanRelRMS.p.value	dti64MeanRelRMS.p.value.fdr	model.adj.r.squared	model.p.value	model.p.value.fdr
0	0.1574554	0.0005079	0.0108810	-0.0067989	2.049316	0.1281586	0.4091826	-0.1915206	0.0431593	0.0644169	0.8982916	0.9946718	0.6833172	0.8444836	0.8485222	0.9952377	-0.0287295	0.9715544	0.9715544
1	0.2678434	0.0026411	-0.0180576	-0.0001476	2.956456	0.5651542	-0.5758993	-0.0035272	0.0039163	0.0081590	0.5732873	0.9946718	0.5660309	0.8143135	0.9971930	0.9971930	-0.0242065	0.8822558	0.9588263
2	0.3808456	0.0000743	-0.0247444	-0.0313025	3.697746	0.0139822	-0.6941623	-0.6577973	0.0003622	0.0016322	0.9888732	0.9946718	0.4892568	0.7900378	0.5122426	0.9952377	-0.0196619	0.7793740	0.9519001
3	0.3462063	-0.0015211	-0.0165382	-0.0110064	3.870265	-0.3296572	-0.5341844	-0.2663025	0.0001982	0.0010433	0.7423772	0.9946718	0.5944489	0.8143135	0.7905774	0.9952377	-0.0260748	0.9220699	0.9588263
4	0.3831223	-0.0012686	-0.0498778	-0.0169274	3.608630	-0.2316505	-1.3574013	-0.3450802	0.0004912	0.0017543	0.8173026	0.9946718	0.1778363	0.6969094	0.7307890	0.9952377	-0.0085248	0.5418155	0.9519001
5	0.3618128	-0.0008015	-0.0274349	-0.0088992	3.207225	-0.1377359	-0.7026583	-0.1707341	0.0018214	0.0044425	0.8907376	0.9946718	0.4839691	0.7900378	0.8647922	0.9952377	-0.0250535	0.9006376	0.9588263

As you can see, for each fixel, by default, ModelArray.lm() returns:

Several statistics for the Intercept, Age, sex, and motion dti64MeanRelRMS. These statistics include:
- estimation of Intercept and the coefficients or slopes of other terms
- t-statistics
- p-value
- p-value after FDR correction
Several statistics for the linear model, including:
- adjusted R-squared
- p-value
- p-value after FDR correction

You may request more comprehensive statistics with the argument full.outputs=TRUE.

Now let’s try out ModelArray.gam(). The example commands are as below. Compared to a linear model, GAMs flexibly model linear and nonlinear effects by using smooth functions; penalty can also be applied to avoid over-fitting. Here we use s(Age) to indicate that this will be fit as a spline smooth function. The k=4 argument sets an upper limit on the degrees of freedom for this smooth function. Restricted maximum likelihood (REML) is used for fitting.

In addition to default FDR correction, we can also request Bonferroni correction.

We still first try it out on first 100 fixels:

# formula:
formula.gam <- FDC ~ s(Age, k = 4) + sex + dti64MeanRelRMS
# run GAM fitting with ModelArray.gam() on the first 100 fixels:
mygam.try <- ModelArray.gam(formula.gam, modelarray, phenotypes, "FDC",
  element.subset = 1:100,
  correct.p.value.smoothTerms = c("fdr", "bonferroni"),
  correct.p.value.parametricTerms = c("fdr", "bonferroni"),
  method = "REML"
)

You’ll see:

The formula requested: FDC ~ s(Age, k = 4) + sex + dti64MeanRelRMS
s(Age):   k = 4;   fx = FALSE (default);   bs = tp (default)
method = REML (default: GCV.Cp)
Fitting element-wise GAMs for FDC
initiating....
looping across elements....
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s

The GAM results of the first 6 fixels:

head(mygam.try)

You’ll see:

element_id	s_Age.statistic	s_Age.p.value	s_Age.p.value.bonferroni	s_Age.p.value.fdr	Intercept.estimate	sex.estimate	dti64MeanRelRMS.estimate	Intercept.statistic	sex.statistic	dti64MeanRelRMS.statistic	Intercept.p.value	Intercept.p.value.bonferroni	Intercept.p.value.fdr	sex.p.value	sex.p.value.bonferroni	sex.p.value.fdr	dti64MeanRelRMS.p.value	dti64MeanRelRMS.p.value.bonferroni	dti64MeanRelRMS.p.value.fdr	model.dev.expl
0	0.3919938	0.6396933	1	0.8545475	0.1662467	0.0085795	-0.0015008	3.732225	0.3229002	-0.0419848	0.0003225	0.0322463	0.0003839	0.7474771	1	0.8591691	0.9665985	1	0.9962095	0.0177441
1	0.3193919	0.5732950	1	0.8308623	0.3083310	-0.0180576	-0.0001477	5.843594	-0.5758990	-0.0035275	0.0000001	0.0000070	0.0000001	0.5660311	1	0.8033110	0.9971928	1	0.9971928	0.0068301
2	0.0342444	0.9559484	1	0.9956698	0.3821904	-0.0252837	-0.0300008	6.376270	-0.7091046	-0.6289196	0.0000000	0.0000006	0.0000000	0.4799807	1	0.7749732	0.5308975	1	0.9962095	0.0143341
3	0.1086654	0.7424475	1	0.9286274	0.3228876	-0.0165382	-0.0110065	6.197683	-0.5341825	-0.2663045	0.0000000	0.0000014	0.0000000	0.5944502	1	0.8033110	0.7905759	1	0.9962095	0.0050187
4	0.0536680	0.8173638	1	0.9504230	0.3636743	-0.0498777	-0.0169276	5.881524	-1.3573985	-0.3450840	0.0000001	0.0000059	0.0000001	0.1778372	1	0.7013344	0.7307862	1	0.9962095	0.0220368
5	0.0189684	0.8908133	1	0.9682754	0.3495258	-0.0274349	-0.0088991	5.319820	-0.7026591	-0.1707324	0.0000007	0.0000678	0.0000012	0.4839686	1	0.7749732	0.8647935	1	0.9962095	0.0060088

As you can see, for each fixel, here ModelArray.gam() returns:

Several statistics for the smooth term s(Age), including F-statistics and p-values (and those after FDR and Bonferroni correction);
Several statistics for the parametric terms Intercept, sex, and motion dti64MeanRelRMS, including estimation of the coefficient (or the slope), t-statistics, and p-values (and those after FDR and Bonferroni correction);
One statistic for the model: dev.expl, which is the proportion of the null deviance explained by the model.

For more options, please view pages for ModelArray.lm() and ModelArray.gam().

Step 2.5. A full run and saving the results

Previous examples only ran on a small subset of the fixels. Now we’ll formally run across all fixels and save the results. Because running all fixels will take a substantial amount of time, here we only run the linear model.

Notice that the command below uses the default value of element.subset=NULL so that all fixels will be analyzed. Also notice that, to speed up, we’re requesting 4 CPU cores to run in parallel. You may adjust this number based on how many CPU cores you have on your machine. On a Linux machine with Intel(R) Xeon(R) 10th Gen CPU @ 2.81GHz and using 4 CPU cores, it takes around 2.5 hours to finish.

# run linear model fitting with ModelArray.lm() on the all fixels:
mylm <- ModelArray.lm(formula.lm, modelarray, phenotypes, "FDC",
  n_cores = 4
)

We now save the results data frame into the original h5 file:

writeResults(h5_path, df.output = mylm, analysis_name = "results_lm")

Notice that the the analysis name specified in argument analysis_name will be used in ConFixel in the next step when converting results back to fixel mif file format. It’ll also be used as the prefix of the mif files to be saved.

We can even check out the saved results in the h5 file (this is optional):

# create a new ModelArray-class object:
modelarray_new <- ModelArray(
  filepath = h5_path, scalar_types = "FDC",
  analysis_names = c("results_lm")
)
modelarray_new

You’ll see:

ModelArray located at ~/Desktop/myProject/demo_FDC_n100.h5

  Source files:     100
  Scalars:          FDC
  Analyses:         results_lm

The results_lm is shown up here.

Step 3. Check out the result images

Hint for voxel-wise data for Step 3: Data conversion and result viewing are different for voxel-wise data. Please refer to here for how to do so.

Step 3.1. Convert the statistical results into mif file format using ConFixel

We now use ConFixel to convert the results into a list of mif files:

$ conda activate modelarray   # activate the conda environment we created
$ fixelstats_write \
  --index-file FDC/index.mif \
  --directions-file FDC/directions.mif \
  --cohort-file cohort_FDC_n100.csv \
  --relative-root /home/<username>/Desktop/myProject \
  --analysis-name results_lm \
  --input-hdf5 demo_FDC_n100.h5 \
  --output-dir results_lm

# remember to use your specific path in --relative-root;
# we recommend it's a full path too!

After it’s done, in the main folder myProject, there will be a new folder called results_lm, and converted result mif files are in this new folder:

results_lm/
├── directions.mif
├── index.mif
├── results_lm_Age.1m.p.value.fdr.mif
├── results_lm_Age.1m.p.value.mif
├── results_lm_Age.estimate.mif
├── results_lm_Age.p.value.fdr.mif
├── results_lm_Age.p.value.mif
├── results_lm_Age.statistic.mif
├── results_lm_dti64MeanRelRMS.1m.p.value.fdr.mif
├── results_lm_dti64MeanRelRMS.1m.p.value.mif
├── results_lm_dti64MeanRelRMS.estimate.mif
├── results_lm_dti64MeanRelRMS.p.value.fdr.mif
├── results_lm_dti64MeanRelRMS.p.value.mif
├── results_lm_dti64MeanRelRMS.statistic.mif
├── results_lm_element_id.mif
├── results_lm_Intercept.1m.p.value.fdr.mif
├── ...
├── results_lm_model.1m.p.value.fdr.mif
├── results_lm_model.1m.p.value.mif
├── results_lm_model.adj.r.squared.mif
├── results_lm_model.p.value.fdr.mif
├── results_lm_model.p.value.mif
├── results_lm_sex.1m.p.value.fdr.mif
├── ...

0 directories, 32 files

Here, 1m.p.value.* means it’s the 1 - p-value image.

Step 3.2. View the results in MRtrix’s MRView

You can launch MRView from the terminal with mrview:

# Suppose you're in myProject folder:
$ cd results_lm   # switch to results folder
$ mrview

Click File -> Open, select index.mif. Then click Tools -> Fixel plot, you’ll see a side panel of “Fixel plot”. Within this side panel, click button “Open fixel image” (see below, highlighted in red):

Open fixel image

From there, select index.mif file again. You’ll see colored directions.mif. Now you can choose which image to display. Click the filename next to “colour by”, select results_lm_Age.estimate.mif; then click the button next to “threshold by”, select results_lm_model.p.value.fdr.mif, check/tick the option of upper limit, then enter “0.005”. You may change the view by clicking “View” in the upper panel. Below is an example view:

An example view