Elements: The Unit of Analysis • ModelArray

What is an element?

ModelArray performs mass-univariate statistical analysis - fitting the same model independently at every spatial location within the brain mask. We call each spatial location an element. Depending on the imaging modality, an element is one of:

Modality	Element type	Typical count	File format
Fixel-based analysis	Fixel (fiber population element)	~600,000	`.mif`
Voxel-based analysis	Voxel	~200,000–400,000	NIfTI (`.nii.gz`)
Surface-based analysis	Greyordinate (cortical vertex)	~330,000	CIFTI (`.dscalar.nii`)

Despite the differences in what they represent, ModelArray treats all of these uniformly. The same analysis functions - ModelArray.lm(), ModelArray.gam(), and ModelArray.wrap() -
work regardless of whether elements are fixels, voxels, or greyordinates.

The scalar matrix

At the core of every ModelArray object is a scalar matrix stored in an HDF5 file. This matrix has:

Rows = Elements (one per spatial location)
Columns = Sources (one per input file)

Each cell contains the scalar value (e.g., FDC, cortical thickness, FA) for that element in that input file.

library(ModelArray)

# Create a ModelArray object from an HDF5 file
modelarray <- ModelArray("path/to/data.h5", scalar_types = c("FDC"))

# View the scalar matrix
scalars(modelarray)[["FDC"]]

<602229 x 100> matrix of class DelayedMatrix and type "double":
          FDC/sub-6fee490.mif FDC/sub-647f86c.mif ... FDC/sub-063fd82.mif
     [1,]          0.24264026          0.15679701   .          0.16528498
     [2,]          0.04573315          0.30895054   .          0.33469012
      ...                   .                   .   .                   .
[602229,]          0.12752580          0.51846390   .          0.18338820

This is a DelayedMatrix, meaning the data lives on disk in the HDF5 file and is only read into memory when accessed. This is what makes ModelArray memory-efficient — see vignette("hdf5-large-analyses") for details.

Element IDs

Every element has an integer element ID starting at 0. The element ID corresponds directly to the row index in the scalar matrix (row 1 = element ID 0, row 2 = element ID 1, and so on).

When you run a model with ModelArray, the output data frame includes an element_id column to maps results the spatial location of each element:

result <- ModelArray.lm(FDC ~ Age + sex, modelarray, phenotypes, "FDC",
  element.subset = 1:5
)
result$element_id

[1] 0 1 2 3 4

You can check the total number of elements with:

numElementsTotal(modelarray, "FDC")

Working with a subset of elements

The element.subset parameter, available in ModelArray.lm(), ModelArray.gam(), and ModelArray.wrap(), lets you run models on a subset of elements rather than all of them. This is useful for:

Testing: run on the first 100 elements to verify your model works before committing to a full run
Batch processing: split a large analysis across multiple HPC jobs (see vignette("element-splitting"))

element.subset takes a vector of 1-based indices into the scalar matrix. For example, element.subset = 1:100 runs on elements with IDs 0 through 99 (the first 100 rows of the matrix).

# Test on the first 100 elements
result_test <- ModelArray.lm(FDC ~ Age + sex, modelarray, phenotypes, "FDC",
  element.subset = 1:100
)

# Run on all elements (default)
result_full <- ModelArray.lm(FDC ~ Age + sex, modelarray, phenotypes, "FDC")

Element-specific considerations

For voxel-wise or vertexwise data, subjects may have different brain coverage (subject-specific masks). This means some elements, particularly near the edge of the analysis mask, may not have valid data from all subjects.

ModelArray handles this with two threshold parameters in ModelArray.lm() and ModelArray.gam():

num.subj.lthr.abs: minimum number of subjects required (absolute count)
num.subj.lthr.rel: minimum proportion of total subjects required (0–1)

A voxel is only analyzed if the number of subjects with valid (non-NaN) values exceeds both thresholds. Otherwise, the voxel is skipped and its outputs are set to NaN.

# Require at least 10 subjects AND at least 20% of total subjects
result <- ModelArray.lm(FA ~ Age + sex, modelarray, phenotypes, "FA",
  num.subj.lthr.abs = 10,
  num.subj.lthr.rel = 0.2
)

For fixel-wise and surface-based data, all subjects typically share the same mask, so these thresholds are not needed.