Skip to contents

This vignette covers the three modelling functions in ModelArray in detail. If you haven’t already, start with vignette("walkthrough") for the end-to-end workflow, and vignette("elements") for background on what an element is.

library(ModelArray)

# Using the PNC fixel demo data (see walkthrough for download instructions)
h5_path <- "~/Desktop/myProject/demo_FDC_n100.h5"
modelarray <- ModelArray(h5_path, scalar_types = c("FDC"))
phenotypes <- read.csv("~/Desktop/myProject/cohort_FDC_n100.csv")

Linear models with ModelArray.lm()

Basic usage

ModelArray.lm() fits lm() independently at every element. Define a formula with the scalar name as the response variable:

formula.lm <- FDC ~ Age + sex + dti64MeanRelRMS

result <- ModelArray.lm(formula.lm, modelarray, phenotypes, "FDC",
  element.subset = 1:100
)

Output columns

By default, for each term (Intercept, Age, sex, etc.) the output includes:

Column Description
<term>.estimate Coefficient estimate (slope)
<term>.statistic t-statistic
<term>.p.value Uncorrected p-value
<term>.p.value.fdr FDR-corrected p-value

And for the overall model:

Column Description
model.adj.r.squared Adjusted R-squared
model.p.value Model F-test p-value
model.p.value.fdr FDR-corrected model p-value

More comprehensive output

Use full.outputs = TRUE to get additional statistics:

result_full <- ModelArray.lm(formula.lm, modelarray, phenotypes, "FDC",
  element.subset = 1:100,
  full.outputs = TRUE
)

P-value correction

By default, FDR correction is applied. You can request additional correction methods:

result <- ModelArray.lm(formula.lm, modelarray, phenotypes, "FDC",
  element.subset = 1:100,
  correct.p.value.terms = c("fdr", "bonferroni")
)

This adds <term>.p.value.bonferroni columns alongside the FDR-corrected ones.

Generalized Additive Models with ModelArray.gam()

GAMs are useful when you expect nonlinear relationships - for example, nonlinear changes in brain structure across the lifespan.

Basic usage

Use s() to specify smooth terms. The k parameter sets an upper limit on the degrees of freedom for the smooth function:

formula.gam <- FDC ~ s(Age, k = 4) + sex + dti64MeanRelRMS

result <- ModelArray.gam(formula.gam, modelarray, phenotypes, "FDC",
  element.subset = 1:100,
  method = "REML"
)

ModelArray prints a summary of the smooth term settings when you run this:

The formula requested: FDC ~ s(Age, k = 4) + sex + dti64MeanRelRMS
s(Age):   k = 4;   fx = FALSE (default);   bs = tp (default)
method = REML (default: GCV.Cp)

Output columns

GAM output differs from lm for smooth terms vs parametric terms:

Smooth terms (e.g., s(Age)):

Column Description
s_Age.statistic F-statistic for the smooth
s_Age.p.value Uncorrected p-value
s_Age.p.value.fdr FDR-corrected p-value

Parametric terms (e.g., sex, dti64MeanRelRMS):

Column Description
<term>.estimate Coefficient estimate
<term>.statistic t-statistic
<term>.p.value Uncorrected p-value

Model-level:

Column Description
model.dev.expl Proportion of null deviance explained

P-value correction for GAMs

Smooth terms and parametric terms have separate correction arguments:

result <- ModelArray.gam(formula.gam, modelarray, phenotypes, "FDC",
  element.subset = 1:100,
  correct.p.value.smoothTerms = c("fdr", "bonferroni"),
  correct.p.value.parametricTerms = c("fdr", "bonferroni"),
  method = "REML"
)

Smooth term options

ModelArray supports the full range of mgcv::gam() smooth terms:

  • s(): penalized regression splines (default thin-plate, bs = "tp")
  • te(), ti(), t2(): tensor product smooths for interactions between continuous variables

The fx parameter controls whether the smooth is penalized (fx = FALSE, the default) or unpenalized/fixed (fx = TRUE). The bs parameter selects the basis type.

GAM formula helpers

ModelArray provides two helper functions for common GAM formula patterns:

Factor-smooth interaction (gen_gamFormula_fxSmooth): Generates y ~ orderedFactor + s(x) + s(x, by=orderedFactor):

result <- gen_gamFormula_fxSmooth(
  response.var = "FDC",
  factor.var = "group",
  smooth.var = "Age",
  phenotypes = phenotypes,
  reference.group = "control",
  fx = TRUE, k = 4
)
formula <- result$formula
phenotypes <- result$phenotypes # may contain new ordered factor column

Continuous interaction (gen_gamFormula_contIx): Generates y ~ ti(x) + ti(z) + ti(x, z):

formula <- gen_gamFormula_contIx(
  response.var = "FDC",
  cont1.var = "Age",
  cont2.var = "BMI",
  fx = TRUE, k = 4
)

Custom functions with ModelArray.wrap()

ModelArray.wrap() lets you run any R function across all elements, using ModelArray’s efficient looping and parallelization machinery.

Prototyping with exampleElementData()

Before writing your function, use exampleElementData() to see what a per-element data frame looks like:

# Using the HBN cortical thickness data
h5_path <- "path/to/thickness.h5"
modelarray <- ModelArray(h5_path, scalar_types = c("thickness"))
phenotypes <- read.csv("path/to/phenotypes.csv")

example_df <- exampleElementData(modelarray,
  scalar = "thickness", i_element = 12345, phenotypes = phenotypes
)
names(example_df)

This returns a data frame with all the phenotype columns plus a new column for the scalar value at that element. You can explore it like any data frame:

library(ggplot2)
ggplot(example_df, aes(x = age, y = thickness, color = study_site)) +
  geom_point() +
  stat_smooth(method = "lm") +
  theme_minimal()

Writing your function

Your function must accept a single argument named data and return a single-row data frame (or tibble):

my_site_analysis <- function(data) {
  # Compute mean thickness per site
  site_means <- data %>%
    dplyr::group_by(study_site) %>%
    dplyr::summarize(mean_thickness = mean(thickness, na.rm = TRUE)) %>%
    tidyr::pivot_wider(
      names_from = study_site,
      values_from = mean_thickness,
      names_glue = "thickness_{study_site}"
    )

  # Run ANOVA on site
  site_anova <- lm(thickness ~ study_site, data = data) %>%
    anova() %>%
    broom::tidy() %>%
    dplyr::filter(term != "Residuals") %>%
    dplyr::select(term, statistic, p.value) %>%
    tidyr::pivot_wider(
      names_from = term,
      values_from = c(statistic, p.value),
      names_glue = "{term}.{.value}"
    )

  dplyr::bind_cols(site_means, site_anova)
}

Testing on a single element

Use analyseOneElement.wrap() to verify your function works in ModelArray’s machinery:

analyseOneElement.wrap(
  12345, my_site_analysis, modelarray, phenotypes, "thickness",
  num.subj.lthr = 10, flag_initiate = TRUE, on_error = "debug"
)

Running across all elements

result <- ModelArray.wrap(
  FUN = my_site_analysis,
  data = modelarray,
  phenotypes = phenotypes,
  scalar = "thickness",
  n_cores = 8
)

head(result)
writeResults(h5_path, df.output = result, analysis_name = "site_analysis")

Converting results back to image format

After writeResults(), use the appropriate ModelArrayIO tool to convert results back to the original image format for visualization:

Data type Command Viewer
Fixel fixelstats_write MRtrix MRView
Voxel voxelstats_write FSLeyes, MRView
Surface ciftistats_write Connectome Workbench

See the ModelArrayIO documentation for command-line usage details, and vignette("walkthrough") for a worked example with MRView.