Run GAM for element-wise data
`ModelArray.gam` fits gam model for each of elements requested, and returns a tibble data.frame of requested model statistics.
ModelArray.gam( formula, data, phenotypes, scalar, element.subset = NULL, full.outputs = FALSE, var.smoothTerms = c("statistic", "p.value"), var.parametricTerms = c("estimate", "statistic", "p.value"), var.model = c("dev.expl"), changed.rsq.term.index = NULL, correct.p.value.smoothTerms = c("fdr"), correct.p.value.parametricTerms = c("fdr"), num.subj.lthr.abs = 10, num.subj.lthr.rel = 0.2, verbose = TRUE, pbar = TRUE, n_cores = 1, ... )
Formula (passed to `mgcv::gam()`)
A data.frame of the cohort with columns of independent variables and covariates to be added to the model. It should contains a column called "source_file", and this column should match to that in
A character. The name of the element-wise scalar to be analysed
A list of positive integers (min = 1, max = number of elements). The subset of elements you want to run. Default is `NULL`, i.e. requesting all elements in `data`.
TRUE or FALSE, Whether to return full set of outputs. If FALSE, it will only return those requested in arguments
correct.p.value.*; if TRUE, arguments
var.*will be ignored, and will return all possible statistics for
var.*and any options requested in arguments
A list of characters. The list of variables to save for smooth terms (got from `broom::tidy(parametric = FALSE)`). Example smooth term: age in formula "outcome ~ s(age)". See "Details" section for more.
A list of characters. The list of variables to save for parametric terms (got from `broom::tidy(parametric = TRUE)`). Example parametric term: sex in formula "outcome ~ s(age) + sex". See "Details" section for more.
A list of characters. The list of variables to save for the model (got from `broom::glance()` and `summary()`). See "Details" section for more.
A list of (one or several) positive integers. Each element in the list means the i-th term of the formula's right hand side as the term of interest for changed R-squared between with and without it. Both delta adjusted R-squared and partial R-squared will be calculated for each of term requested. Usually term of interest is smooth term, or interaction term in models with interactions. See "Details" section for more, especially the "WARNING" in Details section for cases with caution!!
A list of characters. To perform and add a column for p.value correction for each smooth term. Default: "fdr". See "Details" section for more.
A list of characters. To perform and add a column for p.value correction for each parametric term. Default: "fdr". See "Details" section for more.
An integer, lower threshold of absolute number of subjects. For an element, if number of subjects who have finite values (defined by `is.finite()`, i.e. not NaN or NA or Inf) in h5 file >
num.subj.lthr.abs, then this element will be run normally; otherwise, this element will be skipped and statistical outputs will be set as NaN. Default is 10.
A value between 0-1, lower threshold of relative number of subjects. Similar to
num.subj.lthr.abs, if proportion of subjects who have valid value >
num.subj.lthr.rel, then this element will be run normally; otherwise, this element will be skipped and statistical outputs will be set as NaN. Default is 0.2.
TRUE or FALSE, to print verbose messages or not
TRUE or FALSE, to print progress bar or not
Positive integer, The number of CPU cores to run with
Additional arguments for `mgcv::gam()`
You may request returning specific statistical variables by setting
var.*, or you can get all by setting
Note that statistics covered by
var.* are the ones from broom::tidy(), broom::glance(), and summary() only, and do not include delta adjusted R-squared or partial R-squared or corrected p-values. However FDR-corrected p-values ("fdr") are generated by default.
List of acceptable statistic names for each of
var.smoothTerms: c("edf","ref.df","statistic","p.value"); For interpretation please see tidy.gam with `parametric=FALSE`.
var.parametricTerms: c("estimate", "std.error","statistic","p.value"); For interpretation please see tidy.gam with `parametric=TRUE`.
var.model: c("adj.r.squared","dev.expl", "sp.criterion", "scale", "df", "logLik","AIC", "BIC", "deviance", "df.residual", "nobs"); "adj.r.squared" is
r.sqfrom summary.gam; "sp.criterion" is
sp.criterionfrom summary.gam; For interpretation please see glance.gam and summary.gam.
Regarding formula: So far these kinds of formula are tested:
formula with smooth term, but without any interactions. Examples like
y ~ s(x) + orderedFactor;
y ~ s(x) + s(z)
formula with interaction, but limited to only one interaction term, and in the formats of:
y ~ orderedFactor + s(x) + s(x, by=orderedFactor) + other_covariate, where
orderedFactorshould be discrete variables and generated by `ordered`. The interaction term will be displayed as "s_x_BYorderedFactor" in the column name in returned data.frame. You may use function `generator_gamFormula_factorXsmooth()` to generate one.
y ~ ti(x) + ti(z) + ti(x,z) + other_covariate, where
zshould be continuous variables. The interaction term will be displayed as "ti_x_z" in the column name in the returned data.frame. You may use function `generator_gamFormula_continuousInteraction()` to generate one.
You may be interested in how important a term is in a model. We provide two ways of quantification (see below). Both of them require running the reduced model without this term of interest, thus it will take longer time to run. You can make such request via argument
changed.rsq.term.index, and you'll get both quantifications.
Delta adjusted R-squared (
delta.adj.rsq) is defined as the difference between adjusted R-squared of full model (full formula in
formula) and that of reduced model (formula without the term of interest). Notice that adjusted R-squared includes the penalty from the model complexity.
Partial R-squared (
partial.rsq) is defined as:
(sse.reduced.model - sse.full.model) / sse.reduced.model, where
sseis the error sum of squares (or, residual sum of squares). It quantifies the amount of variance in the response variable that cannot be explained by the reduced model (model without term of interest), but can be explained by the term of interest in the full model.
___!!! WARNING !!!___: If you want to request
changed.rsq.term.index for a term that has missing values, before feeding
ModelArray.gam(), you must exclude those observations (i.e., those rows in
phenotypes) who have missing values in this term of interest from
You should do the same for each term you'd like to request in
changed.rsq.term.index, if that term has missing values.
Without such exclusion, the full and reduced models would include different number of subjects, causing inaccuracy of calculation of
Other notes on
fxshould be set as
TRUE, so that degree of freedom is fixed.
For formula with interactions, only formula in above formats are tested, and only the values of interaction term are valid. The delta.adj.rsq and partial.rsq for main effect (such as s(x) in Formula #1) may not "functionally" be these metrics, as their definitions should be changed to reduced formula without both main effect and interaction term.
For p-value corrections (arguments
correct.p.value.*), supported methods include all methods in `p.adjust.methods` except "none". You can request more than one method. FDR-corrected p-values ("fdr") are calculated by default. Turn it off by setting to "none".
Please notice that different from `ModelArray.lm`, there is no p.value for the GAM model, so no "correct.p.value.model" for GAM model.
num.subj.lthr.rel are mainly for input data with subject-specific masks, i.e. currently only for volume data. For fixel-wise data, you may ignore these arguments.