Run GAM for element-wise data
ModelArray.gam.Rd
`ModelArray.gam` fits gam model for each of elements requested, and returns a tibble data.frame of requested model statistics.
Usage
ModelArray.gam(
formula,
data,
phenotypes,
scalar,
element.subset = NULL,
full.outputs = FALSE,
var.smoothTerms = c("statistic", "p.value"),
var.parametricTerms = c("estimate", "statistic", "p.value"),
var.model = c("dev.expl"),
changed.rsq.term.index = NULL,
correct.p.value.smoothTerms = c("fdr"),
correct.p.value.parametricTerms = c("fdr"),
num.subj.lthr.abs = 10,
num.subj.lthr.rel = 0.2,
verbose = TRUE,
pbar = TRUE,
n_cores = 1,
...
)
Arguments
- formula
Formula (passed to `mgcv::gam()`)
- data
ModelArray class
- phenotypes
A data.frame of the cohort with columns of independent variables and covariates to be added to the model. It should contains a column called "source_file", and this column should match to that in
data
.- scalar
A character. The name of the element-wise scalar to be analysed
- element.subset
A list of positive integers (min = 1, max = number of elements). The subset of elements you want to run. Default is `NULL`, i.e. requesting all elements in `data`.
- full.outputs
TRUE or FALSE, Whether to return full set of outputs. If FALSE, it will only return those requested in arguments
var.*
andcorrect.p.value.*
; if TRUE, argumentsvar.*
will be ignored, and will return all possible statistics forvar.*
and any options requested in argumentscorrect.p.value.*
.- var.smoothTerms
A list of characters. The list of variables to save for smooth terms (got from `broom::tidy(parametric = FALSE)`). Example smooth term: age in formula "outcome ~ s(age)". See "Details" section for more.
- var.parametricTerms
A list of characters. The list of variables to save for parametric terms (got from `broom::tidy(parametric = TRUE)`). Example parametric term: sex in formula "outcome ~ s(age) + sex". See "Details" section for more.
- var.model
A list of characters. The list of variables to save for the model (got from `broom::glance()` and `summary()`). See "Details" section for more.
- changed.rsq.term.index
A list of (one or several) positive integers. Each element in the list means the i-th term of the formula's right hand side as the term of interest for changed R-squared between with and without it. Both delta adjusted R-squared and partial R-squared will be calculated for each of term requested. Usually term of interest is smooth term, or interaction term in models with interactions. See "Details" section for more, especially the "WARNING" in Details section for cases with caution!!
- correct.p.value.smoothTerms
A list of characters. To perform and add a column for p.value correction for each smooth term. Default: "fdr". See "Details" section for more.
- correct.p.value.parametricTerms
A list of characters. To perform and add a column for p.value correction for each parametric term. Default: "fdr". See "Details" section for more.
- num.subj.lthr.abs
An integer, lower threshold of absolute number of subjects. For an element, if number of subjects who have finite values (defined by `is.finite()`, i.e. not NaN or NA or Inf) in h5 file >
num.subj.lthr.abs
, then this element will be run normally; otherwise, this element will be skipped and statistical outputs will be set as NaN. Default is 10.- num.subj.lthr.rel
A value between 0-1, lower threshold of relative number of subjects. Similar to
num.subj.lthr.abs
, if proportion of subjects who have valid value >num.subj.lthr.rel
, then this element will be run normally; otherwise, this element will be skipped and statistical outputs will be set as NaN. Default is 0.2.- verbose
TRUE or FALSE, to print verbose messages or not
- pbar
TRUE or FALSE, to print progress bar or not
- n_cores
Positive integer, The number of CPU cores to run with
- ...
Additional arguments for `mgcv::gam()`
Details
You may request returning specific statistical variables by setting var.*
, or you can get all by setting full.outputs=TRUE
.
Note that statistics covered by full.outputs
or var.*
are the ones from broom::tidy(), broom::glance(), and summary() only, and do not include delta adjusted R-squared or partial R-squared or corrected p-values. However FDR-corrected p-values ("fdr") are generated by default.
List of acceptable statistic names for each of var.*
:
var.smoothTerms
: c("edf","ref.df","statistic","p.value"); For interpretation please see tidy.gam with `parametric=FALSE`.var.parametricTerms
: c("estimate", "std.error","statistic","p.value"); For interpretation please see tidy.gam with `parametric=TRUE`.var.model
: c("adj.r.squared","dev.expl", "sp.criterion", "scale", "df", "logLik","AIC", "BIC", "deviance", "df.residual", "nobs"); "adj.r.squared" isr.sq
from summary.gam; "sp.criterion" issp.criterion
from summary.gam; For interpretation please see glance.gam and summary.gam.
Regarding formula: So far these kinds of formula are tested:
formula with smooth term, but without any interactions. Examples like
y ~ s(x) + orderedFactor
;y ~ s(x) + s(z)
formula with interaction, but limited to only one interaction term, and in the formats of:
Formula #1:
y ~ orderedFactor + s(x) + s(x, by=orderedFactor) + other_covariate
, whereorderedFactor
should be discrete variables and generated by `ordered`. The interaction term will be displayed as "s_x_BYorderedFactor" in the column name in returned data.frame. You may use function `generator_gamFormula_factorXsmooth()` to generate one.Formula #2:
y ~ ti(x) + ti(z) + ti(x,z) + other_covariate
, wherex
andz
should be continuous variables. The interaction term will be displayed as "ti_x_z" in the column name in the returned data.frame. You may use function `generator_gamFormula_continuousInteraction()` to generate one.
You may be interested in how important a term is in a model. We provide two ways of quantification (see below). Both of them require running the reduced model without this term of interest, thus it will take longer time to run. You can make such request via argument changed.rsq.term.index
, and you'll get both quantifications.
Delta adjusted R-squared (
delta.adj.rsq
) is defined as the difference between adjusted R-squared of full model (full formula informula
) and that of reduced model (formula without the term of interest). Notice that adjusted R-squared includes the penalty from the model complexity.Partial R-squared (
partial.rsq
) is defined as:(sse.reduced.model - sse.full.model) / sse.reduced.model
, wheresse
is the error sum of squares (or, residual sum of squares). It quantifies the amount of variance in the response variable that cannot be explained by the reduced model (model without term of interest), but can be explained by the term of interest in the full model.
___!!! WARNING !!!___: If you want to request changed.rsq.term.index
for a term that has missing values, before feeding phenotypes
into ModelArray.gam()
, you must exclude those observations (i.e., those rows in phenotypes
) who have missing values in this term of interest from phenotypes
.
You should do the same for each term you'd like to request in changed.rsq.term.index
, if that term has missing values.
Without such exclusion, the full and reduced models would include different number of subjects, causing inaccuracy of calculation of delta.adj.rsq
and partial.rsq
.
Other notes on changed.rsq.term.index
:
When requesting
changed.rsq.term.index
,fx
should be set asTRUE
, so that degree of freedom is fixed.For formula with interactions, only formula in above formats are tested, and only the values of interaction term are valid. The delta.adj.rsq and partial.rsq for main effect (such as s(x) in Formula #1) may not "functionally" be these metrics, as their definitions should be changed to reduced formula without both main effect and interaction term.
For p-value corrections (arguments correct.p.value.*
), supported methods include all methods in `p.adjust.methods` except "none". You can request more than one method. FDR-corrected p-values ("fdr") are calculated by default. Turn it off by setting to "none".
Please notice that different from `ModelArray.lm`, there is no p.value for the GAM model, so no "correct.p.value.model" for GAM model.
Arguments num.subj.lthr.abs
and num.subj.lthr.rel
are mainly for input data with subject-specific masks, i.e. currently only for volume data. For fixel-wise data, you may ignore these arguments.