Merge multiple ModelArrays from different HDF5 files
Combines scalars from multiple ModelArray objects into a single ModelArray, aligning subjects via shared phenotype columns.
Arguments
- modelarrays
A list of at least two ModelArray objects, each constructed from a different HDF5 file.
- phenotypes_list
A list of data.frames, one per ModelArray in modelarrays. Each must contain a source_file column whose entries match the corresponding ModelArray's sources (i.e. sources(modelarrays[[i]])). Each must also contain all columns named in merge_on.
- merge_on
Character vector of column names present in all data.frames in phenotypes_list, used to inner-join subjects across sessions/modalities (e.g. c("subject_id")). The combination of these columns must uniquely identify each subject within each data.frame.
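The requirements on phenotypes_list and merge_on can be expressed as a few base R checks. The sketch below is illustrative only: check_phenotypes() is a hypothetical helper, not a ModelArray function, and it assumes that "match" means every source_file entry is found among the ModelArray's sources (the package may be stricter, for example requiring the same set).

```r
# Hypothetical helper (not part of ModelArray): checks one phenotype
# data.frame against the requirements for phenotypes_list and merge_on.
check_phenotypes <- function(phen, sources, merge_on) {
  if (!"source_file" %in% names(phen)) {
    stop("phenotype data.frame must contain a 'source_file' column")
  }
  # Assumption: every source_file entry must appear among the sources
  if (!all(phen$source_file %in% sources)) {
    stop("some source_file entries do not match the ModelArray's sources")
  }
  missing_cols <- setdiff(merge_on, names(phen))
  if (length(missing_cols) > 0) {
    stop("missing merge_on column(s): ", paste(missing_cols, collapse = ", "))
  }
  # The merge_on columns together must identify each row uniquely
  if (anyDuplicated(phen[, merge_on, drop = FALSE]) > 0) {
    stop("merge_on columns do not uniquely identify each subject")
  }
  invisible(TRUE)
}

phen <- data.frame(
  subject_id = c("sub-01", "sub-02"),
  source_file = c("sub-01_fd.mif", "sub-02_fd.mif")
)
check_phenotypes(phen,
                 sources = c("sub-01_fd.mif", "sub-02_fd.mif"),
                 merge_on = "subject_id")
```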
Value
A list with two components:
- data
A combined ModelArray containing scalars from all inputs. Each scalar's columns are subsetted and reordered to match the inner-joined subject list.
- phenotypes
The inner-joined data.frame. The original source_file columns are renamed to source_file.<scalar_name> (using the first scalar name of each input ModelArray), and a new unified source_file column is added for use with analysis functions.
Details
The merge performs an inner join of the phenotype data.frames on the
columns specified by merge_on. Only subjects present in all
phenotype data.frames are retained. Scalar matrices from each input
ModelArray are column-subsetted and reordered to match
the joined subject list.
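To illustrate the join-and-reorder step, here is a minimal base R sketch on toy data; it is not the package's implementation. It assumes plain matrices with elements as rows and sources as columns, and uses merge() with the hypothetical suffixes .FD and .FC to tell the two source_file columns apart.

```r
# Sketch only: inner-join two phenotype tables on merge_on, then subset and
# reorder each toy scalar matrix's columns to follow the joined subject order.
merge_on <- "subject_id"

phen1 <- data.frame(subject_id = c("s1", "s2", "s3"),
                    source_file = c("s1_FD", "s2_FD", "s3_FD"))
phen2 <- data.frame(subject_id = c("s3", "s1", "s4"),
                    source_file = c("s3_FC", "s1_FC", "s4_FC"))

# Toy scalar matrices: rows = elements, columns = sources
fd <- matrix(1:6, nrow = 2, dimnames = list(NULL, phen1$source_file))
fc <- matrix(11:16, nrow = 2, dimnames = list(NULL, phen2$source_file))

# Inner join (merge's default) keeps only subjects present in both tables
joined <- merge(phen1, phen2, by = merge_on, suffixes = c(".FD", ".FC"))

# Column j of each aligned matrix now belongs to subject joined$subject_id[j]
fd_aligned <- fd[, as.character(joined$source_file.FD), drop = FALSE]
fc_aligned <- fc[, as.character(joined$source_file.FC), drop = FALSE]
```

Subjects s2 and s4 are dropped because each appears in only one table, and the FC columns are reordered so both matrices follow the same subject order.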
A unified source_file column is created from the merge_on
columns so that downstream analysis functions
(ModelArray.lm, ModelArray.gam,
ModelArray.wrap) can align phenotypes to scalars. The
original source_file columns are renamed to
source_file.<first_scalar_name> for each input
ModelArray.
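The column bookkeeping can be sketched as follows, assuming the joined table already holds one source_file column per input. The exact way mergeModelArrays builds the unified key is not documented here; pasting the merge_on values together with "_" is an assumption made for illustration, as are the toy column names.

```r
# Sketch only: rename per-input source_file columns and add a unified key.
merge_on <- c("subject_id", "session")
joined <- data.frame(
  subject_id = c("s1", "s3"),
  session = c("A", "A"),
  source_file.x = c("s1_FD.mif", "s3_FD.mif"),
  source_file.y = c("s1_FC.mif", "s3_FC.mif")
)
first_scalar_names <- c("FD", "FC")  # first scalar of each input ModelArray

# Rename to source_file.<first_scalar_name>
old <- c("source_file.x", "source_file.y")
names(joined)[match(old, names(joined))] <- paste0("source_file.", first_scalar_names)

# Hypothetical unified key: merge_on values pasted together with "_"
joined$source_file <- do.call(paste, c(joined[merge_on], sep = "_"))
```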
Scalar names must be unique across all input ModelArrays: if two
ModelArrays share a scalar name (e.g. both have "FD"), the
function stops with an error. The number of elements (rows) must also
match across all scalars.
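Both preconditions are simple to test before merging. check_scalars() below is a hypothetical helper operating on a list of named lists of matrices (one list per input), standing in for the scalars of each ModelArray; it is a sketch, not the package's own validation.

```r
# Hypothetical pre-merge check: unique scalar names, equal element counts.
check_scalars <- function(scalar_lists) {
  all_names <- unlist(lapply(scalar_lists, names), use.names = FALSE)
  dup <- unique(all_names[duplicated(all_names)])
  if (length(dup) > 0) {
    stop("scalar name(s) shared across inputs: ", paste(dup, collapse = ", "))
  }
  # Number of elements (rows) for every scalar of every input
  n_rows <- unlist(lapply(scalar_lists, function(s) vapply(s, nrow, integer(1))))
  if (length(unique(n_rows)) != 1) {
    stop("element counts differ across scalars: ", paste(n_rows, collapse = ", "))
  }
  invisible(TRUE)
}

inputs <- list(
  list(FD = matrix(0, nrow = 100, ncol = 3)),
  list(FC = matrix(0, nrow = 100, ncol = 4))
)
check_scalars(inputs)
```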
If element metadata is available (see elementMetadata),
the function checks that it is consistent across inputs and warns if
it differs or is only partially available.
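The metadata check can be sketched as follows, assuming each input's element metadata is either NULL (unavailable) or an object that can be compared with identical(). check_element_metadata() is hypothetical and returns a status string so the outcome can be inspected; the real function's comparison rules may differ.

```r
# Hypothetical sketch: warn when element metadata differs or is partial.
check_element_metadata <- function(meta_list) {
  available <- !vapply(meta_list, is.null, logical(1))
  if (!any(available)) return(invisible("none"))
  if (!all(available)) {
    warning("element metadata is only available for some inputs")
    return(invisible("partial"))
  }
  # Compare every input's metadata with the first one
  ref <- meta_list[[1]]
  same <- vapply(meta_list[-1], function(m) identical(m, ref), logical(1))
  if (!all(same)) {
    warning("element metadata differs across inputs")
    return(invisible("differs"))
  }
  invisible("consistent")
}

m <- data.frame(x = 1:3)
check_element_metadata(list(m, m))
```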
See also
ModelArray for constructing individual
ModelArray objects, ModelArray.lm,
ModelArray.gam, ModelArray.wrap for
fitting models on the merged object,
elementMetadata for element correspondence checks.
Examples
if (FALSE) { # \dontrun{
library(ModelArray)
# Load two sessions from different h5 files
ma1 <- ModelArray("session1.h5", scalar_types = c("FD"))
ma2 <- ModelArray("session2.h5", scalar_types = c("FC"))
# Each phenotype table needs a source_file column and the merge_on column(s)
phen1 <- read.csv("session1_cohort.csv")
phen2 <- read.csv("session2_cohort.csv")
# Merge on subject ID
merged <- mergeModelArrays(
  modelarrays = list(ma1, ma2),
  phenotypes_list = list(phen1, phen2),
  merge_on = "subject_id"
)
# Use the merged object for cross-scalar analysis
merged$data
scalarNames(merged$data) # c("FD", "FC")
head(merged$phenotypes)
results <- ModelArray.lm(
  FD ~ age + sex + FC,
  data = merged$data,
  phenotypes = merged$phenotypes,
  scalar = c("FD", "FC")
)
} # }