Developer documentation
doc_for_developer.Rmd
Target audience: developers, and anyone from the broader community who hopes to contribute to ModelArray
.
In this documentation, we walk through important methods used in ModelArray
package, and the structure of the package and its functions.
We welcome community contributions to ModelArray
. If you hope to improve functions or fix a bug in our package, please follow the following steps:
 fork the GitHub repository
 modify the scripts
 update the tests and/or add necessary new tests (see folder
tests
)  run tests + update docs and website:
 to make sure the unit tests and package building are without error;
 to make sure the documentations and website can be built, and to update any documentations;
 to do so, run the following commands:
devtools::build_readme()
,devtools::check()
,devtools::build()
,devtools::build_vignettes()
, and finallypkgdown::build_site()
. Please check whether there is no error along the process, and the built website looks good.
 if everything looks good, please submit a pull request to our github repository.
Overview
ModelArray
R package supports data in Hierarchical Data Format 5 (HDF5) file format. It utilizes DelayedArray and HDF5Array R packages, so that statistical analysis can be performed without loading all the original data in HDF5 file into memory. This makes ModelArray
very efficient in memory usage. For details, please see section “ModelArray Construction” below.
Methods for showing and accessing ModelArray
object are basic. Please see section “Show and accessors methods of ModelArray class” for more.
The key function of ModelArray
package is to perform statistical analysis. For details, please see section “Model fitting” below.
Then we describe how to save statistical results to HDF5 file on disk.
Finally, we describe what unit tests are needed for quality assurance.
ModelArray
Construction
We first define a class called ModelArray
using setClass()
. An ModelArray
object (i.e. an instance of ModelArray
class) has several slots:
 sources: source filenames (e.g., a list of .mif files of fixel data)
 scalars: scalar matrix (or matrices)
 results: statistical result matrices (if any)
 path: the path to the h5 file on disk
The key feature of an ModelArray
object is memory efficient. This is because the entire dataset in HDF5 (.h5) file was not loaded into the memory; only minimal data was loaded. To achieve this, we first need ModelArraySeed()
, which utilizes HDF5Array::HDF5ArraySeed()
, acting as a pointer to the .hdf5 file on disk. To make the arrays in the ModelArray
object look more like “real” arrays, e.g. common array operations such as indexing and transposing can be applied, we utilize package DelayedArray
to wraps the data in ondisk HDF5 file into a DelayedArray
object. Finally a ModelArray
class is defined by integrating above slots together. This is done in the ModelArray()
function.
Above functions and setups can be found in script R/ModelArray_Constructor.R
Show and accessors methods of ModelArray
class
To easily show and access a ModelArray
object, we adopt S4 Object Oriented Programming (OOP) model.
End users of R actually are using such S4 methods frequently. For example, when showing the (summarized) content of an object, users simply enter its name in R console and press “enter”. Under the hood, it uses the generic show()
method. As for showing a ModelArray
object, we adopt the same way. As show()
has already been a generic function, we only need to setMethod()
for its setup.
With S4 OOP model, users can easily access the data they need. Below are current accessor functions for accessing different slots in ModelArray
class’s:

sources()
: source filenames (e.g., a list of .mif files of fixel data) 
scalars()
: scalar matrix (or matrices) 
results()
: statistical result matrix or matrices (if any)
We use setMethod()
to define above methods. Unlike show()
, these accessors are not generic yet, so we then do additional step of setGeneric()
.
All of these show and accessors setups can be found in script R/ModelArray_S4Methods.R
.
Model fitting
Functions such as ModelArray.lm()
and ModelArray.gam()
are for model fitting. Under the hood, these functions iteratively call analyseOneElement.lm()
and analyseOneElement.gam()
for fitting for one element, respectively. This also facilitates the parallel computing across all elements requested.
Besides the existing functions for linear models (ModelArray.lm()
) and GAMs (ModelArray.gam()
), new functions for diverse statistical models can be added by following the structure below.
The general structure for model fitting functions ModelArray.<model_name>()
in ModelArray
is as below:
 Some sanity checks of arguments
 Print out some important methods for model fitting, such as
method
inmgcv::gam()
for GAMs  Fit statistical models:
 First, initialize: run
analyseOneElement.<model_name>()
once to get the column names of the statistics in the output statistics; usually using an element in the middle (instead of the first or the last element) to avoid too many NaNs in the data;  Then, iteratively call
analyseOneElement.<model_name>()
across all elements requested (can be parallelized). The functionanalyseOneElement.<model_name>()
does the following jobs: Fit the model for this element;
 Get necessary statistics via
broom::tidy()
,broom::glance()
, and/orsummary()
.  Flatten the results into one row of data.frame for this element; add element ids; remove the column names (it’s a list of numeric now) to save a bit more memory
 Return this result of this element
 Concatenate results of all requested elements into a matrix of elements by statistics, then add the column names; now we get the final result data.frame
 First, initialize: run
 If pvalue correction (for terms and/or model) is requested:
 For each pvalue correction method, apply it and add the values to the result data.frame
 Return the result data.frame
For ModelArray.gam()
, there is an additional step after one round of iteration:
 If
changed.rsq.term.index
is notNULL
, i.e. there is request of quantifying one (or more terms)’s importance in the model (i.e. metrics delta adjusted Rsquared and partial Rsquared), we need to compute the reduced model (formula without the term of interest): For each requested term of interest:
 Iteratively call
analyseOneElement.<model_name>()
but providing reduced formula  Get necessary statistics
 Compute delta adjusted Rsquared and partial Rsquared; add them to the previously got data.frame
 Iteratively call
 For each requested term of interest:
Therefore, there might be two or more rounds of iterations on model fitting in ModelArray.gam()
, depending on number of terms the user requested in changed.rsq.term.index
.
Functions mentioned in this section can be found in scripts R/analyse.R
and R/ModelArray_Constructor.R
.
Saving statistical results to .hdf5 file on disk
After getting the data.frame of statistical results, users will save it into the original HDF5 file on disk. The function is writeResults()
located in script R/ModelArray_Constructor.R
Writing tests for CircleCI
We use Continuous Integration (CI) testing to ensure stability and quality assurance of ModelArray
. Therefore, when you modify the source code of ModelArray
, it is very important to make sure you add appropriate tests of the new or modified functionality.
All the test files are located in folder tests/testthat
. Once a commit is pushed to GitHub, CircleCI (a CI testing platform) will be triggered to build the software and run the unit tests.
The tests serve the goals of:
 Run functions in different scenarios (with different requests) and expect no error. This is to ensure when new features added, it will run without error; ** For some scenarios, some special data is needed. For example, when testing NAs in input data (for volume data)
 Expect the output data.frame will contain column with a specific name, or check the dimensions of the output data.frame. This is to ensure expected columns show up.
 Most importantly, check the values are as expected: statistical results calculated in ModelArray fitting loop = those calculated in standard R.
Developers can run the tests locally to see if unit tests in a test file (in tests/testthat
folder) can run without errors:
Then click “Run Test” button in RStudio to run the test file. Check if there is anything failed.
For more details on how to write unit tests + test out, please check out Testing
chapters in the book “R Packages” written by Hadley Wickham, Jennifer Bryan