Data Narrative | RBC Documentation

Data Narrative for [INSERT DATASET NAME HERE]

Important Links (should all be on GitHub):

Data Processing Flow Diagram:
- Flow diagram that describes the lifecycle of this dataset
DSR GitHub Project Page(Curation/Validation and Processing Queue Status):
- Cards for tracking the curation and validation portion of the dataset. This page should be updated every time you perform an action on the data.
- Cards for tracking the progress of containerized pipeline runs on the data.

Plan for the Data

Why does PennLINC need this data?
For which project(s) is it intended? Please link to project pages below:
What is our goal data format?
- i.e. in what form do we want the data by the end of the “Curation” step? BIDS? Something else?

Data Acquisition

Who is responsible for acquiring this data?
Do you have a DUA? Who is allowed to access the data?
Where was the data acquired?
Describe the data. What type of information do we have? Things to specify include:
- number of subjects
- types of images
- demographic data
- clinical/cognitive data
- any canned QC data
- any preprocessed or derived data

Download and Storage

Who is responsible for downloading this data?
From where was the data downloaded?
Where is it currently being stored?
What form is the data in upon intial download (DICOMS, NIFTIS, something else?)
Are you using Datalad? If so, at which point did you check the data into datalad?
Is the data backed up in a second location? If so, please provide the path to the backup location:

Curation Process

Who is responsible for curating this data?
GitHub Link to curation scripts/heurstics:
GitHub Link to final CuBIDS csvs:
Describe the Curation Process. Include a list of the initial and final validation errors and warnings.
Describe additions, deletions, and metadata changes (if any).

Preprocessing Pipelines

For each pipeline (e.g. QSIPrep, fMRIPrep, XCP, C-PAC), please fill out the following information:
- Pipeline Name:
- Who is responsible for running preprocessing pipelines/audits on this data?
- Where are you running these pipelines? CUBIC? PMACS? Somewhere else?
- Did you implement exemplar testing? If so, please fill out the information below:
  - Path to exemplar dataset:
  - Path to exemplar outputs:
  - GitHub Link to exemplar audit:
- For production testing, please fill out the information below:
  - Path to production inputs:
  - GitHub Link to production outputs:
  - GitHub Link to production audit:

Post Processing

Who is using the data/for which projects are people in the lab using this data?
- Link to project page(s) here
For each post-processing analysis that has been run on this data, fill out the following
- Who performed the analysis?
- Where it was performed (CUBIC, PMACS, somewhere else)?
- GitHub Link(s) to result(s)
- Did you use pennlinckit?
  - https://github.com/PennLINC/PennLINC-Kit/tree/main/pennlinckit