EF April 2022 Data Freeze
Prior to Kristin and Sophia’s departure in summer 2022, all EF data collected before April 1st, 2022, was downloaded, organized, and saved to Flywheel. This page includes details about current enrollment, data organization, auditing, and the final resting place of each data type.
Additional information on data completeness per subject can be found here.
A cover sheet summarizing completeness across all data types is found here. The ALL ENROLLED sheet sums data across types for every participant enrolled in EF. The ALL SCANNED sheet sums data across types for every participant from whom we have acquired scan data.
Current Enrollment
As of April 1st, 2022, 173 participants had signed the EF consent form and participated in intake procedures under the LiBI Protocol. 119 participants had completed T1 imaging procedures. 37 participants had completed T2 imaging procedures.
Data sources
Enrollment data was pulled from the EFR01 Data Entry #tracker project on Axis, maintained by the study coordinators. Relevant reports include ef_t1_scan_ID, ef_t2_scan_ID, redcap_ids, and collateral IDs.
Imaging data is sourced directly from Flywheel, using FlywheelTools.
Task data is pulled from Penn+Box and Flywheel.
Raw self report data is pulled from the Common Self-report Scales #collection Project on Axis. Scales included in the main battery are pulled using the EF all self reports report. Scales included in the pre/post scan battery are pulled using the EF STAI S/T report.
CNB Data is pulled from both the main WebCNP page and the /WebCNP survey page. T1 LiBI CNB data can be accessed using the SiteID “LIBI.” T2 LiBI CNB data, as well as all EXECTS battery data, can be accessed using the SiteID “EXECTS.”
Variability data is pulled from Penn+Box.
Clinical or Diagnostic data is pulled from Oracle, using the Diagnosis Retrieve function for the LiBI Common protocol.
Demographic Data is pulled from both Oracle and the EFR01 Data Entry #tracker Project on Axis, using the Demos report. Age at scan can be calculated from the subject’s date of birth and the date of scan pulled from Flywheel. Current ages, calculated in years and months, are saved on Saturn at the following file paths: afp://saturn/Coordinators/Protocols/TED_PROTOCOLS/EXECUTIVE_829744/2022_data_freeze/inputs/t1_ages.csv and afp://saturn/Coordinators/Protocols/TED_PROTOCOLS/EXECUTIVE_829744/2022_data_freeze/inputs/t2_ages.csv
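As an illustration of the age-at-scan calculation described above, here is a minimal sketch that returns age in whole years and remaining whole months. The function name and the example dates are hypothetical, and this is not necessarily how the saved t1_ages.csv and t2_ages.csv files were produced.

```python
from datetime import date


def age_years_months(dob: date, scan_date: date) -> tuple[int, int]:
    """Return age at scan as (whole years, remaining whole months)."""
    if scan_date < dob:
        raise ValueError("scan date precedes date of birth")
    # Count calendar months between the two dates.
    months = (scan_date.year - dob.year) * 12 + (scan_date.month - dob.month)
    # If the birth day-of-month has not been reached yet, the last month is incomplete.
    if scan_date.day < dob.day:
        months -= 1
    years, remaining_months = divmod(months, 12)
    return years, remaining_months


# Hypothetical example: born 2005-06-15, scanned 2022-04-01.
print(age_years_months(date(2005, 6, 15), date(2022, 4, 1)))  # (16, 9)
```

Whole months are counted by calendar month rather than by a fixed number of days, which avoids drift from leap years and uneven month lengths.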
Uploading Outstanding Data to Flywheel
Any data that hasn’t been uploaded after each individual acquisition can be uploaded in batches. This is especially helpful for the task data, which needs to be scored before it is uploaded.
-
To score raw task data and convert it into BIDS-valid .tsv and .json files, follow the steps outlined in convert_and_score_EF_task_data.ipynb.
-
To upload raw task data from PennBox (i.e., .log files) to the acquisition level, follow the steps outlined in log_file_atttachment_upload.ipynb.
-
To upload scored data from PennBox (i.e., .json and .tsv files) to the session level, follow the steps outlined in bids_attachment_upload.ipynb.
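The routing rule in the list above (raw .log files attach to the acquisition, scored .tsv and .json files attach to the session) can be sketched as a small planning step run before any upload. This is only an illustration: the function name, the "skipped" bucket, and the example file names are hypothetical, and the actual uploads are performed by the notebooks listed above.

```python
from pathlib import PurePath

# Mapping taken from the list above: raw logs go to the acquisition level,
# scored BIDS files go to the session level.
LEVEL_BY_SUFFIX = {".log": "acquisition", ".tsv": "session", ".json": "session"}


def plan_uploads(filenames):
    """Group file names by the Flywheel container level they should attach to."""
    plan = {"acquisition": [], "session": [], "skipped": []}
    for name in filenames:
        suffix = PurePath(name).suffix.lower()
        # Anything with an unexpected extension is set aside for manual review.
        plan[LEVEL_BY_SUFFIX.get(suffix, "skipped")].append(name)
    return plan


# Hypothetical file names, for illustration only.
batch = ["sub-01_task.log", "sub-01_task_events.tsv", "sub-01_task_events.json", "notes.txt"]
print(plan_uploads(batch))
```

Building the plan first makes it easy to review a batch, and to spot unexpected files, before anything is attached on Flywheel.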
Self Report Data merges + Organization
The idea behind these data freezes is to add data not included in the previous upload to a cleaned, organized .csv. Organization of self report scales was last completed in March 2020, so the scripts in this section build on previously cleaned files currently stored on Saturn at the following file path: afp://saturn/Coordinators/Protocols/TED_PROTOCOLS/EXECUTIVE_829744/flywheel_data_uploads/data_ready_for_upload. When these steps need to be performed again for subsequent data freezes, the same scripts can be used, substituting the current cleaned data that lives on Flywheel.
Records not included in previous freezes are identified from the raw data using the RedCap Scales ID, which is tracked manually in the Axis Tracker. The scripts below use the data in the redcap_ids or collateral_ids reports to add any new records to a new .csv.
Note: none of this data is scored, so as to avoid version control or batch effects.
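The "add only what is new" logic described above can be sketched as follows. This is a simplified illustration rather than the code in the notebooks: the column name redcap_id, the function names, and the example rows are all hypothetical.

```python
def find_new_records(raw_rows, previous_rows, tracked_ids, id_field="redcap_id"):
    """Return raw rows whose ID is tracked for EF but absent from the previous freeze."""
    already_frozen = {row[id_field] for row in previous_rows}
    wanted = set(tracked_ids) - already_frozen
    return [row for row in raw_rows if row[id_field] in wanted]


def extend_freeze(raw_rows, previous_rows, tracked_ids, id_field="redcap_id"):
    """Return the previous cleaned rows followed by any newly identified rows."""
    new_rows = find_new_records(raw_rows, previous_rows, tracked_ids, id_field)
    return list(previous_rows) + new_rows


# Hypothetical data: IDs 1 and 2 were frozen previously; 3 is new; 9 is not an EF record.
previous = [{"redcap_id": 1, "item_1": 2}, {"redcap_id": 2, "item_1": 0}]
raw = previous + [{"redcap_id": 3, "item_1": 1}, {"redcap_id": 9, "item_1": 3}]
print(extend_freeze(raw, previous, tracked_ids=[1, 2, 3]))
```

Filtering on the tracked IDs keeps records from other protocols out of the EF file, and excluding IDs already present leaves previously cleaned rows untouched.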
-
To merge and organize pre-scan self reports into one file, follow the steps outlined in merge_pre_post_scan_scales.ipynb.
-
To merge and organize the main battery of self reports into one file, follow the steps outlined in merge_main_scales.ipynb.
-
To merge and organize collateral self reports into one file, follow the steps outlined in merge_collateral_scales.ipynb.
-
To merge and organize demographic information into one file, follow the steps outlined in merge_and_audit_basic_demos.ipynb. This script also generates an audit, following the same logic as below.
-
To organize and sort EF clinical diagnostic data, follow the steps outlined in merge_and_audit_dx.ipynb. This script also generates an audit, following the same logic as below.
Audit Scripts
These scripts follow the same basic logic: using the enrollment list pulled from Axis, a series of binary spreadsheets is created to determine which measures are complete for which participants, with the goal of filling in this Google Sheet.
-
To complete an audit of imaging data on Flywheel, first run the Flaudit gear on Flywheel. Using both the bids.csv and seqinfo.csv outputs, follow the steps in imaging_audit.ipynb to fill in the imaging section of the audit spreadsheet.
-
To complete an audit of the ancillary files on Flywheel (task data, variability, etc.), follow the steps in flywheel_attachment_audit.ipynb. This script queries Flywheel directly by using the SDK and CLI.
-
To complete an audit of the main battery of self report scales, follow the steps in main_scales_audit.ipynb.
-
To complete an audit of the pre/post scan scales, follow the steps in pre_post_scan_scales_audit.ipynb.
-
To complete an audit of the collateral scales, follow the steps in collateral_scales_audit.ipynb.
-
To complete an audit of the PMU data on Flywheel, follow the steps in PMU_audit.ipynb.
-
To complete an audit of variability data by each task, follow the steps in variability_task_audit.ipynb.
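The shared logic of the audit scripts above (an enrollment list checked against each measure to produce a binary completeness table) can be sketched as follows. This is a minimal illustration under assumed inputs: the function name, the participant IDs, the measure names, and the "total" column are hypothetical and are not taken from the notebooks.

```python
def build_audit(enrolled_ids, completed_by_measure):
    """Return one row per enrolled participant with a 1/0 flag for each measure."""
    measures = sorted(completed_by_measure)
    rows = []
    for participant in enrolled_ids:
        row = {"participant": participant}
        for measure in measures:
            # 1 if this participant has data for the measure, otherwise 0.
            row[measure] = int(participant in completed_by_measure[measure])
        # Number of measures with data for this participant.
        row["total"] = sum(row[measure] for measure in measures)
        rows.append(row)
    return rows


# Hypothetical enrollment list and per-measure sets of participants with data.
enrolled = ["P001", "P002", "P003"]
completed = {"t1_imaging": {"P001", "P002"}, "main_scales": {"P001", "P003"}}
for row in build_audit(enrolled, completed):
    print(row)
```

Starting from the enrollment list rather than from the data itself means participants with no data at all still appear in the audit, as rows of zeros.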