Data Storage Guide
This document is a living guide intended to systematise and standardise the storage of data produced within AMT so that the data remains traceable and reproducible into the future. It focuses on the general storage of all task data and links to specific storage guides for particular data types, including large image-based test datasets such as digital image correlation (DIC) and infra-red thermography (IRT) data combined with point sensors (e.g. strain gauges, thermocouples, accelerometers and displacement transducers).
Key Principles
Data should be backed up to a cloud storage service (e.g. OneDrive or the PowerScale) as soon as possible after collection, even if it does not yet conform to this guide. Unstructured data can be placed in the ‘ScratchSpace’ directory.
A data summary (DS) must be completed that captures all relevant meta-data for the experiment. The DS must contain all information required to reproduce the campaign and associated data from scratch.
All data should be stored in open data formats whenever possible.
Data volume permitting, human-readable formats are preferred, e.g. *.csv, *.xml.
Where data must be stored in a proprietary format, the software and version required to read the files must be detailed in the DS.
The data should be stored in a consistent folder structure as detailed in its data storage guide.
All raw experimental data must be stored such that any post-processing is fully reproducible.
All calibration information should be stored with the raw data and detailed in the DS.
All static data traces for noise floor analysis must be stored.
If processed experimental data is stored, all software that was used to process the data must be detailed in the DS.
If bespoke code was used to process the data, this must also be noted in the DS, and a link to the version of the code that was used should be provided.
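The software details required by the last two principles can be gathered programmatically rather than typed by hand. The following is a minimal sketch, assuming the processing was done in Python. The function name `software_record`, the dictionary keys and the `code_link` argument are hypothetical and are not taken from the DS template.

```python
import platform
from importlib import metadata


def software_record(packages, code_link=None):
    """Collect interpreter and package versions for inclusion in a DS."""
    record = {
        "python": platform.python_version(),
        "packages": {},
        # Link to the exact version (e.g. a commit or tag) of any bespoke code.
        "bespoke_code_link": code_link,
    }
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence rather than guessing a version.
            record["packages"][name] = "not installed"
    return record
```

The returned dictionary could then be copied into the relevant part of the DS. Which section it belongs in depends on the template in use.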
Storage
Data stored on the PowerScale should be stored in a task folder, named with the associated task number. Each task folder should contain three folders:
A XXX folder for XXX.
A “data” folder, which should contain any experimental data and outputs associated with the task. The contents of this folder should be in accordance with its associated data storage guide.
A “metadata” folder, which should contain the data summary file as specified below, alongside a README document specifying any additional work outputs that cannot be directly uploaded to PowerScale, e.g. GitHub repositories.
Meta-Data & Data Summary File
Meta-data is the overarching information that describes the data, how it was collected and how it is structured. All campaigns will need to capture the meta-data required to reproduce all data from scratch. This will be captured by filling in a template file called ‘data-summary.json’, referred to as the DS. An example DS for an experiment on HIVE can be found here: data-summary-example.json with an example template to download here: data-summary-template.json
The example DS is in human-readable JSON format and can be edited using a simple text editor. An editor with syntax highlighting, such as VSCode, is recommended to ease readability.
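Because the DS is plain JSON, it can also be checked automatically before upload. The sketch below loads a DS and reports any missing top-level fields. The field names in `REQUIRED_KEYS` and the function name `missing_keys` are hypothetical placeholders; the real field names are defined by data-summary-template.json.

```python
import json
from pathlib import Path

# Hypothetical field names; replace with those in data-summary-template.json.
REQUIRED_KEYS = ("task", "calibration", "software")


def missing_keys(ds_path, required=REQUIRED_KEYS):
    """Return the required top-level keys absent from a data summary file."""
    with Path(ds_path).open(encoding="utf-8") as f:
        summary = json.load(f)
    return [key for key in required if key not in summary]
```

An empty list means every required field is present. A file that is not valid JSON raises `json.JSONDecodeError`, which is itself a useful early warning.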
Data Specifics
Follow the guidelines applicable to the type of data being stored:
Storage
The task folder should be in the following format:
AMT-XXXX
    XXXX
    Metadata
        data-summary.json
        README
    Data
        Data outputs of task, see (list of data storage guides)
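A task folder can be checked against this layout with a short script. This is a minimal sketch under stated assumptions: the folder names are matched case-sensitively as written above, the DS file name is taken as `data-summary.json` as in the Meta-Data section, the README is assumed to have no file extension, and the third folder is left out because its name is not specified here. The names `EXPECTED` and `missing_entries` are hypothetical.

```python
from pathlib import Path

# Only the entries named in this guide; the third folder is omitted because
# its name is not yet specified.
EXPECTED = ("Metadata/data-summary.json", "Metadata/README", "Data")


def missing_entries(task_dir, expected=EXPECTED):
    """Return the expected relative paths that do not exist under a task folder."""
    root = Path(task_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

For example, `missing_entries("AMT-XXXX")` returns an empty list when all three entries are present and lists the absent ones otherwise.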