Data Storage Guide

The live source document for this guide can be found here

Authors: Megan Sampson, Lloyd Fletcher

Date Created: 29th June 2023

Date Updated: 11th April 2025

Overview

This document is a living guide intended to systematise and standardise the storage of materials test and component validation test data, so that the data remains traceable and reproducible into the future. It focuses on the storage of large image-based test datasets, including digital image correlation (DIC) and infra-red thermography (IRT) data, combined with point sensor data (e.g. strain gauges, thermocouples, accelerometers and displacement transducers).

Terminology

Throughout this document a series of terms is used to describe the data that needs to be stored. A campaign is a series of connected experiments, and each experiment may consist of a series of tests.

Example 1: A campaign on HIVE occurs over several days and might include two different configurations or samples, each of which would be called an experiment. During each experiment, several pulses at different powers for the same configuration would be labelled tests.

Example 2: A campaign for tensile testing would occur over several days with multiple different specimens tested, each of which would be called an experiment. If one of these specimens was tested multiple times (e.g. load-unload cycles in the elastic regime for accurate modulus measurement), then these would be called tests.

Meta-Data & Data Summary File

Meta-data is the over-arching information that describes the data, how it was collected and how it is structured. All campaigns need to capture the meta-data required to reproduce all data from scratch. This is done by filling in a template file called a data-summary.json, or DS. An example DS for an experiment on HIVE can be found here: data-summary-example.json, with an example template here: data-summary-template.json

The DS is in human-readable JSON format and can be edited using a simple text editor. To ease readability, it is recommended that the file is edited in an editor with syntax highlighting, such as VSCode.
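
Because the DS is plain JSON, it can also be checked by a script before a campaign is archived. The Python below is a minimal sketch of such a check: it loads a DS file and reports any missing top-level entries. The key names in REQUIRED_KEYS and the function name are hypothetical placeholders, not the fields of the actual data-summary-template.json, and should be replaced with the entries defined in the template.

```python
import json
from pathlib import Path

# Hypothetical top-level keys (an assumption for illustration only);
# replace with the entries defined in data-summary-template.json.
REQUIRED_KEYS = ("campaign", "authors", "experiments", "software")


def missing_ds_keys(ds_path, required=REQUIRED_KEYS):
    """Return the required top-level keys that are absent from the DS file."""
    with Path(ds_path).open(encoding="utf-8") as ds_file:
        summary = json.load(ds_file)
    return [key for key in required if key not in summary]
```

An empty list means every listed key is present. The check says nothing about whether the values themselves are complete, so it complements rather than replaces a manual review of the DS.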

Key Principles

      Data should be backed up to a cloud storage service (e.g. OneDrive or the PowerScale) as soon as possible after collection, even if the data does not yet conform to this guide. Unstructured data can be placed in the ScratchSpace directory.

      A DS must be completed that captures all relevant meta-data for the experiment. The DS must contain all information required to reproduce the campaign and associated data from scratch.

      All data should be stored in open data formats whenever possible.

      Data volume permitting, human-readable formats are preferred, e.g. *.csv, *.xml.

      Where data must be stored in a proprietary format, the software and version required to read the files must be detailed in the DS.

      The data should be stored in a consistent folder structure as detailed below.

      All raw experimental data must be stored such that any post-processing is fully reproducible.

      All calibration information should be stored with the raw data and detailed in the DS.

      All static data traces for noise floor analysis must be stored.

      If processed experimental data is stored, all software that was used to process the data must be detailed in the DS.

      If bespoke code was used to process the data, this must also be noted in the DS and a link to the version of the code that was used should be provided.
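
The principles on open and proprietary formats can be supported by a simple audit of file extensions. The sketch below counts the files under a directory whose extension is not in a set of open formats, so that each remaining format can be documented in the DS with the software and version needed to read it. The contents of OPEN_EXTENSIONS and the function name are assumptions for illustration and should be adjusted to the formats accepted for a given campaign.

```python
from collections import Counter
from pathlib import Path

# Assumed set of open formats (illustrative only); adjust as required.
OPEN_EXTENSIONS = {".csv", ".xml", ".json", ".txt", ".tiff", ".tif"}


def non_open_extensions(root, open_extensions=OPEN_EXTENSIONS):
    """Count files under root whose extension is not in the open-format set."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix not in open_extensions:
            counts[suffix] += 1
    return dict(counts)
```

Any extension returned by this function is a prompt to check that the DS names the software and version required to read those files.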

DIC Data Specifics

At a minimum, all raw images and calibration information required to process the data using DIC software must be stored.

MatchID

Note: The MatchID grabber stores a *.xaml and a *.csv file with each burst of images. These files contain essential meta-data and should not be deleted. Other systems may use other files or require the user to input the meta-data themselves.

The raw images that must be collected and stored include:

      All calibration images and information about the calibration target

            At least 50 images (or image pairs for stereo) are required; more are preferred

            Even for 2D DIC a pixel-to-length conversion is required

            Optional: it is good practice to calibrate both before and after an experiment, in which case two calibration folders should be present per experiment

      Static images taken before any testing is done, for noise floor analysis

            At least 100 images (or image pairs for stereo) are required

            Optional: if possible, rigid body translation tests can provide a more robust estimate of the noise floor and can help to analyse possible pattern quality issues or systematic errors. This is easy to do in a tensile test machine by releasing one of the grips.

      All raw images taken during experiments / tests
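
The minimum image counts above can be checked automatically. The sketch below counts distinct image numbers in a directory, so that a stereo pair counts once, and compares the result with a minimum. It assumes the MatchID naming convention Image_IIII_C.tiff described later in this guide. The regular expression (including the acceptance of any number of digits and of a *.tif extension) and the function names are assumptions for illustration.

```python
import re
from pathlib import Path

# Assumed pattern for MatchID image names of the form Image_IIII_C.tiff,
# where IIII is the image number and C is the camera number.
IMAGE_NAME = re.compile(r"^Image_(\d+)_(\d+)\.tiff?$", re.IGNORECASE)

MIN_CALIBRATION = 50  # minimum calibration images or stereo pairs
MIN_STATIC = 100      # minimum static images or stereo pairs


def count_image_sets(directory):
    """Count distinct image numbers, so that a stereo pair counts once."""
    numbers = set()
    for path in Path(directory).iterdir():
        match = IMAGE_NAME.match(path.name)
        if match:
            numbers.add(int(match.group(1)))
    return len(numbers)


def has_enough_images(directory, minimum):
    """Return True if the directory holds at least `minimum` images or pairs."""
    return count_image_sets(directory) >= minimum
```

For example, has_enough_images(calibration_dir, MIN_CALIBRATION) would be used for a calibration directory and has_enough_images(static_dir, MIN_STATIC) for a static reference directory.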

If the raw data is processed with DIC software, the stored output must conform to the principles in the DIC good practice guide. The ScratchSpace directory can be used for data that is still being processed, but once a final processed version is created it must conform to the guide, which can be found here:

      International Digital Image Correlation Society, Jones, E.M.C. and Iadicola, M.A. (Eds.) (2018). A Good Practices Guide for Digital Image Correlation. DOI: 10.32720/idics/gpg.ed1

Raw Data: Directory Structure

MatchID

An example directory structure for a HIVE experiment using DIC can be found here: HIVE-0001. The directory structure is described below using the rules outlined here:

      D indicates a directory and F indicates a file.

      Multiple capital letters indicate a number used to identify the campaign / experiment / test. Example: ExpXX_DICYY could be Exp01_DIC02 which would be the data from DIC system 2 and experiment 1.

      Italics indicate a text identifier to describe the campaign / experiment / test.

      Each level of indentation indicates that the indented directories or files are located within that directory.

      Each directory can be repeated multiple times to account for the number of experiments / DIC systems and other diagnostics used.

      Additional files that are required to describe the structure of the data can be placed in the top directory with the DS, for example a HIVE experiment logbook.

      The ExpXX_PointSensors directory is intended to capture point sensor data that is recorded separately from the DIC DAQ systems. Data recorded by the DIC DAQ systems is instead contained in a *.csv in each DIC test directory with the images. An example of separately recorded data would be the point sensor traces recorded by the HIVE DAQ system, or the data recorded by the tensile test machine's DAQ during a materials test.

      The word identifier in italics is used in some of the directory names below and is an optional identification string to describe the conditions under which the data was taken. For example, a test directory for a HIVE experiment might use the power level as an identifier and a materials test could use the specimen number. Regardless of whether an identifier is used, the contents of all directories will be described in the DS file.

      Text enclosed in # # indicates a note, description or comment.

      A * is a wildcard that can stand for anything, so *.tiff is any file name with the extension .tiff.
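
The naming rules above lend themselves to automated checking. The sketch below classifies a directory name against the placeholders used in this guide (for example ExpXX_DICYY or TestTTT_identifier) and extracts the numbers and optional identifier. It assumes that each placeholder letter stands for exactly one digit and that the identifier is separated by an underscore; the pattern table and function name are illustrative and not part of the guide.

```python
import re

# Assumption: each capital placeholder letter in the guide is one digit,
# and the optional identifier follows an underscore.
PATTERNS = {
    "calibration": re.compile(r"^DIC(-IR)?(?P<system>\d{2})_Calib(?P<calib>\d{2})$"),
    "experiment": re.compile(r"^Exp(?P<exp>\d{2})_DIC(-IR)?(?P<system>\d{2})$"),
    "point_sensors": re.compile(r"^Exp(?P<exp>\d{2})_PointSensors$"),
    "check": re.compile(r"^Check(?P<num>\d{2})(_(?P<identifier>.+))?$"),
    "static_reference": re.compile(r"^StatRef(?P<num>\d{2})(_(?P<identifier>.+))?$"),
    "test": re.compile(r"^Test(?P<num>\d{3})(_(?P<identifier>.+))?$"),
}


def classify_directory(name):
    """Return (kind, fields) for a recognised directory name, else None."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(name)
        if match:
            # Drop optional groups that did not take part in the match.
            fields = {k: v for k, v in match.groupdict().items() if v is not None}
            return kind, fields
    return None
```

Under these assumptions, classify_directory("Exp01_DIC02") identifies experiment 1 recorded with DIC system 2, matching the example given in the rules above. A name that returns None is either unstructured data or does not follow the convention.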

The directory structure for the raw DIC data (no post processing) is as follows:

      D: RigOrTestType-NNNN-Raw

          F: data-summary-rigortesttype-NNNN

          D: DICYY_CalibCC

              F: Image files in raw format, normally *.tiff. MatchID will name these Image_IIII_C.tiff, where IIII is the image number and C is the camera number.

              F: Calibration file in human-readable format, *.caldat. The file name should be CalCC.caldat.

              F: File containing image path, time stamp and traces from any connected and configured DAQ system, in *.csv format

              F: File containing camera make, number of pixels and exposure time (in *.xaml format for MatchID)

          D: DIC-IRYY_CalibCC

              #See above calibration directory example but include IR calibration images from IR camera and main DIC camera#

          D: ExpXX_DICYY

              F: Calibration file in human-readable format, *.caldat

              D: CheckNN_identifier

                  F: Image files in raw format, normally *.tiff

                  F: CSV file containing image path, time stamp and traces from any connected and configured DAQ system

                  F: Meta-data file containing camera make, number of pixels and exposure time (in *.xaml format for MatchID)

              D: StatRefNN_identifier

                  F: Image files in raw format, normally *.tiff

                  F: CSV file containing image path, time stamp and traces from any connected and configured DAQ system

                  F: Meta-data file containing camera make, number of pixels and exposure time (in *.xaml format for MatchID)

              D: TestTTT_identifier

                  F: Image files in raw format, normally *.tiff

                  F: CSV file containing image path, time stamp and traces from any connected and configured DAQ system

                  F: Meta-data file containing camera make, number of pixels and exposure time (in *.xaml format for MatchID)

          D: ExpXX_DIC-IRYY

              #See above DIC data structure but includes images from IR camera in each directory#

          D: ExpXX_PointSensors

              F: CSV files containing point sensor traces with descriptive column headings containing the sensor type, identification string and units of measure. Information on the point sensors should be captured in the DS so the *.csv is easy to interpret.

          D: ScratchSpace

              #Any directory or file structure can be used here for temporary data processing#

              #MatchID places processed data files in the same directory as the images, so image directories must be copied here before processing to ensure the raw data directories are preserved.#

              #Processed data must be stored in an open format if possible. MatchID has the capability to export *.csv, so this should be used. The *.dat and *.3dat files may also be retained for ease of processing.#
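
Creating the directory tree by script before a campaign starts helps to keep names consistent. The sketch below builds an empty raw-data skeleton with pathlib. It assumes that the calibration, experiment, point sensor and ScratchSpace directories all sit directly inside the top-level directory, that one calibration directory is created per DIC system, and that numbers are zero-padded to the width of the placeholders. The function name and its parameters are hypothetical, and the check, static reference and test directories are left to be added as data is collected.

```python
from pathlib import Path


def create_raw_skeleton(root, rig, campaign, n_experiments=1, n_dic_systems=1):
    """Create an empty raw-data directory tree and return the top directory."""
    top = Path(root) / f"{rig}-{campaign:04d}-Raw"
    for dic in range(1, n_dic_systems + 1):
        # One calibration directory per DIC system (assumed numbering Calib01).
        (top / f"DIC{dic:02d}_Calib01").mkdir(parents=True, exist_ok=True)
        for exp in range(1, n_experiments + 1):
            (top / f"Exp{exp:02d}_DIC{dic:02d}").mkdir(parents=True, exist_ok=True)
    for exp in range(1, n_experiments + 1):
        (top / f"Exp{exp:02d}_PointSensors").mkdir(parents=True, exist_ok=True)
    (top / "ScratchSpace").mkdir(parents=True, exist_ok=True)
    return top
```

Calling create_raw_skeleton(".", "HIVE", 1, n_experiments=2) would create HIVE-0001-Raw containing one calibration directory, two experiment directories, two point sensor directories and ScratchSpace. The DS file still needs to be added to the top directory by hand.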

Processed Data: Workflow

Processed digital image correlation data can come from a variety of sources, ranging from a single processing run on a real experiment through to a performance analysis on a simulated experiment (images created using synthetic image deformation). The flow chart below outlines the common processing workflow paths for MatchID and the associated file types. The directory structure derives from these workflow paths.

Figure 1: Flow chart showing the processing routes for digital image correlation data coming from MatchID

Processed Data: Directory Structure

The directory structure for processed data given below follows a similar convention to that described for the raw data above. It uses the following capital letters as placeholders:

      W = strain window

      Z = number of data points

      Q = type of interpolation

      S = subset size

      N = step size

      C = camera number

      M = time step number

      X,Y,I,J,K = an integer used for numbering directories, images etc

Note that it is important that any *.m3inp or *.*inp files from MatchID are stored with the output data, as these contain a copy of all processing parameters required to reproduce the data.
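
A simple script can flag processed output that has been stored without its input file. The sketch below lists every directory containing *.3dat output for which no file ending in inp exists in that directory or in any directory above it, up to the chosen root. Treating an input file in a parent directory as sufficient is an assumption made for illustration, as is the function name.

```python
from pathlib import Path


def dirs_missing_inputs(root):
    """List directories with *.3dat output but no *inp file at or above them."""
    root = Path(root)
    missing = []
    for directory in sorted({path.parent for path in root.rglob("*.3dat")}):
        # Search the output directory and its parents, stopping at root.
        candidates = [directory, *directory.parents]
        candidates = [d for d in candidates if d == root or root in d.parents]
        if not any(any(d.glob("*inp")) for d in candidates):
            missing.append(directory)
    return missing
```

An empty list suggests that every set of *.3dat files can be traced to an input file, although it does not confirm that the input file is the one actually used to produce them.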

The directory structure for the processed data is as follows; an example can be found here (TODO insert link). Note that the high-level directory identifier should correspond to the raw data that was used to create the processed data:

D: RigOrTestType-NNNN-ProcessPP

    D: ExpXX_DICYY_real_performance_analysis

        D: SS_NN_affine, SS_NN_quadratic #Directories for each subset parameter combination#

            D: SWZZ_QK #Sub-directories for each strain window combination#

                F: Image_IIII_C.tiff.csv

                F: Image_IIII_C.tiff_SWZZ_QK.3dat

            F: Image_IIII_C.tiff.3dat

            F: Image_IIII_C.tiff_SWZZ_QK.csv

    D: ExpXX_DICYY_real_single_process

        D: Some classification of subfolders

            F: Image_IIII_C.tiff_u.csv

            F: Image_IIII_C.tiff.3dat

            F: TestXXX_YY-optional-identifier.H5

            F: TestXXX_YY-optional-identifier.m3inp

    D: ExpXX_DICYY_simulated_performance_analysis

        D: Noise_NN #Many folders with different noise cases e.g. Noise_1#

            D: SS_NN_affine, SS_NN_quadratic #Folders for each parameter combination: subset size, step size#

                D: SWZZ_QK #Strain window number of data points, type of interpolation#

                    F: Image_IIII_C_Numerical_M_Y.def.csv

                    F: Image_IIII_C_Numerical_M_Y.def_SWZZ_QK.3dat

                F: Image_IIII_C_Numerical_M_Y.def.3dat #3dat files#

                F: Image_IIII_C_Numerical_M_N.def_SWZZ_QK.csv #csv files#

            F: *.mipa #MatchID input file for performance analysis#

            F: BestDICParamFile.txt

            F: #Finite element model input files# - Node_Coord.txt and Node_Strain.txt for ANSYS, Exodus for MOOSE

            F: Image_IIII_C_Numerical_M_Y.def

            F: VSGAnalysis_SS_NN_affine.m3inp #Many m3inp files#

        D: NoNoise

            D: SS_NN_quadratic

                D: SWZZ_QK

                    F: Image_IIII_C_Numerical_M_Y.def.3dat

                    F: Image_IIII_C_Numerical_M_Y.def.csv

                    F: *.m3inp #Text file describing DIC processing parameters#

                F: Image_IIII_C_Numerical_M_Y.def.3dat

                F: ImageDefwithoutNoise.mti3d

                F: *.m3inp #Text file describing DIC processing parameters#

            F: BestDICParamFile.txt

            F: Node_Coord.txt

            F: Node_Strain.txt

    D: ExpXX_DICYY_simulated_single_process

        D: Noise_1 #Many folders with different noise cases#

            D: SS_NN_quadratic

                D: SWZZ_QK

                    F: Image_IIII_C_Numerical_M_Y_def.csv

                    F: Image_IIII_C_Numerical_M_Y.def_SWZZ_QK.3dat

                F: Image_IIII_C_Numerical_M_Y.def.3dat

                F: Image_IIII_C_Numerical_M_Y.def.SWZZ_QK.csv

            F: *.mipa #MatchID input file#

            F: BestDICParamFile.txt

            F: Image_IIII_C_Numerical_M_Y.def

            F: Image_IIII_C_Numerical_M_Y.def.3dat

            F: #Finite element model input files# - Node_Coord.txt and Node_Strain.txt

            F: *.m3inp #Text file describing DIC processing parameters#

            F: VSGAnalysis_SS_NN_quadratic.m3inp

        D: NoNoise

            D: SS_NN_quadratic

                D: SWZZ_QK

                    F: Image_IIII_C_Numerical_M_Y.def.3dat

                    F: Image_IIII_C_Numerical_M_Y.def.csv

                    F: Job.m3inp

                F: Image_IIII_C_Numerical_M_Y.def.3dat

                F: ImageDefwithoutNoise.mti3d

                F: Job.m3inp

            F: BestDICParamFile.txt

            F: Node_Coord.txt

            F: Node_Strain.txt

 
