Metadata

Developing structured metadata to link objects, data, and scholarship

For complex digital objects

Complex digital objects (CDOs) are increasingly replacing single image scans as the mainstay of digital libraries and as objects of scholarly study. Advanced non-invasive imaging techniques – 3D scanning, spectral photography, and micro-computed tomography, for example – can now be applied to practically any type of cultural object and combined in myriad ways to create complex digital representations with many disparate parts. In addition, emergent technologies like virtual unwrapping and artificial intelligence make it possible to create “born digital” objects displaying previously invisible or unseen features, such as written letters and brush strokes.

To engender confidence in such complicated digital representations as objects of scholarly study, transparent metadata that describes and depicts the set of algorithmic steps and file combinations used to create them is crucial. In exploring ways to document this digital provenance chain and to disseminate the metadata in a clear, concise, and organized way, we settled on the Metadata Encoding and Transmission Standard (METS)[1] to organize the metadata for PHerc. 118.

METS Editor

The METS Editor is a prototype tool for adding metadata to an existing METS file and is part of our METS Tools software project. The User Interface (UI) is specifically designed for users with little to no experience with XML markup. Instead of requiring hand-written XML, the UI lets users enter metadata into text fields and auto-generates the XML in the background according to a predefined schema. Within this project, the tool's goal is to facilitate metadata documentation after the initial image capture of Herculaneum papyri fragments.

Adding METS document data.

Adding descriptive metadata.
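
The idea of turning text-field input into schema-shaped XML can be illustrated with a minimal sketch. This is not the METS Editor's implementation: the function name `build_mets`, the use of Dublin Core for the descriptive fields, and the identifier `DMD1` are assumptions made for illustration, and the project's own METS profile may differ.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumption: Dublin Core fields
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def build_mets(fields: dict) -> str:
    """Wrap flat form fields as Dublin Core inside a METS dmdSec (hypothetical helper)."""
    root = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata section: dmdSec > mdWrap > xmlData > dc:* elements
    dmd = ET.SubElement(root, f"{{{METS_NS}}}dmdSec", {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    for name, value in fields.items():
        ET.SubElement(xml_data, f"{{{DC_NS}}}{name}").text = value

    # METS requires a structMap; its div points back at the dmdSec
    struct = ET.SubElement(root, f"{{{METS_NS}}}structMap")
    ET.SubElement(struct, f"{{{METS_NS}}}div", {"DMDID": "DMD1"})

    return ET.tostring(root, encoding="unicode")


# Values a user might type into the form's text fields
xml_text = build_mets({"title": "PHerc. 118", "creator": "Example Imaging Team"})
print(xml_text)
```

The user only supplies the dictionary of field values; the element nesting and namespaces come from the code, which is the role the predefined schema plays in the editor.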

METS Editor is built in Python using PySide6. Source code and prebuilt binaries for all major operating systems are publicly available for download under the terms of the GNU AGPLv3 license:

Get started with METS Tools

Additionally, Python users can install METS Tools (including the METS Editor) from PyPI:

python -m pip install educelab-mets
METSEditor  # run METS Editor from the Python environment

For complex processing pipelines

In addition to the complex relationships between digital representations of an object, the process by which a single representation is formed can itself be fraught with complexity. The conversion from raw sensor data into a final digital product is non-trivial, and the quality of the result is often highly dependent upon the proper selection of algorithmic parameters at multiple points in a processing pipeline. To make it easier for researchers to link every digital object to a traceable processing pipeline, and to automatically track the parameters used in those processing pipelines, we developed the Structured Metadata Engine and Graph Objects Library (smgl).

Smgl is a C++14 library for creating custom dataflow pipelines that automatically track inputs, outputs, parameters, and intermediate data files. When visualized, these pipelines form complex directed graphs where nodes of the graph represent computation and edges symbolize data “flowing” through each algorithmic step:

Visualization of a complex smgl pipeline (Register 3D) which computes the alignment of multiple photographs to a 3D mesh. At the top, data is loaded from existing METS digital objects and passed through various computational stages before producing a new digital object at the bottom.
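
The dataflow idea, where nodes compute and edges carry data, can be sketched in a few lines. The following is a conceptual illustration in Python, not smgl's C++ API: the `Node` class and its `connect`, `update`, and `describe` methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Node:
    """One computational step: a function plus the parameters it ran with."""
    name: str
    func: Callable[..., Any]
    params: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)  # upstream nodes (graph edges)
    result: Any = None
    done: bool = False

    def connect(self, upstream: "Node") -> "Node":
        """Add an edge from an upstream node into this node."""
        self.inputs.append(upstream)
        return self

    def update(self) -> Any:
        """Evaluate upstream nodes first, then this node; each node runs once."""
        if not self.done:
            args = [node.update() for node in self.inputs]
            self.result = self.func(*args, **self.params)
            self.done = True
        return self.result

    def describe(self) -> dict:
        """Record what this step consumed and which parameters it used."""
        return {
            "name": self.name,
            "params": self.params,
            "inputs": [node.name for node in self.inputs],
        }


# A three-step pipeline: load -> scale -> sum
load = Node("load", lambda: [1.0, 2.0, 3.0])
scale = Node("scale", lambda xs, factor: [x * factor for x in xs], {"factor": 2.0}).connect(load)
total = Node("sum", lambda xs: sum(xs)).connect(scale)

print(total.update())
print(scale.describe())
```

Because parameters live on the node rather than in the calling code, the graph itself carries the record of how each result was produced.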

Each computational step stores its metadata in an open, human-readable JSON format that is compatible with our METS profile and which can be easily inspected without custom software. Additionally, intermediate data files can be archived alongside pipeline metadata:

Example JSON metadata for a computational step.

Inspecting intermediate data files from the pipeline.
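
As a rough illustration of archiving an intermediate data file next to human-readable JSON metadata, consider the sketch below. The field names (`step`, `parameters`, `output`, `sha256`), the file names, and the `archive_step` helper are assumptions for this example and do not reflect smgl's actual on-disk format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def archive_step(directory: Path, name: str, params: dict, data: bytes) -> Path:
    """Write an intermediate data file and a JSON sidecar describing it."""
    step_dir = directory / name
    step_dir.mkdir(parents=True, exist_ok=True)

    # Intermediate data produced by the step
    data_path = step_dir / "output.bin"
    data_path.write_bytes(data)

    # Hypothetical metadata record; readable in any text editor
    record = {
        "step": name,
        "parameters": params,
        "output": data_path.name,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    meta_path = step_dir / "metadata.json"
    meta_path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return meta_path


with tempfile.TemporaryDirectory() as tmp:
    meta = archive_step(Path(tmp), "scale", {"factor": 2.0}, b"\x01\x02\x03")
    loaded = json.loads(meta.read_text())
    print(json.dumps(loaded, indent=2))
```

Storing a checksum beside the parameters is one simple way to let a later reader confirm that an archived file is the one the metadata describes.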

The source code for smgl is publicly available for download under the terms of the GNU GPLv3 license:

Get started with smgl