Metadata

To use the pipeline, you have to specify the metadata of your EM27/SUN network, which answers the question "How was each instrument set up over time?". The metadata consists of three files: locations.json, sensors.json, and campaigns.json.

The API Reference section contains example files for the metadata as well as a complete specification of the schema.

locations.json contains a list of measurement locations. You must assign each of your measurement locations a location_id. For example, we use the location IDs TUM_I, FEL, and so on.

sensors.json contains a list of measurement setups for each of your EM27/SUN systems. You must assign each sensor system (each housing one EM27/SUN) a sensor_id. For example, we use the sensor IDs ma, mb, mc, md, and me.

campaigns.json contains a list of campaigns. You can leave this list empty. If campaigns are specified, the bundles generated by the pipeline include the campaign ID in an additional column so that you can easily filter the bundles by campaign. For example, we use the campaign IDs muccnet, san-francisco, vienna, and so on.

Connecting the Metadata

To configure the pipeline with the metadata, you have two options: save the files locally or store them in a GitHub repository. The latter option takes a bit more work to set up, but the metadata can then be edited from anywhere and is version-controlled.

Option 1: Local Files

For this option, save the three files locations.json, sensors.json, and campaigns.json to the config/ directory inside the pipeline directory. Be sure to include all three files: even if you don't have any campaigns, save an empty list to campaigns.json.
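As an illustration of the requirement that all three files exist, here is a minimal Python sketch. It is not part of the pipeline; the helper names ensure_campaigns_file and missing_metadata_files are hypothetical, and the only facts taken from this guide are the three file names and that campaigns.json may hold an empty list.

```python
import json
from pathlib import Path

# File names as listed in this guide.
REQUIRED_FILES = ("locations.json", "sensors.json", "campaigns.json")


def ensure_campaigns_file(config_dir: Path) -> Path:
    """Create campaigns.json with an empty list if it does not exist yet.

    Hypothetical helper: an existing file is left untouched.
    """
    path = config_dir / "campaigns.json"
    if not path.exists():
        path.write_text(json.dumps([]) + "\n")
    return path


def missing_metadata_files(config_dir: Path) -> list[str]:
    """Return the names of required metadata files absent from config_dir."""
    return [name for name in REQUIRED_FILES if not (config_dir / name).is_file()]
```

Calling ensure_campaigns_file(Path("config")) followed by missing_metadata_files(Path("config")) would then report only the files you still have to write yourself.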

Alternatively, you can store these files locally in a location outside of this repository. To do so, set the environment variable ERP_CONFIG_DIR to the full path of the alternate location (ERP is short for EM27 Retrieval Pipeline). Note that this should be the same directory that contains the config.json file. If this environment variable is not set, the pipeline looks for these files in config/ by default. When you set an alternate config directory, you do not need to touch the file config/config.template.json (this file is version-controlled).

For example, directly setting the environment variable:

export ERP_CONFIG_DIR=<path-to-local-config-dir>

For example, sourcing a .env file:

export $(grep -v '^#' <path-to-.env-file> | xargs)

Check which environment variables are currently set with export -p.
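The lookup rule described in this section (use ERP_CONFIG_DIR if it is set, otherwise fall back to config/) can be sketched in Python as follows. This is only an illustration, not the pipeline's actual code: the function name resolve_config_dir and the project_dir parameter are hypothetical, and treating an empty value as "not set" is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_dir(
    project_dir: Path, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return the directory expected to hold the metadata files.

    Uses ERP_CONFIG_DIR when it is set and non-empty (assumption: an empty
    value counts as unset), otherwise <project_dir>/config.
    """
    env = os.environ if env is None else env
    custom = env.get("ERP_CONFIG_DIR")
    if custom:
        return Path(custom)
    return project_dir / "config"
```

Passing env explicitly makes the rule easy to try out without changing your shell environment; called without it, the function reads the real environment variables.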

Option 2: GitHub Repository

1. Create a repository

Use the repository tum-esm/em27-metadata-storage-template as a template to create your own metadata-storage repository. To do so, click the "Use this template" button at the top right of the repository page.

The repository has already been configured with a GitHub Actions workflow to test whether the metadata matches the required schema.

2. Connect the repository to the pipeline

In the pipeline configuration file at config/config.json, you have to specify the GitHub repository. If you want to keep the repository private, you can use an access token with read access to this repository.

3. Test the connection

You can test the connection to the repository and the integrity of the data in it by running the integration tests with pytest -m integration.

Reusing this Metadata

We use this metadata in many places: for plotting, data analysis, and so on. You do not have to rewrite the parsing logic for every project; instead, you can use our em27-metadata Python library: github.com/tum-esm/em27-metadata.