4.7 Choosing vocabularies for your dataset


4.7.1 Map data fields to Darwin Core

There are many possible ways of setting up your datasheets, and if you are new to OBIS you likely did not use controlled Darwin Core (DwC) or BODC vocabularies when your samples were collected. When mapping your data fields to DwC, we recommend documenting your choices so that you have a reference to return to should the need arise. In this document, note the choices you made as well as any actions you had to take (e.g. separating one column into many, converting dates or coordinates).

For example, a DwC mapping reference table could look like the following:

Verbatim field name | Mapped DwC term | Actions taken | Notes
--- | --- | --- | ---
date | eventDate | convert dates to ISO |
coordinates | decimalLongitude, decimalLatitude | convert ddmmss to decimal degrees, separated one column into 2 for longitude and latitude | put original coordinates into verbatimCoordinates
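
As a minimal sketch of the actions recorded in such a table, the Python below converts a day/month/year date to ISO 8601 and a degrees-minutes-seconds coordinate pair to decimal degrees, while keeping the original string in verbatimCoordinates. The input date format, the layout of the raw coordinate string, and the helper names (to_iso_date, dms_to_decimal) are assumptions made for illustration; adapt them to your own datasheet.

```python
import re
from datetime import datetime


def to_iso_date(value: str, fmt: str = "%d/%m/%Y") -> str:
    """Parse a date written in the (assumed) raw format and return YYYY-MM-DD."""
    return datetime.strptime(value, fmt).date().isoformat()


def dms_to_decimal(dms: str) -> float:
    """Convert e.g. 30°30'00"S to decimal degrees (south and west are negative)."""
    text = dms.strip()
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    hemisphere = text[-1:].upper()
    if len(numbers) != 3 or hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"Unrecognised coordinate: {dms!r}")
    degrees, minutes, seconds = (float(n) for n in numbers)
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if hemisphere in ("S", "W") else decimal


# Hypothetical raw record using the verbatim field names from the table above.
raw = {"date": "14/03/2019", "coordinates": "30°30'00\"S 153°15'00\"E"}
lat_text, lon_text = raw["coordinates"].split()

mapped = {
    "eventDate": to_iso_date(raw["date"]),
    "decimalLatitude": dms_to_decimal(lat_text),
    "decimalLongitude": dms_to_decimal(lon_text),
    "verbatimCoordinates": raw["coordinates"],  # keep the original value
}
print(mapped)
```

Keeping the conversion in small functions makes it easy to note in the mapping document exactly which rule was applied to each field.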

In order to help you map your data to DwC terms, we have provided the table below which outlines some common data fields, their associated Darwin Core vocabulary, and which data table the field is likely to go in:

Common Raw Terms | DwC Field | Data table
--- | --- | ---
Date, Time | eventDate | Event, Occurrence
Species, g_s, taxa | scientificName | Occurrence
Any biotic/abiotic measurements* | measurementType, measurementValue, measurementUnit* | eMoF
Depth | maximumDepthInMeters or minimumDepthInMeters | Event, Occurrence
Lat/Latitude, Lon/Long/Longitude, dd | decimalLatitude, decimalLongitude | Event, Occurrence
Sampling method | samplingProtocol | eMoF
Sample size, N, #, No. | sampleSizeValue | eMoF
Location | locality | Event
Presence, absence | occurrenceStatus | Occurrence
Type of record/specimen | basisOfRecord | Occurrence
Person/people that recorded the original Occurrence | recordedBy | Occurrence
ORCID of person/people that recorded the original Occurrence | recordedByID | Occurrence
Person/people that identified the organism | identifiedBy | Occurrence
ORCID of person/people that identified the organism | identifiedByID | Occurrence
Data collector, data creator | recordedBy | Event, Occurrence
Taxonomist, identifier | identifiedBy | Occurrence
Record number, sample number, observation number | occurrenceID (either ID or incorporated into ID) | Occurrence

*Note that abiotic/biotic measurement fields (sex, temperature, abundance, lengths, etc.) are mapped within the extendedMeasurementOrFact (eMoF) extension. There, each measurement goes from being a separate column to being a separate row, with the former column name recorded in measurementType and the cell value recorded in measurementValue.
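
The reshaping described in this note can be sketched as follows. This is only an illustration under stated assumptions: the raw column names (sex, length), the units, the occurrence identifiers, and the helper name wide_to_emof are hypothetical, and a real eMoF table would also carry the identifier fields discussed later in this chapter.

```python
def wide_to_emof(rows, id_field, measurement_columns):
    """Turn one-column-per-measurement rows into one-row-per-measurement records.

    measurement_columns maps each raw column name to its unit ("" if unitless).
    """
    emof = []
    for row in rows:
        for column, unit in measurement_columns.items():
            value = row.get(column)
            if value in (None, ""):
                continue  # nothing was measured for this record
            emof.append(
                {
                    "occurrenceID": row[id_field],
                    "measurementType": column,
                    "measurementValue": value,
                    "measurementUnit": unit,
                }
            )
    return emof


# Hypothetical wide-format datasheet rows.
wide_rows = [
    {"occurrenceID": "occ_1", "sex": "female", "length": "12.3"},
    {"occurrenceID": "occ_2", "sex": "", "length": "8.1"},
]

emof_rows = wide_to_emof(wide_rows, "occurrenceID", {"sex": "", "length": "cm"})
for record in emof_rows:
    print(record)
```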

The obistools R package also has the map_fields function that you can use to map your dataset fields to a DwC term.
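
For readers not working in R, the general idea of field mapping can be sketched in Python as a lookup from raw column names to DwC terms. This is not the obistools implementation; the function name rename_fields, the raw column names, and the example row are assumptions chosen for illustration.

```python
# Assumed raw column names (lower-cased) and the DwC terms they map to.
FIELD_MAP = {
    "species": "scientificName",
    "location": "locality",
}


def rename_fields(row, field_map):
    """Return (mapped, unmapped): DwC-named values and the raw names left over."""
    mapped, unmapped = {}, []
    for name, value in row.items():
        term = field_map.get(name.strip().lower())
        if term is None:
            unmapped.append(name)  # needs a decision; record it in your mapping notes
        else:
            mapped[term] = value
    return mapped, unmapped


row = {"Species": "Notommata", "Location": "Station 4", "Observer notes": "calm sea"}
mapped_row, unmapped_fields = rename_fields(row, FIELD_MAP)
print(mapped_row)
print(unmapped_fields)
```

Returning the unmapped names separately makes it harder to silently drop a column that still needs a DwC term.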

4.8 Map eMoF measurement identifiers to preferred BODC vocabulary

4.8.1 MeasurementOrFact vocabulary background

The MeasurementOrFact terms measurementType, measurementValue, and measurementUnit are completely unconstrained and can be populated with free text. While free text offers the advantage of capturing complex and as yet unclassified information, there is inevitable semantic heterogeneity (e.g., of spelling, wording, or language) that becomes a challenge for effective data interoperability and analysis. For example, if you were interested in finding all records related to length measurements, you would have to try to account for all the different ways “length” was recorded by data providers (length, Length, len, fork length, etc.).

Use the OBIS Measurement Type search tool to see the diversity of measurementTypes that exist across published datasets in OBIS. Note that any measurementTypeIDs listed in this tool are solely for consultation purposes. In some cases codes may have been incorrectly chosen for the associated measurementType. You should always choose measurementTypeIDs based on your own data and the guidelines in this manual.

The three identifier terms measurementTypeID, measurementValueID and measurementUnitID are used to standardize the measurement types, values and units.

These three terms should be populated using controlled vocabularies referenced using Uniform Resource Identifiers (URIs). For OBIS, we recommend using the internationally recognized NERC Vocabulary Server (NVS), developed by the British Oceanographic Data Centre (BODC). This server can be accessed through:

Controlled vocabularies are essential for ensuring that data are interoperable, that is, readable by both humans and machines, and that information is presented unambiguously. Vocabulary collections like NERC NVS2 compile vocabularies from different institutions and authorities (e.g., ISO, ICES, EUNIS), allowing you to map your data to them. In this way, you could search for a single measurementTypeID and obtain all related records, regardless of differences in the wording or language used in the data.
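
To illustrate why searching on an identifier is more reliable than searching on free text, consider the small sketch below. The records are invented, and the URI is a placeholder that follows the NVS address pattern; it is not a real P01 code and must not be copied into a dataset.

```python
# Placeholder only: "XXXXXXXX" stands in for a real P01 code chosen for your data.
PLACEHOLDER_LENGTH_ID = "http://vocab.nerc.ac.uk/collection/P01/current/XXXXXXXX/"

# Invented eMoF records from different hypothetical providers.
records = [
    {"measurementType": "length", "measurementTypeID": PLACEHOLDER_LENGTH_ID},
    {"measurementType": "Length", "measurementTypeID": PLACEHOLDER_LENGTH_ID},
    {"measurementType": "len", "measurementTypeID": PLACEHOLDER_LENGTH_ID},
    {"measurementType": "longueur", "measurementTypeID": PLACEHOLDER_LENGTH_ID},
    {"measurementType": "temperature", "measurementTypeID": ""},
]

# Free-text search only finds the exact spelling that was typed.
by_text = [r for r in records if r["measurementType"] == "length"]

# Identifier search finds every record, whatever the wording or language.
by_id = [r for r in records if r["measurementTypeID"] == PLACEHOLDER_LENGTH_ID]

print(len(by_text), "record(s) found by text;", len(by_id), "found by identifier")
```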

Each vocabulary “term” in NVS is a concept that describes a specific idea or meaning. For consistency, we refer to individual vocabulary terms in NVS as concepts. Concepts within NVS are organized into collections that group concepts with commonalities (e.g. all concepts pertaining to units). Sometimes collections contain concepts that are deprecated. Concepts can be deprecated because they duplicate another concept or because they have become obsolete. You should not use any deprecated concepts for any measurement ID. Deprecated concepts can be identified in lists on NVS because their identifier has a red warning symbol, and the page for the concept itself indicates in red lettering that it is deprecated. Unfortunately, there is currently no notification system in place to automatically warn you if a previously used concept has become deprecated. We recommend occasionally confirming that the concepts you or your institution use are still available for use.

Guidelines for populating each measurement ID are described below.

4.8.1.1 Guidelines to populate measurementUnitID

The measurementUnitID field is the easiest measurement ID field to populate. It is used to provide a URI for the unit associated with the value provided to measurementValue (e.g. cm, kg, kg/m2). This field should be populated with concepts from the P06 collection, BODC-approved data storage units. Documentation for this collection can be found here.

The entire vocabulary list, including deprecated terms can be found at http://vocab.nerc.ac.uk/collection/P06/current. However, we strongly recommend using https://www.bodc.ac.uk/resources/vocabularies/vocabulary_search/P06/ to avoid potentially selecting deprecated terms.
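
Because measurementUnitID must hold a full URI rather than the unit text itself, a quick format check can catch common slips such as pasting "cm" or a code from another collection. The sketch below is an assumption-laden illustration: the pattern is inferred from the collection address above, the helper name is_p06_uri is hypothetical, and the code ULAA is assumed here to be the P06 concept for metres (confirm it on NVS before use). A pattern check cannot tell you whether a concept exists or has been deprecated.

```python
import re

# Pattern inferred from the collection address: .../collection/P06/current/<CODE>/
P06_URI = re.compile(r"https?://vocab\.nerc\.ac\.uk/collection/P06/current/[A-Z0-9]+/?")


def is_p06_uri(value: str) -> bool:
    """Return True if value looks like the URI of a single P06 concept."""
    return P06_URI.fullmatch(value.strip()) is not None


candidates = [
    "http://vocab.nerc.ac.uk/collection/P06/current/ULAA/",  # assumed: metres
    "cm",  # unit text belongs in measurementUnit, not measurementUnitID
    "http://vocab.nerc.ac.uk/collection/P01/current/ULAA/",  # wrong collection
    "http://vocab.nerc.ac.uk/collection/P06/current",  # the collection, not a concept
]

for candidate in candidates:
    print(candidate, "->", is_p06_uri(candidate))
```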

Some examples of measurementUnits and the associated measurementUnitID include:

4.8.1.2 Guidelines to populate measurementValueID

The measurementValueID field is used to provide an identifying code for measurementValues that are non-numerical (e.g. sampling related, sex or life stage designation, etc.). Note: it is NOT used for standardizing numeric measurements!

Unlike measurementUnitID, measurementValueID can be populated with concepts from more than one collection. Which collection to use depends on the type of measurementValue you have. See the table below for some common, non-exhaustive examples.

Type of measurementValue | Collection | Collection Documentation | Complete Vocabulary List
--- | --- | --- | ---
Sex (gender) | S10 | https://github.com/nvs-vocabs/S10 | http://vocab.nerc.ac.uk/collection/S10/current/
Lifestage | S11 | https://github.com/nvs-vocabs/S11 | http://vocab.nerc.ac.uk/collection/S11/current/
Sampling instruments and sensors (SeaVoX Device Catalogue) | L22 | https://github.com/nvs-vocabs/L22 | http://vocab.nerc.ac.uk/collection/L22/current
Sampling instrument categories (SeaDataNet device categories) | L05 | https://github.com/nvs-vocabs/L05 | http://vocab.nerc.ac.uk/collection/L05/current
Vessels (ICES Platform Codes) | C17 | - | http://vocab.nerc.ac.uk/collection/C17/current
European Nature Information System Level 3 Habitats | C35 | - | https://vocab.nerc.ac.uk/collection/C35/current/
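
The two rules above (no measurementValueID for numeric values, and a collection that depends on the type of value) can be sketched as a small lookup. The dictionary keys and the helper names are hypothetical labels chosen for this example; only the collection codes come from the table.

```python
# Collection codes from the table above, keyed by illustrative (assumed) labels.
VALUE_COLLECTIONS = {
    "sex": "S10",
    "lifestage": "S11",
    "sampling instrument": "L22",
    "sampling instrument category": "L05",
    "vessel": "C17",
    "habitat": "C35",
}


def is_numeric(value: str) -> bool:
    """Return True if the measurementValue parses as a number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def collection_for(value_type: str, measurement_value: str):
    """Suggest a collection to search, or None when no measurementValueID applies."""
    if is_numeric(measurement_value):
        return None  # numeric values are never given a measurementValueID
    return VALUE_COLLECTIONS.get(value_type.strip().lower())


print(collection_for("sex", "female"))
print(collection_for("length", "12.3"))
```

The function only points to a collection; the concept itself still has to be chosen by hand from that collection.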

You can also populate measurementValueID with references to papers or manuals that document the sampling protocol used to obtain the measurement. To do this you should use either:

4.8.1.3 Guidelines to populate measurementTypeID

4.8.1.3.1 The P01 Collection

One of the more important collections for OBIS is the P01 collection.

Important note! P01 codes are required for the measurementTypeID field.

The P01 is a large collection with >45,000 concepts. Each concept within this collection is composed of different elements that, together, construct a label you can use for a measurement type. Frequently, concepts from the P01 collection are referred to as a “P01 code”. P01 codes are used to populate the measurementTypeID field.

It is important to know that a semantic model, shown below, underlies each P01 code and the elements that compose it. There are 5 potential elements in this semantic model that, together, unambiguously describe a measurement type. See the P01 wheel for an example of how these elements relate and combine to make one P01 code.

Elements of the semantic model for P01 codes
  • Property/attribute: the measurement or observation of either an object of interest or a matrix, or both
  • Object of interest: a chemical object, a biological object, a physical phenomenon, or a material object
  • In relation to: how the measurement is related to the environment
  • Environmental matrix: what environment the measurement is in (e.g. water body, seabed); needed for most environmental measurements, but may not be necessary for e.g. biological measurements
  • Method: any specific methods used that are important to interpret the measurement

Note: Not every element is required, but it is important to think about each piece of the model and how it may or may not apply to your measurement. More details about this are described below in the measurementTypeID section.
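
One way to make yourself consider every element is to write the five elements down explicitly for each measurement before searching. The sketch below does this with a small data class; the class name, the attribute names, and the example values are assumptions for illustration and are not part of the NVS or its semantic model tooling.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MeasurementElements:
    """The five potential elements of the semantic model; all are optional."""

    property_attribute: Optional[str] = None
    object_of_interest: Optional[str] = None
    in_relation_to: Optional[str] = None
    environmental_matrix: Optional[str] = None
    method: Optional[str] = None

    def unspecified(self) -> List[str]:
        """List the elements left blank, as a prompt to decide whether they apply."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Illustrative description of an abundance measurement in a water sample.
abundance = MeasurementElements(
    property_attribute="abundance",
    object_of_interest="biological entity specified elsewhere",
    in_relation_to="per unit volume of",
    environmental_matrix="water body",
)

print(abundance.unspecified())
```

An element reported as unspecified is not an error; it is a reminder to decide deliberately that it does not apply.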

You can use codes from other collections (e.g. P06, Q01) for measurementValueID and measurementUnitID fields, but for measurementTypeID you must always use a code from the P01 collection (limited exceptions, see below).

4.8.1.3.2 Selecting P01 codes for measurementTypeID

When selecting P01 codes, it is important to understand that each element within a P01 code is meant to describe an aspect of the measurement: what is the measurement, what is the object or entity being measured, in what environment was the measurement taken, by what kind of methods, etc.? Taken together, these elements give a unique and specific description that differentiates one measurement from another. More documentation about the P01 code and the semantic model it is based on can be found here.

The P01 collection is found here and can be searched through the NERC vocabulary server.

There are several ways of searching for a P01 code, but we highly recommend using the SeaDataNet P01 Facet Search. You may notice when searching for measurement types related to an occurrence that taxon-specific codes are available to you, e.g., abundance of Notommata. For OBIS, all P01 codes should be generalized, i.e. do not select species-specific codes. Instead, when the measurement is related to an occurrence record, only choose codes for “biological entities specified elsewhere”. This is due to the Darwin Core Archive structure: the taxonomic identification is already specified in the Occurrence table, while measurements are recorded in the ExtendedMeasurementOrFact table.
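
The reason a generalized code is sufficient can be seen by joining the two tables. In the sketch below the taxon is recovered from the Occurrence table through occurrenceID, so the measurement type never needs to name it. The records are invented and the measurementTypeID is a placeholder, not a real P01 code.

```python
# Placeholder only: substitute the generalized P01 code you select for your data.
PLACEHOLDER_TYPE_ID = "http://vocab.nerc.ac.uk/collection/P01/current/XXXXXXXX/"

# Invented Occurrence records: the taxon lives here.
occurrences = [
    {"occurrenceID": "occ_1", "scientificName": "Notommata"},
]

# Invented eMoF records: the measurement lives here, with no taxon in the type.
emof = [
    {
        "occurrenceID": "occ_1",
        "measurementType": "abundance",
        "measurementTypeID": PLACEHOLDER_TYPE_ID,
        "measurementValue": "42",
    },
]

# Index occurrences by ID, then attach the taxon to each measurement.
taxon_by_id = {o["occurrenceID"]: o["scientificName"] for o in occurrences}
joined = [
    {**m, "scientificName": taxon_by_id[m["occurrenceID"]]} for m in emof
]

for record in joined:
    print(record["scientificName"], record["measurementType"], record["measurementValue"])
```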

Once you are comfortable with P01 codes, you can also use the BODC Vocabulary Builder or simply search for terms directly on the NERC Vocabulary Server.

For measurementTypes related to sampling instruments and/or methods attributes, see the Q01 collection:

Use the following decision tree to help you select P01 codes for biological, chemical, and/or physical measurements. See the OBIS YouTube Vocabulary series for guidance on how to use this tree and examples for different types of measurementTypeIDs. Note that the measurementTypeID:sampling measurements branch is still in development by members of the OBIS Vocabulary team.

Figure: decision tree for selecting measurementTypeID, measurementUnitID, and measurementValueID codes. Its content is summarized below.

How to use these decision trees:
These trees are meant to help you identify the important elements within your measurement so that you find the best suited vocabulary code. For all branches related to measurementType, the tree should be used in conjunction with the SeaDataNet facet search to obtain P01 codes.

You will notice other collections (e.g. S07) indicated as you follow along the branches. You do not need to go to these collections to identify sub-elements; they are provided to indicate where that type of information is found. Ultimately you should select a code that best captures your measurementType, measurementUnit, or measurementValue.

Legend: steps in the tree are marked as Optional, Strongly recommended, or Pre-determined, and suggested tools, collections, and actions are indicated at each step.

START: Which identifier are you looking for?

  • measurementUnitID: use the NERC P06 collection.
  • measurementValueID: What type of (non-numeric) value is it?
      • Biological entity characteristics: Sex (S10), Life stage (S11), Behaviour (ICES Behaviour codes), Morphology (S14), Colour (S15)
      • Habitat: broad habitat description, e.g. littoral mud, European Nature Information System (C35); HELCOM & Marine Habitat Classification for Britain and Ireland (M23 & M24)
      • Sampling device, instrument, platform/vessel information:
          • Platform or Vessel Codes & Names: C17 (ICES names); L06 (Platform category)
          • Sampling device information: Is the name and model of the sampling instrument known? Yes: L22 (model names). No: L05 (categories).
  • measurementTypeID: What type of measurement do you have? For each branch, ACTION: navigate to SeaDataNet facet search.
      • Biological (e.g. length, weight, abundance)
          • What is the object of interest that is being measured? Is there a subcomponent? ACTION: search for “biological entity specified elsewhere”, then from the Biological Entity (S25) filter, select Biological entity specified elsewhere, including subcomponent where applicable. The object is pre-determined: Biological entity specified elsewhere [no sub-component] or [with sub-component] (terms in S12 if interested).
          • What is the measurement property? ACTION: search for your measurement and/or select it from “Measurement Property” (terms in S06 if interested).
          • Is it a statistically derived parameter (e.g. mean, max)? Terms in S07 if interested.
          • What is the matrix? (Note: most of the time matrix is not needed for biological data.) ACTION: Consider if your biological measurement has a matrix element. There are only 3 matrices relevant for OBIS biological data: Water body, Bed, Sediment. Look in S26 or S21 for more details.
          • What’s the measurement-matrix relationship? ACTION: If your measurement occurred within a matrix, consider the relationship between the two (e.g. per unit volume of). Terms in S02 if interested.
          • Were any methods used to obtain the measurement? ACTION: Consider if the methods used to obtain the measurement are integral for understanding the measurement. See S26, or: S03 for Sample preparation, S04 for Analytical methods, S05 for Data processing methods.
      • Chemical substance (e.g. concentration, flux, sedimentation rate)
          • What is the chemical substance being measured? ACTION: search for the name of the chemical substance in the Free Search box (terms in S27 if interested).
          • What is the measurement property? ACTION: search for or identify the measurement from “Measurement Property”. Most likely this will be concentration.
          • Is it a statistically derived parameter? Terms in S07 if interested.
          • What is the matrix? S26, or for more specific ranges see: Sphere (S21), Sphere sub-group (S22), Phase (S23), Phase sub-group (S24).
          • What’s the measurement-matrix relationship? Terms in S02 if interested.
          • Were any methods used to obtain the measurement? ACTION: Consider all methods used to obtain the measurement. See S26, or: S03 for Sample preparation, S04 for Analytical methods, S05 for Data processing methods.
      • Physical or abiotic (e.g. temperature, salinity)
          • What is the physical entity being measured? The environmental matrix, or another physical entity (terms in S29 if interested)?
          • What is the measurement property? ACTION: search for the measurement. Be careful not to apply too many filters, as the search may become too narrow.
          • Is it a statistically derived parameter? Terms in S07 if interested.
          • What is the matrix? S26, or for more specific ranges see: Sphere (S21), Sphere sub-group (S22), Phase (S23), Phase sub-group (S24).
          • What’s the measurement-matrix relationship? Terms in S02 if interested.
          • Were any methods used to obtain the measurement? See S26, or: S03 for Sample preparation, S04 for Analytical methods, S05 for Data processing methods.
      • Sampling measurements (e.g. device, protocol): UNDER DEVELOPMENT

Further actions noted in the tree while narrowing the search results:
  • ACTION: Look in the search results for a P01 code that best matches your data. Use the guiding questions to help you narrow the search.
  • ACTION: Consider the units of your measurements and whether a sub-group was targeted, to help you determine the matrix and measurement-matrix relationship.
  • ACTION: Check search results for appropriate P01 codes, while carefully considering all methods used to obtain the measurement.
  • ACTION: click on link for conceptID and copy whole URI into measurement ID field.

Could not identify a suitable measurement ID? Add a new issue to the OBISVocabs repository to request creation of a new vocabulary: https://github.com/nvs-vocabs/OBISVocabs/issues

4.8.2 Requesting new vocabulary terms

If you have already tried looking for a P01 code and were unable to identify a suitable code for your measurementType, you must request that a new code be created. Before doing so, make sure you have not over-filtered the search results. Then, to request a new term, your request must be submitted via:

We strongly recommend you use GitHub if you can, as it allows longer-term documentation, and can be relevant for other users who may be interested in the same type of code creation.

Finally, if you are unsure whether a code fits your specific case, please feel free to ask in the Vocab channel on the OBIS Slack.

4.8.2.1 How to Submit a GitHub Vocabulary Request

  1. Navigate to https://github.com/nvs-vocabs/OBISVocabs/issues and click on the New Issue button.
     Screenshot of how to request a new vocabulary on GitHub
  2. Click Get started.
     Screenshot of submitting an issue to GitHub
  3. Fill in the title with short details of your request or issue. Then fill in the description. It is recommended to list any existing terms that are similar to your request, or concepts that are sub-components of the request.
     Screenshot for how to request a new measurementType on GitHub
  4. Example: An issue was created to address difficulties in identifying P01 codes for sex rather than gender. Gender is a concept generally applied to humans, whereas “sex” is more applicable for animals. Thus the request was to either modify the current gender P01 code, or create a P01 code that specifies sex, not gender. At the time the request was issued, when users searched for a P01 term for “sex”, only species-specific terms were available.
     Example of previously requested new term on GitHub