- I have data and want to publish to OBIS - what do I do?
- Why is it important to share and format data?
- How do I handle sensitive data?
- Where can I make suggestions for improvements on this Manual?
- Where can I find OBIS related training videos?
- What are the responsibilities of node managers?
- Where can I find marine datasets linked to the OBIS network by the GBIF registry that now require endorsing?
- Where can I learn about “Darwin Core”?
- I am having trouble understanding how Core and Extension tables relate to each other
- How does the OBIS format avoid redundancy in data?
- How are extension tables (e.g. eMOF, occurrence) linked with the core table?
- What is the difference between Occurrence Core and Event Core?
- Is there a checklist of all required Darwin Core fields for OBIS?
- How does data flow in OBIS?
- What should I do if I do not have the data for fields required by OBIS?
- How do I construct an eventID?
- How do I construct occurrenceID?
- What data goes into Occurrence core (or extension) and how do I set up this file?
- How do I set up an Event core table?
Do I have to provide decimalLatitude and decimalLongitude for the Event and Occurrence tables?
The answer may depend on your dataset structure, but generally, no. If you have Event core, then you do not need to repeat location information in the Occurrence table (but you can if you’d like). If you are using Occurrence core, then location information must be provided in the Occurrence table.
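To illustrate why location does not need to be repeated under Event core, here is a minimal sketch in base R. The two tables, their identifiers, and their coordinates are made up for the example; the only assumption is that each occurrence carries the eventID of its parent event, so coordinates can be looked up from the Event table when needed.

```r
# Hypothetical Event core table: location is stored once per event.
event_tbl <- data.frame(
  eventID = c("cruise1:st1", "cruise1:st2"),
  decimalLatitude = c(41.2, 41.5),
  decimalLongitude = c(19.1, 19.3),
  stringsAsFactors = FALSE
)

# Hypothetical Occurrence extension table: no coordinates, only a link via eventID.
occurrence_tbl <- data.frame(
  occurrenceID = c("cruise1:st1:occ1", "cruise1:st1:occ2", "cruise1:st2:occ1"),
  eventID = c("cruise1:st1", "cruise1:st1", "cruise1:st2"),
  scientificName = c("Mullus barbatus", "Sardina pilchardus", "Mullus barbatus"),
  stringsAsFactors = FALSE
)

# Each occurrence picks up the coordinates of its parent event.
occ_with_location <- merge(occurrence_tbl, event_tbl, by = "eventID", all.x = TRUE)
print(occ_with_location)
```

With Occurrence core there is no Event table to join against, which is why the coordinates must then sit in the Occurrence table itself.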
- What data goes into extendedMeasurementOrFact and how do I set it up?
- How do I format dates?
- How do I handle historical data?
- How do I convert coordinates to decimal degrees?
- How do I convert different geographical formats to WGS84?
- How do I compile acoustic, imaging, or other multimedia data for OBIS?
- How do I compile habitat data for OBIS?
- How do I compile tracking data for OBIS?
- How do I compile DNA and genetic data for OBIS?
How do I document occurrences from unknown species, those new to science, or those with temporary names? e.g. Eurythenes sp. DISCOLL.PAP.JC165.674
Occurrences of taxa that are unknown or new to science should be documented according to the recommendations of Horton et al. 2021. You should populate the scientificName field with the genus, and in identificationQualifier provide the open nomenclature (ON) sign ‘sp.’. However, you must also indicate the reason why species-level identification is unavailable. To do this, supplement ‘sp.’ with either stet. (stetit) or indet. (indeterminabilis). If neither of these is applicable (e.g. for undescribed new species), add a unique taxon identifier code after ‘sp.’ in identificationQualifier. For example, Eurythenes sp. DISCOLL.PAP.JC165.674.
Please avoid simple alphanumeric codes (e.g. Eurythenes sp. 1, Eurythenes sp. A). Similar to creating occurrenceIDs, you should strive to provide more complex and globally unique identifiers. Identifiers could be constructed by combining higher taxonomic information with information related to a collection, institution, museum or collection code, sample number or museum accession number, expedition, dive number, or timestamp. This ensures namestrings will remain unique within larger repositories like OBIS. It is also recommended to include these temporary names on specimen labels for physical specimens.
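The rules above can be sketched as a small R helper. Note that `build_open_nomenclature()` is a hypothetical function written for this example and is not part of any OBIS package; joining the code parts with dots is an assumption that simply mirrors the Eurythenes example.

```r
# Hypothetical helper: fills scientificName and identificationQualifier
# for a record that cannot be identified to species level.
build_open_nomenclature <- function(genus, reason = NULL, code_parts = NULL) {
  if (!is.null(reason)) {
    # Species-level identification unavailable for a stated reason.
    stopifnot(reason %in% c("stet.", "indet."))
    qualifier <- paste("sp.", reason)
  } else {
    # Neither reason applies (e.g. undescribed new species): build a unique code.
    stopifnot(length(code_parts) > 0)
    qualifier <- paste("sp.", paste(code_parts, collapse = "."))
  }
  data.frame(
    scientificName = genus,
    identificationQualifier = qualifier,
    stringsAsFactors = FALSE
  )
}

new_species <- build_open_nomenclature(
  "Eurythenes",
  code_parts = c("DISCOLL", "PAP", "JC165", "674")
)
indeterminable <- build_open_nomenclature("Eurythenes", reason = "indet.")

# Full namestring as it would appear on a specimen label.
paste(new_species$scientificName, new_species$identificationQualifier)
```

Keeping the genus and the qualifier in separate fields means scientificName can still be matched against WoRMS at genus level.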
- How do I map Measurement or Fact terms in OBIS with preferred BODC vocabulary?
- I can’t find a suitable vocabulary, what do I do? How do I request a new vocabulary term?
Should I use taxon-specific P01 codes to populate measurementTypeID? e.g. http://vocab.nerc.ac.uk/collection/P01/current/A15985A1
No. You should never use taxon-specific P01 codes. This is because the taxa are already identified in the Occurrence table.
- How should I match raw data fields with Darwin Core terminology?
- How do I use the WoRMS taxon match tool?
- Can I fetch a full classification for a list of species from WoRMS?
- What do I do if my scientificName does not return a match from WoRMS?
- Where can I find DNA sequences published in OBIS?
Is there a template generator I can use to help create my Event, Occurrence, and eMoF tables?
Yes. There is an Excel template generator developed by Luke Marsden & Olaf Schneider as part of the Nansen Legacy project. Note that this template generator is aimed at GBIF users, so make sure to account for and include the required OBIS terms.
There is also this Excel to Darwin Core macro tool developed by GBIF Norway you can use to help generate templates.
- How do I georeference locations, including text-based descriptions?
- How do I do data quality control?
- What are the OBIS quality control flags?
- Why are certain records dropped in OBIS?
- What do I do when I am uncertain about the:
- What do I do with freshwater species that are part of my marine dataset?
- How do I add my data to the OBIS database?
- What metadata do I have to provide? Where? How?
- How do you know which license to choose?
- How do I access the IPT?
- How do I use the IPT?
- Are there instructions for IPT administrators?
- How do I add a DOI to my dataset?
- How do I publish to both GBIF and OBIS?
- How do I update my already published dataset?
- How do I download data from OBIS?
How do I load the full (.csv) export of OBIS data?
Loading the entire OBIS dataset uses a lot of memory and is probably not feasible on most desktop computers. You have a few potential options depending on the use case: i) process the data in smaller batches, or ii) load the dataset into a local database such as SQLite and use SQL queries to analyze the data. Otherwise, we recommend you use the parquet download, which is available here, instead of the CSV. Then in R, you can use the arrow package to work with parquet files. We also have a short tutorial on working with parquet files in R here, with an example application of this approach here (see first code block).
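As a rough illustration of option (i), the sketch below reads a CSV in fixed-size batches with base R and keeps only running totals in memory. The stand-in file, its two columns, and the helper `count_by_name()` are made up for the example. It also assumes that no quoted field contains a line break, which is not guaranteed for real exports.

```r
# Build a tiny stand-in CSV so the sketch runs on its own (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    scientificName = c("Abra alba", "Abra alba", "Mytilus edulis",
                       "Abra alba", "Mytilus edulis"),
    decimalLatitude = c(51.2, 51.3, 52.0, 51.1, 52.4)
  ),
  csv_path,
  row.names = FALSE
)

# Hypothetical helper: count records per scientificName, one batch at a time.
count_by_name <- function(path, chunk_size = 2) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  totals <- integer(0)
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    # Parse only this batch, re-attaching the header line.
    chunk <- read.csv(text = c(header, lines), stringsAsFactors = FALSE)
    counts <- table(chunk$scientificName)
    for (nm in names(counts)) {
      prev <- if (nm %in% names(totals)) totals[[nm]] else 0L
      totals[nm] <- prev + as.integer(counts[nm])
    }
  }
  totals
}

totals <- count_by_name(csv_path)
print(totals)
```

For a real export you would raise `chunk_size` to something like a few hundred thousand rows; the value of 2 is only there so the toy file is split into several batches.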
- How can I use R to access OBIS data?
- How do I use the OBIS API to fetch and filter data?
- How do I contact the data provider?
- How can I cite OBIS datasets and downloads?
- What are the definitions of the field names in the downloads generated by OBIS?
How do I obtain a taxon checklist for an area?
There are a few possible ways to obtain a taxon checklist for a given area. We will obtain a checklist of species in the Albanian EEZ as an example. To do this we will fetch occurrences within a bounding box around our area of interest, and then filter them against the actual EEZ geometry.

    library(mregions)
    library(dplyr)
    library(robis)
    library(sf)

    # obtain Albanian EEZ as sf
    geom <- mr_shp(key = "MarineRegions:eez", filter = "Albanian Exclusive Economic Zone", maxFeatures = NULL)

    # get WKT for the bounding box
    wkt <- st_as_text(st_as_sfc(st_bbox(geom)), digits = 6)

    # fetch occurrences for bounding box
    occ <- occurrence(geometry = wkt) %>%
      st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)

    # filter using geometry
    occ_filtered <- occ %>%
      filter(st_intersects(geometry, geom, sparse = FALSE)) %>%
      as_tibble() %>%
      select(-geometry)

    # get taxa
    alb_taxa <- occ_filtered %>%
      group_by(phylum, class, order, family, genus, species, scientificName) %>%
      summarize(records = n())

The dates look unusual in the download file. What are these, how do I convert them, and/or how do I obtain separate elements from them (e.g. month)?
The values in date_end are unix timestamps which have been calculated from the ISO date in the eventDate column. We can convert these numerical values to dates using the formula below.
If, when you apply this formula, you still see numbers, you will need to set the cell formatting to Date. Once you have dates, you can obtain, e.g. months for seasonal analyses using:
You can also use this tool to convert timestamps.
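For those working in R rather than a spreadsheet, the conversion described above can be sketched as follows. This assumes the timestamps are in milliseconds since 1970-01-01 UTC, and the two values are made-up examples. If your values have about 10 digits rather than 13, they are probably in seconds, in which case drop the division by 1000.

```r
# Hypothetical date_end values, assumed to be milliseconds since 1970-01-01 UTC.
date_end <- c(946684800000, 1592179200000)

# Convert to date-times; as.POSIXct() expects seconds, hence the division.
dates <- as.POSIXct(date_end / 1000, origin = "1970-01-01", tz = "UTC")

# Extract individual elements, e.g. month and year, for seasonal analyses.
months <- as.integer(format(dates, "%m"))
years <- as.integer(format(dates, "%Y"))

print(data.frame(date_end, date = format(dates, "%Y-%m-%d"), month = months, year = years))
```

Setting `tz = "UTC"` explicitly avoids dates shifting by a day when the local time zone differs from UTC.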
How do I filter by or obtain trait information for OBIS data (e.g. all benthic organisms)?
Currently, it is not possible to filter OBIS data by trait. To do this, we recommend using the traits database of the World Register of Marine Species. For example, searching by “functional group”, you can specify benthos, plankton, nekton, etc.