8 Anatomy of a Neotoma Dataset

8.1 The Minimum Object

Neotoma enforces a number of constraints on its tables, including sets of required fields and bounded values. For example, a user must choose a dataset type from the set of available dataset types in ndb.datasettypes. However, it is possible to enter a new dataset with very minimal data should that be required. To help understand what a Neotoma object looks like, we can look at these minimum requirements:

  • From ndb.sites we only need to fill the columns “sitename” and “geog”, which define a (non-unique) place name and a spatial location.
  • The siteid then links to ndb.collectionunits, where the required field is “handle”. The handle must be unique across the database, although this is a legacy constraint.
  • A dataset is linked to the collectionunitid and requires a valid datasettypeid from ndb.datasettypes.
  • An analysis unit has no required fields beyond its link to the collectionunitid.
  • We require a link to a valid constituent database in ndb.constituentdatabases.
  • We require a value, which is linked through the analysisunitid and sampleid and tied to a taxonname in the ndb.taxa table.
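The chain of required links above can be sketched as a handful of Python records plus a check that every link resolves. This is a minimal sketch under stated assumptions, not Neotoma's schema or API: the class names, the hypothetical `validate_minimum` helper, the (lat, lon) tuple standing in for “geog”, holding the constituent-database link as a `databaseid` on the dataset, and the simplified `Sample` and `Value` records are all illustrations. Only the table and column names quoted in the list come from Neotoma.

```python
from dataclasses import dataclass

@dataclass
class Site:                      # ndb.sites
    siteid: int
    sitename: str                # non-unique place name
    geog: tuple                  # assumption: (lat, lon) pair standing in for a spatial type

@dataclass
class CollectionUnit:            # ndb.collectionunits
    collectionunitid: int
    siteid: int
    handle: str                  # must be unique across the database

@dataclass
class Dataset:                   # ndb.datasets
    datasetid: int
    collectionunitid: int
    datasettypeid: int           # must exist in ndb.datasettypes
    databaseid: int              # assumption: constituent-database link held here

@dataclass
class AnalysisUnit:              # ndb.analysisunits: only the link is required
    analysisunitid: int
    collectionunitid: int

@dataclass
class Sample:                    # hypothetical simplified sample record
    sampleid: int
    analysisunitid: int
    datasetid: int

@dataclass
class Value:                     # hypothetical simplified value record
    sampleid: int
    taxonid: int                 # must exist in ndb.taxa
    value: float


def validate_minimum(sites, units, datasets, aunits, samples, values,
                     datasettypeids, databaseids, taxonids):
    """Hypothetical check: return broken constraints (empty list means complete)."""
    errors = []
    site_ids = {s.siteid for s in sites}
    unit_ids = {u.collectionunitid for u in units}
    dataset_ids = {d.datasetid for d in datasets}
    au_ids = {a.analysisunitid for a in aunits}
    sample_ids = {s.sampleid for s in samples}

    # Legacy constraint: handles are unique across the database.
    handles = [u.handle for u in units]
    if len(handles) != len(set(handles)):
        errors.append("collection unit handles are not unique")
    for u in units:
        if u.siteid not in site_ids:
            errors.append(f"collection unit {u.handle!r} has no site")
    for d in datasets:
        if d.collectionunitid not in unit_ids:
            errors.append(f"dataset {d.datasetid} has no collection unit")
        if d.datasettypeid not in datasettypeids:
            errors.append(f"dataset {d.datasetid} has an unknown dataset type")
        if d.databaseid not in databaseids:
            errors.append(f"dataset {d.datasetid} has no constituent database")
    for a in aunits:
        if a.collectionunitid not in unit_ids:
            errors.append(f"analysis unit {a.analysisunitid} has no collection unit")
    for s in samples:
        if s.analysisunitid not in au_ids or s.datasetid not in dataset_ids:
            errors.append(f"sample {s.sampleid} is not linked to an analysis unit and dataset")
    for v in values:
        if v.sampleid not in sample_ids:
            errors.append("value without a sample")
        if v.taxonid not in taxonids:
            errors.append("value tied to an unknown taxon")
    # The minimum object still needs at least one observed value.
    if not values:
        errors.append("no value recorded")
    return errors
```

The last three arguments are the sets of valid IDs from ndb.datasettypes, ndb.constituentdatabases and ndb.taxa; a complete minimum object returns an empty list.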

A representation of the tables and attributes required to generate the “minimum” dataset in Neotoma.

So, at minimum, we have a single observation of a “taxon” from a collection unit at a site. These data are entered into a constituent database as a particular dataset type. Thus we might consider:

  • A grain of Taxus pollen in a modern pollen dataset from the collection unit HERMES at the site Límni Zirelia in Greece, included in the European Pollen Database.
  • An observation of pH collected from the collection unit DIRTY, a roadside pond at the site Paul's Truck Stop in Canada, included in the NANODE database.
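Read from the value outward, each example above fills exactly the minimum slots. A hypothetical flat view makes this explicit. The `MinimumRecord` type and its `describe` method are illustrations only, IDs and the site location are omitted for brevity, and “modern pollen” is the text's own label rather than a checked entry in ndb.datasettypes.

```python
from typing import NamedTuple

class MinimumRecord(NamedTuple):
    """Hypothetical flat view of the minimum Neotoma object (IDs and geog omitted)."""
    taxonname: str      # from ndb.taxa
    value: float        # the single observed value
    datasettype: str    # from ndb.datasettypes (label assumed)
    handle: str         # from ndb.collectionunits
    sitename: str       # from ndb.sites
    database: str       # from ndb.constituentdatabases

    def describe(self) -> str:
        # Walk the chain from the value out to the constituent database.
        return (f"{self.value:g} x {self.taxonname} in a {self.datasettype} dataset "
                f"from collection unit {self.handle} at site {self.sitename}, "
                f"in the {self.database}")

taxus = MinimumRecord("Taxus", 1.0, "modern pollen", "HERMES",
                      "Límni Zirelia", "European Pollen Database")
print(taxus.describe())
```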

It should be clear from these extremely simple objects that more metadata is needed to fully explain a data record. However, under the current constraints in Neotoma, this is all that is fundamentally required to add a record, and it will be the fundamental unit of record for any DOI minted for that dataset.