10 Dataset & Collection Related Tables

10.1 `accumulationrates`

This table represents estimated accumulation rates based on particular chronologies associated with collectionunits. Units are defined within the table. Accumulation rates play an important role in understanding surficial and depositional processes (Bennett2016?), and can be critical for defining relevant priors for Bayesian chronologies (Goring2012?). While accumulation rates can be calculated on-the-fly, Neotoma also stores accumulation rates when reported, however, at present only 11 chronologies have reported accumulation rates.

analysisunitid (foreign key): The identifier for the associated analysis unit.
chronologyid (foreign key)
accumulationrate: The rate at which sediment accumulations within a particular analysis unit, defined (generally) as the length of time required to accumulate some depth of sediment (e.g., yr/cm).
variableunitsid (foreign key): Foreign key referencing variable units, describing the units in which the accumulation rate is reported.

10.2 `aggregatedatasets`

Aggregate Datasets are aggregates of samples of a particular datasettypes.

Some examples:

Plant macrofossil samples from a group of packrat middens collected from a particular valley, mountain range, or other similarly defined geographic area. Each midden is from a different site or collectionunit, but they are grouped into time series for that area and are published as single dataset.
Samples collected from 32 cutbanks along several kms of road. Each sample is from a different site, but they form a time series from 0 – 12,510 ¹⁴C yr BP, and pollen, plant macrofossils, and beetles were published and graphed as if from a single site.
A set of pollen surface samples from particular region or study that were published and analyzed as a single dataset and submitted to the database as a single dataset.

The examples above are datasets predefined in the database. New aggregate datasets could be assembled for particular studies, for example all the pollen samples for a given time slice for a given geographic region.

aggregatedatasetid (primary key): An arbitrary Aggregate Dataset identification number.
aggregatedatasetname: Name of Aggregate Dataset.
aggregateordertypeid (foreign key): Aggregate Order Type identification number. Field links to the aggregateordertypes lookup table.
notes: Free form notes about the Aggregate Order Type.

10.3 `aggregateordertypes`

Lookup table for Aggregate Order Types. Table is referenced by the aggregatedatasets table.

aggregateordertypeid (primary key): An arbitrary Aggregate Order Type identification number.
aggregateordertype: The Aggregate Order Type.
notes: Free form notes or comments about the Aggregate Order Type. The Aggregate Order Types are:
- Latitude: AggregateDataset samples are ordered by, in order of priority, either (1) collectionunits GPSLatitude or (2) the mean of sites.LatitudeNorth and sites LatitudeSouth.
- Longitude AggregateDataset samples are ordered by, in order of priority, either (1) collectionunits GPSLongitude or (2) the mean of sites.LongitudeWest and sites.LongitudeEast.
- Altitude AggregateDataset samples are ordered by sites Altitude.
- Age AggregateDataset samples are ordered by sampleages.Age, where sampleages.SampleAgeID is from aggregatesampleages.SampleAgeID.
- Alphabetical by site name AggregateDataset samples are ordered alphabetically by sites.SiteName.
- Alphabetical by collection unit name AggregateDataset samples are ordered alphabetically by collectionunits.CollUnitName.
- Alphabetical by collection units handle AggregateDataset samples are ordered alphabetically by collectionunits.Handle.

10.4 `collectiontypes`

This table is a lookup table of for types of Collection Units, or Collection Types. Table is referenced by the collectionunits table.

colltypeid (primary key): An arbitrary Collection Type identification number.
colltype: The Collection Type. Types include cores, sections, excavations, and animal middens. Collection Units may be modern collections, surface float, or isolated specimens. Composite Collections Units include different kinds of Analysis Units, for example a modern surface sample for ostracodes and an associated water sample.

10.5 `collectionunits`

This table stores data for Collection Units.

collectionunitid (primary key): An arbitrary Collection Unit identification number.
siteid (foreign key): Site where CollectionUnit was located. Field links to sites table.
colltypeid (foreign key): Type of Collection Unit. Field links to the collectiontypes table.
depenvtid (foreign key): Depositional environment of the CollectionUnit. Normally, this key refers to the modern environment. For example, the site may be located on a colluvial slope, in which case the Depositional Environment may be Colluvium or Colluvial Fan. However, an excavation may extend into alluvial sediments, which represent a different depositional environment. These are accounted for by the Facies of the AnalysisUnit. Field links to the depenvttypes table.
handle: Code name for the Collection Unit. This code may be up to 10 characters, but an effort is made to keep these to 8 characters or less. Data are frequently distributed by Collection Unit, and the Handle is used for file names.
collunitname: Name of the Collection Unit. Examples: Core BPT82A, Structure 9, P4A Test 57. If faunal data are reported from a site or locality without explicit Collection Units, then data are assigned to a single Collection Unit with the name «Locality».
colldate: Date Collection Unit was collected.
colldevice: Device used for obtain Collection Unit. This field applies primarily to cores, for example «Wright square-rod piston corer (5 cm)».
gpslatitude: Precise latitude of the Collection Unit, typically taken with a GPS, although may be precisely measured from a map.
gpslongitude: Precise longitude of the Collection Unit, typically taken with a GPS, although may be precisely measured from a map.
gpsaltitude: Precise altitude of the Collection Unit, typically taken with a GPS or precisely obtained from a map.
gpserror: Error in the horizontal GPS coordinates, if known.
waterdepth: Depth of water at the Collection Unit location. This field applies mainly to Collection Units from lakes.
substrateid (foreign key): Substrate or rock type on which the Collection Unit lies. Field links to the RockTypes table. This field is especially used for rodent middens.
slopeaspect: For Collection Units on slopes, the horizontal direction to which a slope faces measured in degrees clockwise from north. This field is especially used for rodent middens.
slopeangle: For Collection Units on slopes, the angle of slope from horizontal. The slopeangle field is especially used for rodent middens.
location: Short description of the location of the Collection Unit within the site.
notes: Free form notes or comments about the Collection Unit.

10.6 `contextsdatasettypes`

Reporting for the set of variable contexts associated with each datasettype. This table is used to relate dataset type to variable context to provide users with a custom set of terms when entering and uploading data for a particular datasettype. For example, Neotoma Pollen:broken is not available for geochronologic datasets.

datasettypeid (foreign key): Link to the datasettype for a particular variable.
variablecontextid (foreign key): Link to the variable context type.

10.7 `datasetdatabases`

The constituent database to which a dataset belongs.

10.8 `datasetdoi`

The doi for a dataset.

10.9 `datasetpis`

This table lists the Principle Investigators for Datasets.

datasetid (primary key, foreign key): Dataset identification number. Field links to Dataset table.
contactid (primary key, foreign key): Contact identification number. Field links to contacts table.
piorder: Order in which PIs are listed.

10.10 `datasetpublications`

This table lists the publications for datasets.

datasetid (primary key, foreign key): Dataset identification number. Field links to datasets table.
publicationid (primary key, foreign key): Publication identification number. Field links to publications table.
primarypub: Is «True» if the publication is the primary publication for the dataset.

10.11 `datasets`

This table stores the data for Datasets. A Dataset is the set of samples for a particular data type from a Collection Unit. A Collection Unit may have multiple Datasets for different data types, for example one dataset for pollen and another for plant macrofossils. Every Sample is assigned to a Dataset, and every Dataset is assigned to a Collection Unit. Samples from different Collection Units cannot be assigned to the same Dataset (although they may be assigned to aggregatedatasets).

datasetid (primary key): An arbitrary Dataset identification number.
collectionunitid (foreign key): Collection Unit identification number. Field links to the collectionunits table.
datasettypeid (foreign key): Dataset Type identification number. Field links to the datasettypes lookup table.
datasetname: Optional name for the Dataset.
notes: Free form notes or comments about the Dataset.

10.11.1 SQL Example

The following query lists the Dataset Types for the site Vaifanaura’amo’ora. Note the kind of apostrophe used here. Strings with single quotes can be escaped using '' a double single quote. Here we are using an ASCII apostrophe character.

 SELECT dst.datasettype
 FROM ndb.sites AS st 
 INNER JOIN ndb.collectionunits AS cu ON cu.siteid = st.siteid 
 INNER JOIN ndb.datasets AS ds ON ds.collectionunitid = cu.collectionunitid
 INNER JOIN ndb.datasettypes AS dst ON dst.datasettypeid = ds.datasettypeid
 WHERE st.sitename = 'Vaifanaura’amo’ora';

Table 10.1: Displaying records 1 - 10
datasettype
geochronologic
geochemistry
paleomagnetic
insect
diatom
plant macrofossil
phytolith
charcoal
pollen
charcoal

10.11.2 SQL Example

This query lists the plant macrofossil taxa located at sites in Neotoma with a defined plant macrofossil dataset type. We have to apply a second filter to ensure we are only returning taxa that fit within the vascular plant taxonomic group.

SELECT st.sitename,
       st.siteid,
       ds.datasetid,
       array_agg(DISTINCT tx.taxonname)
FROM ndb.sites AS st 
INNER JOIN ndb.collectionunits AS  cu ON           cu.siteid = st.siteid
INNER JOIN        ndb.datasets AS  ds ON ds.collectionunitid = cu.collectionunitid 
INNER JOIN    ndb.datasettypes AS dst ON   dst.datasettypeid = ds.datasettypeid 
INNER JOIN         ndb.samples AS smp ON       smp.datasetid = ds.datasetid
INNER JOIN            ndb.data AS  dt ON         dt.sampleid = smp.sampleid 
INNER JOIN       ndb.variables AS var ON      var.variableid = dt.variableid
INNER JOIN            ndb.taxa AS  tx ON         var.taxonid = tx.taxonid
WHERE st.sitename LIKE '%Lake%' AND 
      dst.datasettype = 'plant macrofossil' 
      AND tx.taxagroupid = 'VPL'
GROUP BY st.siteid, st.sitename, ds.datasetid
LIMIT 5;

Table 10.2: 5 records
sitename	siteid	datasetid	array_agg
Hells Kitchen Lake	196	49194	{“Abies balsamea”,“Acer saccharum”,Alnus,Arctostaphylos,Betula,“Betula alleghaniensis”,“Betula papyrifera”,“Brasenia schreberi”,Carex,Coniferae,Cyperaceae,“Dulichium arundinaceum”,“Larix laricina”,“Najas flexilis”,“Nemopanthus mucronatus”,Picea,“Pinus resinosa”,“Pinus strobus”,“Populus tremuloides”,Potamogeton,“Quercus rubra”,Sagittaria,Salix,Thuja,“Tsuga canadensis”,“Ulmus americana”}
Wentworth Lake	459	49066	{“Abies lasiocarpa”,“Alnus rubra”,“Alnus viridis subsp. sinuata”,Coniferae,“Coniferae undiff.”,“Picea sitchensis”,“Thuja plicata”,“Tsuga heterophylla”}
Conroy Lake	491	14767	{Alnus}
Crawford Lake	507	16123	{Larix}
Kenosee Lake	1415	17850	{Amaranthaceae,Scirpus}

10.12 `datasetsubmissions`

Submissions to the database are of Datasets. Submissions may be original submissions, resubmissions, compilations from other databases, or recompilations. See the description of the datasetsubmissiontypes table.

submissionid (primary key): An arbitrary submission identification number.
datasetid (foreign key): Dataset identification number. Field links to the datasets table. Datasets may occur multiple times in this table (e.g. once for the original compilation into a different database and a second time for the recompilation into Neotoma).
projectid (foreign key): Database project responsible for the submission or compilation.
contactid (foreign key): Contact identification number. Field links to the contacts table. The Contact is the person who submitted, resubmitted, compiled, or recompiled the data. This person is not necessarily the Dataset PI; it is the person who submitted the data or compiled the data from the literature.
submissiondate: Date of the submission, resubmission, compilation, or recompilation.
submissiontypeid (foreign key): Submission Type identification number. Field links to the datasetsubmissiontypes table.
notes: Free form notes or comments about the submission.

10.13 `datasetsubmissiontypes`

Lookup table of Dataset Submission Types. Table is referenced by the datasetsubmissions table.

submissiontypeid (primary key): An arbitrary Submission Type identification number.
submissiontype: Type of submission. The database has the following types:
- Original submission from data contributor
- Resubmission or revision from data contributor
- Compilation into a flat file database
- Compilation into a another relational database
- Recompilation or revisions to a another relational database
- Compilation into Neotoma from another database
- Recompilation into Neotoma from another database
- Compilation into Neotoma from primary source
- Recompilation into or revisions to Neotoma: The initial development of Neotoma involved merging the data from several existing databases, including FAUNMAP, the Global Pollen Database, and the North American Plant Macrofossil Database. Thus original compilation of Datasets was into one of these databases, which were then recompiled into Neotoma. The original compilation and the recompilation into Neotoma are separate submissions.

10.13.1 SQL Example

This query gives a list of Dataset Submissions for the site «Bear River No. 3» ordered by date.

SELECT DatasetTypes.DatasetType, Projects.ProjectName,
DatasetSubmissions.SubmissionDate,
DatasetSubmissionTypes.SubmissionType, DatasetSubmissions.Notes

FROM Sites INNER JOIN (Projects INNER JOIN (DatasetTypes INNER JOIN
(DatasetSubmissionTypes INNER JOIN ((CollectionUnits INNER JOIN Datasets
ON CollectionUnits.CollectionUnitID = Datasets.CollectionUnitID) INNER
JOIN DatasetSubmissions ON Datasets.DatasetID =
DatasetSubmissions.DatasetID) ON DatasetSubmissionTypes.SubmissionTypeID
= DatasetSubmissions.SubmissionTypeID) ON DatasetTypes.DatasetTypeID =
Datasets.DatasetTypeID) ON Projects.ProjectID =
DatasetSubmissions.ProjectID) ON Sites.SiteID = CollectionUnits.SiteID

WHERE (((Sites.SiteName)="Bear River No. 3"))
ORDER BY DatasetSubmissions.SubmissionDate;

10.14 `datasettaxagrouptypes`

Taxon groups associated with a dataset.

10.15 `datasettaxonnotes`

Notes associated with taxa within a particular dataset.

10.16 `datasettypes`

Lookup table for Dataset Types. Table is referenced by the datasets table.

datasettypeid (primary key): An arbitrary Dataset Type identification number.
datasettype: The Dataset type, including the following:
- geochronologic
- loss-on-ignition
- pollen
- plant macrofossils
- vertebrate fauna
- mollusks

10.17 `datasetvariables`

The variables associated with a particular dataset.

10.18 `depenvttypes`

Lookup table of Depostional Environment Types. Table is referenced by the collectionunits table.

depenvtid (primary key): An arbitrary Depositional Environment Type identification number.
depenvt: Depositional Environment.
depenvthigherid: The Depositional Environment Types are hierarchical. DepEnvtHigherID is the DepEnvtID of the higher ranked Depositional Environment. The linked example provides some context:

10.18.1 SQL Example

This query gives a list of the top level depostional environment types.

SELECT dt.depenvtid, dt.depenvt
FROM ndb.depenvttypes AS dt
WHERE dt.depenvtid = dt.depenvthigherid;

Table 10.3: Displaying records 1 - 10
depenvtid	depenvt
1	Archaeological
6	Biological
16	Estuarine
19	Lacustrine
51	Marine
59	Palustrine
76	Riverine
93	Sampler
99	Spring
103	Terrestrial

10.18.2 SQL Example

This query uses a recursive CTE to provide a list of all the types of Sampler in Neotoma. The array in deagg represents the hierarchy of the sampler. So, for example, we see that the Gosling Trap is considered a form of Pollen Trap (Terrestrial), which is a form of Pollen Trap.

WITH RECURSIVE dephierarchy AS (
  SELECT de.depenvtid,
         de.depenvt,
         ARRAY[de.depenvtid] AS deagg
  FROM ndb.depenvttypes AS de
  WHERE de.depenvt = 'Sampler'
  UNION ALL 
  SELECT de.depenvtid,
         de.depenvt,
         dephierarchy.deagg || de.depenvtid
  FROM ndb.depenvttypes AS de 
  INNER JOIN dephierarchy ON de.depenvthigherid = dephierarchy.depenvtid
  WHERE NOT de.depenvtid = dephierarchy.depenvtid
)
SELECT * FROM dephierarchy;

Table 10.4: 9 records
depenvtid	depenvt	deagg
93	Sampler	{93}
94	Air Pollen Sampler	{93,94}
95	Pollen Trap	{93,95}
144	Cattle Tank	{93,144}
96	Pollen Trap (Lacustrine)	{93,95,96}
97	Pollen Trap (Terrestrial)	{93,95,97}
98	Tauber Trap	{93,95,97,98}
219	Gosling Trap	{93,95,97,219}
221	Bush Trap	{93,95,97,221}

10.18.3 SQL Example

This query uses a recursive CTE to list every depositional environment and the top level environment to which it relates:

WITH RECURSIVE dephierarchy AS 
(SELECT de.depenvtid,
        de.depenvt,
        ARRAY[de.depenvt::varchar] AS deagg 
 FROM ndb.depenvttypes AS de 
 UNION ALL 
 SELECT de.depenvtid, 
        de.depenvt, 
        dephierarchy.deagg || de.depenvt 
 FROM ndb.depenvttypes AS de  
 INNER JOIN dephierarchy ON de.depenvthigherid = dephierarchy.depenvtid
 WHERE NOT de.depenvtid = dephierarchy.depenvtid)
 SELECT depenvtid, 
        depenvt,
        deagg[1] FROM dephierarchy 
 WHERE dephierarchy.deagg[1] = ANY(SELECT depenvt FROM ndb.depenvttypes AS det WHERE det.depenvthigherid = det.depenvtid);

Table 10.5: Displaying records 1 - 10
depenvtid	depenvt	deagg
1	Archaeological	Archaeological
6	Biological	Biological
16	Estuarine	Estuarine
19	Lacustrine	Lacustrine
51	Marine	Marine
59	Palustrine	Palustrine
76	Riverine	Riverine
93	Sampler	Sampler
99	Spring	Spring
103	Terrestrial	Terrestrial

10 Dataset & Collection Related Tables

10.1 accumulationrates

10.2 aggregatedatasets

10.3 aggregateordertypes

10.4 collectiontypes

10.5 collectionunits

10.6 contextsdatasettypes

10.7 datasetdatabases

10.8 datasetdoi

10.9 datasetpis

10.10 datasetpublications

10.11 datasets