• 1 Acknowledgements
  • 2 Introduction
    • 2.1 Whence Neotoma
    • 2.2 Rationale
    • 2.3 History of the Constituent Databases
      • 2.3.1 Global Pollen Database
      • 2.3.2 North American Plant Macrofossil Database
      • 2.3.3 FAUNMAP
      • 2.3.4 BEETLE
    • 2.4 Who Will Use Neotoma?
  • 3 Working with the Raw Data
    • 3.1 Using the Database Locally
  • 4 Using R
  • 5 Searching for Sites
    • 5.1 Site names: sitename="%Lago%"
      • 5.1.1 Code
  • 6 SQL Quickly
    • 6.1 SQL Example
      • 6.1.1 SQL Query
    • 6.2 Table Keys
    • 6.3 Data Types
      • 6.3.1 Query
  • 7 Database Design Concepts
    • 7.1 Sites, Collection Units, Analysis Units, Samples, and Datasets
    • 7.2 Taxa and Variables
    • 7.3 Taxonomy and Synonymy
    • 7.4 Taxa and Ecological Groups
    • 7.5 Chronology
    • 7.6 Sediment and Depositional Environments
    • 7.7 Date Fields
  • 8 Neotoma Tables
    • 8.1 Site Related Tables
    • 8.2 Dataset & Collection Related Tables
    • 8.3 Chronology & Age Related Tables
    • 8.4 Sample Related Tables
    • 8.5 Specimen Related Tables
    • 8.6 Taxonomy Related Tables
    • 8.7 Individual Related Tables
    • 8.8 Publication Related Tables
    • 8.9 Supporting Resources
    • 8.10 Isotope Data Management
  • 9 Contact and Individual Related Tables
    • 9.1 collectors
      • 9.1.1 SQL Example
    • 9.2 contacts
    • 9.3 contactstatuses
  • 10 Dataset & Collection Related Tables
    • 10.1 accumulationrates
    • 10.2 aggregatedatasets
    • 10.3 aggregateordertypes
    • 10.4 collectiontypes
    • 10.5 collectionunits
    • 10.6 contextsdatasettypes
    • 10.7 datasetdatabases
    • 10.8 datasetdoi
    • 10.9 datasetpis
    • 10.10 datasetpublications
    • 10.11 datasets
      • 10.11.1 SQL Example
      • 10.11.2 SQL Example
    • 10.12 datasetsubmissions
    • 10.13 datasetsubmissiontypes
      • 10.13.1 SQL Example
    • 10.14 datasettaxagrouptypes
    • 10.15 datasettaxonnotes
    • 10.16 datasettypes
    • 10.17 datasetvariables
    • 10.18 depenvttypes
      • 10.18.1 SQL Example
      • 10.18.2 SQL Example
      • 10.18.3 SQL Example
  • 11 Publication Related Tables
    • 11.1 publicationauthors
      • 11.1.1 SQL Example
    • 11.2 publicationeditors
    • 11.3 publications
    • 11.4 publicationtypes
      • 11.4.1 Legacy
      • 11.4.2 Journal Article
      • 11.4.3 Book Chapter
      • 11.4.4 Authored Book
      • 11.4.5 Edited Book
      • 11.4.6 Master’s Thesis
      • 11.4.7 Doctoral Dissertation
      • 11.4.8 Authored Report
      • 11.4.9 Edited Report
      • 11.4.10 Other Authored Publication
      • 11.4.11 Other Edited Publication
  • 12 Sample Related Tables
    • 12.1 aggregatesamples
    • 12.2 analysisunits
    • 12.3 data
      • 12.3.1 SQL Example
    • 12.4 depagents
    • 12.5 depagenttypes
    • 12.6 faciestypes
    • 12.7 keywords
    • 12.8 lithology
    • 12.9 sampleages
      • 12.9.1 SQL Example
    • 12.10 sampleanalysts
    • 12.11 samplekeywords
      • 12.11.1 SQL Example
    • 12.12 samples
      • 12.12.1 SQL Example
  • 13 Site Related Tables
    • 13.1 geopoliticalunits
      • 13.1.1 SQL Example
      • 13.1.2 SQL Example
    • 13.2 lakeparameters
      • 13.2.1 SQL Example
    • 13.3 lakeparametertypes
      • 13.3.1 SQL Example
    • 13.4 sitegeopolitical
      • 13.4.1 SQL Example
    • 13.5 siteimages
    • 13.6 sites
      • 13.6.1 SQL Example
  • 14 Taxonomy Related Tables
    • 14.1 ecolgroups
      • 14.1.1 SQL Example
      • 14.1.2 SQL Example
    • 14.2 ecolgrouptypes
    • 14.3 ecolsettypes
    • 14.4 synonyms
    • 14.5 synonymtypes
      • 14.5.1 SQL Example
    • 14.6 taxa
    • 14.7 taxagrouptypes
    • 14.8 variables
      • 14.8.1 SQL Example
      • 14.8.2 SQL Example
      • 14.8.3 SQL Example
    • 14.9 variablecontexts
    • 14.10 variableelements
    • 14.11 variablemodifications
    • 14.12 variableunits
    • 14.13 repositoryinstitutions
    • 14.14 repositoryspecimens
      • 14.14.1 SQL Example
    • 14.15 specimendates
  • 15 Chronology & Age Related Tables
    • 15.1 agetypes
    • 15.2 aggregatechronologies
    • 15.3 chroncontrols
    • 15.4 chroncontroltypes
    • 15.5 chronologies
      • 15.5.1 SQL Example
      • 15.5.2 SQL Example
    • 15.6 aggregatesampleages
      • 15.6.1 SQL Example
      • 15.6.2 SQL Example
    • 15.7 geochronology
      • 15.7.1 SQL Example
    • 15.8 geochronpublications
    • 15.9 geochrontypes
    • 15.10 relativeagepublications
    • 15.11 relativeages
      • 15.11.1 SQL Example
    • 15.12 radiocarboncalibration
    • 15.13 relativeagescales
    • 15.14 relativeageunits
    • 15.15 relativechronology
    • 15.16 tephrachronology
    • 15.17 tephras
  • 16 References

Neotoma Paleoecology Manual v2.0

10 Dataset & Collection Related Tables

10.1 accumulationrates

This table represents estimated accumulation rates based on particular chronologies associated with collectionunits. Units are defined within the table. Accumulation rates play an important role in understanding surficial and depositional processes (Bennett2016?), and can be critical for defining relevant priors for Bayesian chronologies (Goring2012?). While accumulation rates can be calculated on-the-fly, Neotoma also stores accumulation rates when reported, however, at present only 11 chronologies have reported accumulation rates.

  • analysisunitid (foreign key): The identifier for the associated analysis unit.
  • chronologyid (foreign key)
  • accumulationrate: The rate at which sediment accumulations within a particular analysis unit, defined (generally) as the length of time required to accumulate some depth of sediment (e.g., yr/cm).
  • variableunitsid (foreign key): Foreign key referencing variable units, describing the units in which the accumulation rate is reported.

10.2 aggregatedatasets

Aggregate Datasets are aggregates of samples of a particular datasettypes.

Some examples:

  • Plant macrofossil samples from a group of packrat middens collected from a particular valley, mountain range, or other similarly defined geographic area. Each midden is from a different site or collectionunit, but they are grouped into time series for that area and are published as single dataset.
  • Samples collected from 32 cutbanks along several kms of road. Each sample is from a different site, but they form a time series from 0 – 12,510 14C yr BP, and pollen, plant macrofossils, and beetles were published and graphed as if from a single site.
  • A set of pollen surface samples from particular region or study that were published and analyzed as a single dataset and submitted to the database as a single dataset.

The examples above are datasets predefined in the database. New aggregate datasets could be assembled for particular studies, for example all the pollen samples for a given time slice for a given geographic region.

  • aggregatedatasetid (primary key): An arbitrary Aggregate Dataset identification number.
  • aggregatedatasetname: Name of Aggregate Dataset.
  • aggregateordertypeid (foreign key): Aggregate Order Type identification number. Field links to the aggregateordertypes lookup table.
  • notes: Free form notes about the Aggregate Order Type.

10.3 aggregateordertypes

Lookup table for Aggregate Order Types. Table is referenced by the aggregatedatasets table.

  • aggregateordertypeid (primary key): An arbitrary Aggregate Order Type identification number.
  • aggregateordertype: The Aggregate Order Type.
  • notes: Free form notes or comments about the Aggregate Order Type. The Aggregate Order Types are:
    • Latitude: AggregateDataset samples are ordered by, in order of priority, either (1) collectionunits GPSLatitude or (2) the mean of sites.LatitudeNorth and sites LatitudeSouth.
    • Longitude AggregateDataset samples are ordered by, in order of priority, either (1) collectionunits GPSLongitude or (2) the mean of sites.LongitudeWest and sites.LongitudeEast.
    • Altitude AggregateDataset samples are ordered by sites Altitude.
    • Age AggregateDataset samples are ordered by sampleages.Age, where sampleages.SampleAgeID is from aggregatesampleages.SampleAgeID.
    • Alphabetical by site name AggregateDataset samples are ordered alphabetically by sites.SiteName.
    • Alphabetical by collection unit name AggregateDataset samples are ordered alphabetically by collectionunits.CollUnitName.
    • Alphabetical by collection units handle AggregateDataset samples are ordered alphabetically by collectionunits.Handle.

10.4 collectiontypes

This table is a lookup table of for types of Collection Units, or Collection Types. Table is referenced by the collectionunits table.

  • colltypeid (primary key): An arbitrary Collection Type identification number.

  • colltype: The Collection Type. Types include cores, sections, excavations, and animal middens. Collection Units may be modern collections, surface float, or isolated specimens. Composite Collections Units include different kinds of Analysis Units, for example a modern surface sample for ostracodes and an associated water sample.

10.5 collectionunits

This table stores data for Collection Units.

  • collectionunitid (primary key): An arbitrary Collection Unit identification number.
  • siteid (foreign key): Site where CollectionUnit was located. Field links to sites table.
  • colltypeid (foreign key): Type of Collection Unit. Field links to the collectiontypes table.
  • depenvtid (foreign key): Depositional environment of the CollectionUnit. Normally, this key refers to the modern environment. For example, the site may be located on a colluvial slope, in which case the Depositional Environment may be Colluvium or Colluvial Fan. However, an excavation may extend into alluvial sediments, which represent a different depositional environment. These are accounted for by the Facies of the AnalysisUnit. Field links to the depenvttypes table.
  • handle: Code name for the Collection Unit. This code may be up to 10 characters, but an effort is made to keep these to 8 characters or less. Data are frequently distributed by Collection Unit, and the Handle is used for file names.
  • collunitname: Name of the Collection Unit. Examples: Core BPT82A, Structure 9, P4A Test 57. If faunal data are reported from a site or locality without explicit Collection Units, then data are assigned to a single Collection Unit with the name «Locality».
  • colldate: Date Collection Unit was collected.
  • colldevice: Device used for obtain Collection Unit. This field applies primarily to cores, for example «Wright square-rod piston corer (5 cm)».
  • gpslatitude: Precise latitude of the Collection Unit, typically taken with a GPS, although may be precisely measured from a map.
  • gpslongitude: Precise longitude of the Collection Unit, typically taken with a GPS, although may be precisely measured from a map.
  • gpsaltitude: Precise altitude of the Collection Unit, typically taken with a GPS or precisely obtained from a map.
  • gpserror: Error in the horizontal GPS coordinates, if known.
  • waterdepth: Depth of water at the Collection Unit location. This field applies mainly to Collection Units from lakes.
  • substrateid (foreign key): Substrate or rock type on which the Collection Unit lies. Field links to the RockTypes table. This field is especially used for rodent middens.
  • slopeaspect: For Collection Units on slopes, the horizontal direction to which a slope faces measured in degrees clockwise from north. This field is especially used for rodent middens.
  • slopeangle: For Collection Units on slopes, the angle of slope from horizontal. The slopeangle field is especially used for rodent middens.
  • location: Short description of the location of the Collection Unit within the site.
  • notes: Free form notes or comments about the Collection Unit.

10.6 contextsdatasettypes

Reporting for the set of variable contexts associated with each datasettype. This table is used to relate dataset type to variable context to provide users with a custom set of terms when entering and uploading data for a particular datasettype. For example, Neotoma Pollen:broken is not available for geochronologic datasets.

  • datasettypeid (foreign key): Link to the datasettype for a particular variable.
  • variablecontextid (foreign key): Link to the variable context type.

10.7 datasetdatabases

The constituent database to which a dataset belongs.

10.8 datasetdoi

The doi for a dataset.

10.9 datasetpis

This table lists the Principle Investigators for Datasets.

  • datasetid (primary key, foreign key): Dataset identification number. Field links to Dataset table.
  • contactid (primary key, foreign key): Contact identification number. Field links to contacts table.
  • piorder: Order in which PIs are listed.

10.10 datasetpublications

This table lists the publications for datasets.

  • datasetid (primary key, foreign key): Dataset identification number. Field links to datasets table.
  • publicationid (primary key, foreign key): Publication identification number. Field links to publications table.
  • primarypub: Is «True» if the publication is the primary publication for the dataset.

10.11 datasets

This table stores the data for Datasets. A Dataset is the set of samples for a particular data type from a Collection Unit. A Collection Unit may have multiple Datasets for different data types, for example one dataset for pollen and another for plant macrofossils. Every Sample is assigned to a Dataset, and every Dataset is assigned to a Collection Unit. Samples from different Collection Units cannot be assigned to the same Dataset (although they may be assigned to aggregatedatasets).

  • datasetid (primary key): An arbitrary Dataset identification number.
  • collectionunitid (foreign key): Collection Unit identification number. Field links to the collectionunits table.
  • datasettypeid (foreign key): Dataset Type identification number. Field links to the datasettypes lookup table.
  • datasetname: Optional name for the Dataset.
  • notes: Free form notes or comments about the Dataset.

10.11.1 SQL Example

The following query lists the Dataset Types for the site Vaifanaura’amo’ora. Note the kind of apostrophe used here. Strings with single quotes can be escaped using '' a double single quote. Here we are using an ASCII apostrophe character.

 SELECT dst.datasettype
 FROM ndb.sites AS st 
 INNER JOIN ndb.collectionunits AS cu ON cu.siteid = st.siteid 
 INNER JOIN ndb.datasets AS ds ON ds.collectionunitid = cu.collectionunitid
 INNER JOIN ndb.datasettypes AS dst ON dst.datasettypeid = ds.datasettypeid
 WHERE st.sitename = 'Vaifanaura’amo’ora';
Table 10.1: Displaying records 1 - 10
datasettype
geochronologic
geochemistry
paleomagnetic
insect
diatom
plant macrofossil
phytolith
charcoal
pollen
charcoal

10.11.2 SQL Example

This query lists the plant macrofossil taxa located at sites in Neotoma with a defined plant macrofossil dataset type. We have to apply a second filter to ensure we are only returning taxa that fit within the vascular plant taxonomic group.

SELECT st.sitename,
       st.siteid,
       ds.datasetid,
       array_agg(DISTINCT tx.taxonname)
FROM ndb.sites AS st 
INNER JOIN ndb.collectionunits AS  cu ON           cu.siteid = st.siteid
INNER JOIN        ndb.datasets AS  ds ON ds.collectionunitid = cu.collectionunitid 
INNER JOIN    ndb.datasettypes AS dst ON   dst.datasettypeid = ds.datasettypeid 
INNER JOIN         ndb.samples AS smp ON       smp.datasetid = ds.datasetid
INNER JOIN            ndb.data AS  dt ON         dt.sampleid = smp.sampleid 
INNER JOIN       ndb.variables AS var ON      var.variableid = dt.variableid
INNER JOIN            ndb.taxa AS  tx ON         var.taxonid = tx.taxonid
WHERE st.sitename LIKE '%Lake%' AND 
      dst.datasettype = 'plant macrofossil' 
      AND tx.taxagroupid = 'VPL'
GROUP BY st.siteid, st.sitename, ds.datasetid
LIMIT 5;
Table 10.2: 5 records
sitename siteid datasetid array_agg
Hells Kitchen Lake 196 49194 {“Abies balsamea”,“Acer saccharum”,Alnus,Arctostaphylos,Betula,“Betula alleghaniensis”,“Betula papyrifera”,“Brasenia schreberi”,Carex,Coniferae,Cyperaceae,“Dulichium arundinaceum”,“Larix laricina”,“Najas flexilis”,“Nemopanthus mucronatus”,Picea,“Pinus resinosa”,“Pinus strobus”,“Populus tremuloides”,Potamogeton,“Quercus rubra”,Sagittaria,Salix,Thuja,“Tsuga canadensis”,“Ulmus americana”}
Wentworth Lake 459 49066 {“Abies lasiocarpa”,“Alnus rubra”,“Alnus viridis subsp. sinuata”,Coniferae,“Coniferae undiff.”,“Picea sitchensis”,“Thuja plicata”,“Tsuga heterophylla”}
Conroy Lake 491 14767 {Alnus}
Crawford Lake 507 16123 {Larix}
Kenosee Lake 1415 17850 {Amaranthaceae,Scirpus}

10.12 datasetsubmissions

Submissions to the database are of Datasets. Submissions may be original submissions, resubmissions, compilations from other databases, or recompilations. See the description of the datasetsubmissiontypes table.

  • submissionid (primary key): An arbitrary submission identification number.
  • datasetid (foreign key): Dataset identification number. Field links to the datasets table. Datasets may occur multiple times in this table (e.g. once for the original compilation into a different database and a second time for the recompilation into Neotoma).
  • projectid (foreign key): Database project responsible for the submission or compilation.
  • contactid (foreign key): Contact identification number. Field links to the contacts table. The Contact is the person who submitted, resubmitted, compiled, or recompiled the data. This person is not necessarily the Dataset PI; it is the person who submitted the data or compiled the data from the literature.
  • submissiondate: Date of the submission, resubmission, compilation, or recompilation.
  • submissiontypeid (foreign key): Submission Type identification number. Field links to the datasetsubmissiontypes table.
  • notes: Free form notes or comments about the submission.

10.13 datasetsubmissiontypes

Lookup table of Dataset Submission Types. Table is referenced by the datasetsubmissions table.

  • submissiontypeid (primary key): An arbitrary Submission Type identification number.
  • submissiontype: Type of submission. The database has the following types:
    • Original submission from data contributor
    • Resubmission or revision from data contributor
    • Compilation into a flat file database
    • Compilation into a another relational database
    • Recompilation or revisions to a another relational database
    • Compilation into Neotoma from another database
    • Recompilation into Neotoma from another database
    • Compilation into Neotoma from primary source
    • Recompilation into or revisions to Neotoma: The initial development of Neotoma involved merging the data from several existing databases, including FAUNMAP, the Global Pollen Database, and the North American Plant Macrofossil Database. Thus original compilation of Datasets was into one of these databases, which were then recompiled into Neotoma. The original compilation and the recompilation into Neotoma are separate submissions.

10.13.1 SQL Example

This query gives a list of Dataset Submissions for the site «Bear River No. 3» ordered by date.

SELECT DatasetTypes.DatasetType, Projects.ProjectName,
DatasetSubmissions.SubmissionDate,
DatasetSubmissionTypes.SubmissionType, DatasetSubmissions.Notes

FROM Sites INNER JOIN (Projects INNER JOIN (DatasetTypes INNER JOIN
(DatasetSubmissionTypes INNER JOIN ((CollectionUnits INNER JOIN Datasets
ON CollectionUnits.CollectionUnitID = Datasets.CollectionUnitID) INNER
JOIN DatasetSubmissions ON Datasets.DatasetID =
DatasetSubmissions.DatasetID) ON DatasetSubmissionTypes.SubmissionTypeID
= DatasetSubmissions.SubmissionTypeID) ON DatasetTypes.DatasetTypeID =
Datasets.DatasetTypeID) ON Projects.ProjectID =
DatasetSubmissions.ProjectID) ON Sites.SiteID = CollectionUnits.SiteID

WHERE (((Sites.SiteName)="Bear River No. 3"))
ORDER BY DatasetSubmissions.SubmissionDate;

10.14 datasettaxagrouptypes

Taxon groups associated with a dataset.

10.15 datasettaxonnotes

Notes associated with taxa within a particular dataset.

10.16 datasettypes

Lookup table for Dataset Types. Table is referenced by the datasets table.

  • datasettypeid (primary key): An arbitrary Dataset Type identification number.
  • datasettype: The Dataset type, including the following:
    • geochronologic
    • loss-on-ignition
    • pollen
    • plant macrofossils
    • vertebrate fauna
    • mollusks

10.17 datasetvariables

The variables associated with a particular dataset.

10.18 depenvttypes

Lookup table of Depostional Environment Types. Table is referenced by the collectionunits table.

  • depenvtid (primary key): An arbitrary Depositional Environment Type identification number.
  • depenvt: Depositional Environment.
  • depenvthigherid: The Depositional Environment Types are hierarchical. DepEnvtHigherID is the DepEnvtID of the higher ranked Depositional Environment. The linked example provides some context:

10.18.1 SQL Example

This query gives a list of the top level depostional environment types.

SELECT dt.depenvtid, dt.depenvt
FROM ndb.depenvttypes AS dt
WHERE dt.depenvtid = dt.depenvthigherid;
Table 10.3: Displaying records 1 - 10
depenvtid depenvt
1 Archaeological
6 Biological
16 Estuarine
19 Lacustrine
51 Marine
59 Palustrine
76 Riverine
93 Sampler
99 Spring
103 Terrestrial

10.18.2 SQL Example

This query uses a recursive CTE to provide a list of all the types of Sampler in Neotoma. The array in deagg represents the hierarchy of the sampler. So, for example, we see that the Gosling Trap is considered a form of Pollen Trap (Terrestrial), which is a form of Pollen Trap.

WITH RECURSIVE dephierarchy AS (
  SELECT de.depenvtid,
         de.depenvt,
         ARRAY[de.depenvtid] AS deagg
  FROM ndb.depenvttypes AS de
  WHERE de.depenvt = 'Sampler'
  UNION ALL 
  SELECT de.depenvtid,
         de.depenvt,
         dephierarchy.deagg || de.depenvtid
  FROM ndb.depenvttypes AS de 
  INNER JOIN dephierarchy ON de.depenvthigherid = dephierarchy.depenvtid
  WHERE NOT de.depenvtid = dephierarchy.depenvtid
)
SELECT * FROM dephierarchy;
Table 10.4: 9 records
depenvtid depenvt deagg
93 Sampler {93}
94 Air Pollen Sampler {93,94}
95 Pollen Trap {93,95}
144 Cattle Tank {93,144}
96 Pollen Trap (Lacustrine) {93,95,96}
97 Pollen Trap (Terrestrial) {93,95,97}
98 Tauber Trap {93,95,97,98}
219 Gosling Trap {93,95,97,219}
221 Bush Trap {93,95,97,221}

10.18.3 SQL Example

This query uses a recursive CTE to list every depositional environment and the top level environment to which it relates:

WITH RECURSIVE dephierarchy AS 
(SELECT de.depenvtid,
        de.depenvt,
        ARRAY[de.depenvt::varchar] AS deagg 
 FROM ndb.depenvttypes AS de 
 UNION ALL 
 SELECT de.depenvtid, 
        de.depenvt, 
        dephierarchy.deagg || de.depenvt 
 FROM ndb.depenvttypes AS de  
 INNER JOIN dephierarchy ON de.depenvthigherid = dephierarchy.depenvtid
 WHERE NOT de.depenvtid = dephierarchy.depenvtid)
 SELECT depenvtid, 
        depenvt,
        deagg[1] FROM dephierarchy 
 WHERE dephierarchy.deagg[1] = ANY(SELECT depenvt FROM ndb.depenvttypes AS det WHERE det.depenvthigherid = det.depenvtid);
Table 10.5: Displaying records 1 - 10
depenvtid depenvt deagg
1 Archaeological Archaeological
6 Biological Biological
16 Estuarine Estuarine
19 Lacustrine Lacustrine
51 Marine Marine
59 Palustrine Palustrine
76 Riverine Riverine
93 Sampler Sampler
99 Spring Spring
103 Terrestrial Terrestrial