11 Dataset & Collection Related Tables
11.1 accumulationrates
This table represents estimated accumulation rates based on particular chronologies associated with collectionunits. Units are defined within the table. Accumulation rates play an important role in understanding surficial and depositional processes (Bennett and Buck 2016), and can be critical for defining relevant priors for Bayesian chronologies (S. Goring et al. 2012). While accumulation rates can be calculated on-the-fly, Neotoma also stores accumulation rates when reported, however, at present only 11 chronologies have reported accumulation rates.
analysisunitid
(foreign key): The identifier for the associated analysis unit.chronologyid
(foreign key)accumulationrate
: The rate at which sediment accumulations within a particular analysis unit, defined (generally) as the length of time required to accumulate some depth of sediment (e.g., yr/cm).variableunitsid
(foreign key): Foreign key referencing variable units, describing the units in which the accumulation rate is reported.
11.2 aggregatedatasets
Aggregate Datasets are aggregates of samples of a particular datasettypes
.
Some examples:
- Plant macrofossil samples from a group of packrat middens collected from a particular valley, mountain range, or other similarly defined geographic area. Each midden is from a different
site
orcollectionunit
, but they are grouped into time series for that area and are published as single dataset. - Samples collected from 32 cutbanks along several kms of road. Each sample is from a different site, but they form a time series from 0 – 12,510 14C yr BP, and pollen, plant macrofossils, and beetles were published and graphed as if from a single site.
- A set of pollen surface samples from particular region or study that were published and analyzed as a single dataset and submitted to the database as a single dataset.
The examples above are datasets predefined in the database. New aggregate datasets could be assembled for particular studies, for example all the pollen samples for a given time slice for a given geographic region.
aggregatedatasetid
(primary key): An arbitrary Aggregate Dataset identification number.aggregatedatasetname
: Name of Aggregate Dataset.aggregateordertypeid
(foreign key): Aggregate Order Type identification number. Field links to theaggregateordertypes
lookup table.notes
: Free form notes about the Aggregate Order Type.
11.3 aggregateordertypes
Lookup table for Aggregate Order Types. Table is referenced by the
aggregatedatasets
table.
aggregateordertypeid
(primary key): An arbitrary Aggregate Order Type identification number.aggregateordertype
: The Aggregate Order Type.notes
: Free form notes or comments about the Aggregate Order Type. The Aggregate Order Types are:- Latitude: AggregateDataset samples are ordered by, in order of priority, either (1)
collectionunits
GPSLatitude or (2) the mean ofsites
.LatitudeNorth andsites
LatitudeSouth. - Longitude AggregateDataset samples are ordered by, in order of priority, either (1)
collectionunits
GPSLongitude or (2) the mean ofsites
.LongitudeWest andsites
.LongitudeEast. - Altitude AggregateDataset samples are ordered by
sites
Altitude. - Age AggregateDataset samples are ordered by
sampleages
.Age, wheresampleages
.SampleAgeID is fromaggregatesampleages
.SampleAgeID. - Alphabetical by site name AggregateDataset samples are ordered alphabetically by
sites
.SiteName. - Alphabetical by collection unit name AggregateDataset samples are ordered alphabetically by
collectionunits
.CollUnitName. - Alphabetical by collection units handle AggregateDataset samples are ordered alphabetically by
collectionunits
.Handle.
- Latitude: AggregateDataset samples are ordered by, in order of priority, either (1)
11.4 collectiontypes
This table is a lookup table of for types of Collection Units, or Collection Types. Table is referenced by the collectionunits
table.
colltypeid
(primary key): An arbitrary Collection Type identification number.colltype
: The Collection Type. Types include cores, sections, excavations, and animal middens. Collection Units may be modern collections, surface float, or isolated specimens. Composite Collections Units include different kinds of Analysis Units, for example a modern surface sample for ostracodes and an associated water sample.
11.5 collectionunits
This table stores data for Collection Units.
collectionunitid
(primary key): An arbitrary Collection Unit identification number.siteid
(foreign key): Site where CollectionUnit was located. Field links tosites
table.colltypeid
(foreign key): Type of Collection Unit. Field links to thecollectiontypes
table.depenvtid
(foreign key): Depositional environment of the CollectionUnit. Normally, this key refers to the modern environment. For example, the site may be located on a colluvial slope, in which case the Depositional Environment may be Colluvium or Colluvial Fan. However, an excavation may extend into alluvial sediments, which represent a different depositional environment. These are accounted for by the Facies of the AnalysisUnit. Field links to thedepenvttypes
table.handle
: Code name for the Collection Unit. This code may be up to 10 characters, but an effort is made to keep these to 8 characters or less. Data are frequently distributed by Collection Unit, and the Handle is used for file names.collunitname
: Name of the Collection Unit. Examples: Core BPT82A, Structure 9, P4A Test 57. If faunal data are reported from a site or locality without explicit Collection Units, then data are assigned to a single Collection Unit with the name «Locality».colldate
: Date Collection Unit was collected.colldevice
: Device used for obtain Collection Unit. This field applies primarily to cores, for example «Wright square-rod piston corer (5 cm)».gpslatitude
: Precise latitude of the Collection Unit, typically taken with a GPS, although may be precisely measured from a map.gpslongitude
: Precise longitude of the Collection Unit, typically taken with a GPS, although may be precisely measured from a map.gpsaltitude
: Precise altitude of the Collection Unit, typically taken with a GPS or precisely obtained from a map.gpserror
: Error in the horizontal GPS coordinates, if known.waterdepth
: Depth of water at the Collection Unit location. This field applies mainly to Collection Units from lakes.substrateid
(foreign key): Substrate or rock type on which the Collection Unit lies. Field links to the RockTypes table. This field is especially used for rodent middens.slopeaspect
: For Collection Units on slopes, the horizontal direction to which a slope faces measured in degrees clockwise from north. This field is especially used for rodent middens.slopeangle
: For Collection Units on slopes, the angle of slope from horizontal. Theslopeangle
field is especially used for rodent middens.location
: Short description of the location of the Collection Unit within the site.notes
: Free form notes or comments about the Collection Unit.
11.6 contextsdatasettypes
Reporting for the set of variable contexts associated with each datasettype. This table is used to relate dataset type to variable context to provide users with a custom set of terms when entering and uploading data for a particular datasettype. For example, Neotoma Pollen:broken is not available for geochronologic datasets.
datasettypeid
(foreign key): Link to the datasettype for a particular variable.variablecontextid
(foreign key): Link to the variable context type.
11.9 datasetpis
This table lists the Principle Investigators for Datasets.
datasetid
(primary key, foreign key): Dataset identification number. Field links to Dataset table.contactid
(primary key, foreign key): Contact identification number. Field links tocontacts
table.piorder
: Order in which PIs are listed.
11.10 datasetpublications
This table lists the publications for datasets.
datasetid
(primary key, foreign key): Dataset identification number. Field links todatasets
table.publicationid
(primary key, foreign key): Publication identification number. Field links topublications
table.primarypub
: Is «True» if the publication is the primary publication for the dataset.
11.11 datasets
This table stores the data for Datasets. A Dataset is the set of samples for a particular data type from a Collection Unit. A Collection Unit may have multiple Datasets for different data types, for example one dataset for pollen and another for plant macrofossils. Every Sample is assigned to a Dataset, and every Dataset is assigned to a Collection Unit. Samples from different Collection Units cannot be assigned to the same Dataset (although they may be assigned to aggregatedatasets
).
datasetid
(primary key): An arbitrary Dataset identification number.collectionunitid
(foreign key): Collection Unit identification number. Field links to thecollectionunits
table.datasettypeid
(foreign key): Dataset Type identification number. Field links to thedatasettypes
lookup table.datasetname
: Optional name for the Dataset.notes
: Free form notes or comments about the Dataset.
11.11.1 SQL Example
The following query lists the Dataset Types for the site Vaifanaura’amo’ora. Note the kind of apostrophe used here. Strings with single quotes can be escaped using ''
a double single quote. Here we are using an ASCII apostrophe character.
SELECT dst.datasettype
FROM ndb.sites AS st
INNER JOIN ndb.collectionunits AS cu ON cu.siteid = st.siteid
INNER JOIN ndb.datasets AS ds ON ds.collectionunitid = cu.collectionunitid
INNER JOIN ndb.datasettypes AS dst ON dst.datasettypeid = ds.datasettypeid
WHERE st.sitename = 'Vaifanaura’amo’ora';
datasettype |
---|
geochronologic |
geochemistry |
paleomagnetic |
insect |
diatom |
plant macrofossil |
phytolith |
charcoal |
pollen |
charcoal |
11.11.2 SQL Example
This query lists the plant macrofossil taxa located at sites in Neotoma with a defined plant macrofossil dataset type. We have to apply a second filter to ensure we are only returning taxa that fit within the vascular plant taxonomic group.
SELECT st.sitename,
st.siteid,
ds.datasetid,
array_agg(DISTINCT tx.taxonname)
FROM ndb.sites AS st
INNER JOIN ndb.collectionunits AS cu ON cu.siteid = st.siteid
INNER JOIN ndb.datasets AS ds ON ds.collectionunitid = cu.collectionunitid
INNER JOIN ndb.datasettypes AS dst ON dst.datasettypeid = ds.datasettypeid
INNER JOIN ndb.samples AS smp ON smp.datasetid = ds.datasetid
INNER JOIN ndb.data AS dt ON dt.sampleid = smp.sampleid
INNER JOIN ndb.variables AS var ON var.variableid = dt.variableid
INNER JOIN ndb.taxa AS tx ON var.taxonid = tx.taxonid
WHERE st.sitename LIKE '%Lake%' AND
dst.datasettype = 'plant macrofossil'
AND tx.taxagroupid = 'VPL'
GROUP BY st.siteid, st.sitename, ds.datasetid
LIMIT 5;
sitename | siteid | datasetid | array_agg |
---|---|---|---|
Hells Kitchen Lake | 196 | 49194 | {“Abies balsamea”,“Acer saccharum”,Alnus,Arctostaphylos,Betula,“Betula alleghaniensis”,“Betula papyrifera”,“Brasenia schreberi”,Carex,Coniferae,Cyperaceae,“Dulichium arundinaceum”,“Larix laricina”,“Najas flexilis”,“Nemopanthus mucronatus”,Picea,“Pinus resinosa”,“Pinus strobus”,“Populus tremuloides”,Potamogeton,“Quercus rubra”,Sagittaria,Salix,Thuja,“Tsuga canadensis”,“Ulmus americana”} |
Wentworth Lake | 459 | 49066 | {“Abies lasiocarpa”,“Alnus rubra”,“Alnus viridis subsp. sinuata”,Coniferae,“Coniferae undiff.”,“Picea sitchensis”,“Thuja plicata”,“Tsuga heterophylla”} |
Conroy Lake | 491 | 14767 | {Alnus} |
Crawford Lake | 507 | 16123 | {Larix} |
Kenosee Lake | 1415 | 17850 | {Amaranthaceae,Scirpus} |
11.12 datasetsubmissions
Submissions to the database are of Datasets. Submissions may be original
submissions, resubmissions, compilations from other databases, or
recompilations. See the description of the
datasetsubmissiontypes
table.
submissionid
(primary key): An arbitrary submission identification number.datasetid
(foreign key): Dataset identification number. Field links to thedatasets
table. Datasets may occur multiple times in this table (e.g. once for the original compilation into a different database and a second time for the recompilation into Neotoma).projectid
(foreign key): Database project responsible for the submission or compilation.contactid
(foreign key): Contact identification number. Field links to thecontacts
table. The Contact is the person who submitted, resubmitted, compiled, or recompiled the data. This person is not necessarily the Dataset PI; it is the person who submitted the data or compiled the data from the literature.submissiondate
: Date of the submission, resubmission, compilation, or recompilation.submissiontypeid
(foreign key): Submission Type identification number. Field links to thedatasetsubmissiontypes
table.notes
: Free form notes or comments about the submission.
11.13 datasetsubmissiontypes
Lookup table of Dataset Submission Types. Table is referenced by the datasetsubmissions
table.
submissiontypeid
(primary key): An arbitrary Submission Type identification number.submissiontype
: Type of submission. The database has the following types:- Original submission from data contributor
- Resubmission or revision from data contributor
- Compilation into a flat file database
- Compilation into a another relational database
- Recompilation or revisions to a another relational database
- Compilation into Neotoma from another database
- Recompilation into Neotoma from another database
- Compilation into Neotoma from primary source
- Recompilation into or revisions to Neotoma: The initial development of Neotoma involved merging the data from several existing databases, including FAUNMAP, the Global Pollen Database, and the North American Plant Macrofossil Database. Thus original compilation of Datasets was into one of these databases, which were then recompiled into Neotoma. The original compilation and the recompilation into Neotoma are separate submissions.
11.13.1 SQL Example
This query gives a list of Dataset Submissions for the site «Bear River No. 3» ordered by date.
SELECT DatasetTypes.DatasetType, Projects.ProjectName,
DatasetSubmissions.SubmissionDate,
DatasetSubmissionTypes.SubmissionType, DatasetSubmissions.Notes
FROM Sites INNER JOIN (Projects INNER JOIN (DatasetTypes INNER JOIN
(DatasetSubmissionTypes INNER JOIN ((CollectionUnits INNER JOIN Datasets
ON CollectionUnits.CollectionUnitID = Datasets.CollectionUnitID) INNER
JOIN DatasetSubmissions ON Datasets.DatasetID =
DatasetSubmissions.DatasetID) ON DatasetSubmissionTypes.SubmissionTypeID
= DatasetSubmissions.SubmissionTypeID) ON DatasetTypes.DatasetTypeID =
Datasets.DatasetTypeID) ON Projects.ProjectID =
DatasetSubmissions.ProjectID) ON Sites.SiteID = CollectionUnits.SiteID
WHERE (((Sites.SiteName)="Bear River No. 3"))
ORDER BY DatasetSubmissions.SubmissionDate;
11.16 datasettypes
Lookup table for Dataset Types. Table is referenced by the datasets
table.
datasettypeid
(primary key): An arbitrary Dataset Type identification number.datasettype
: The Dataset type, including the following:- geochronologic
- loss-on-ignition
- pollen
- plant macrofossils
- vertebrate fauna
- mollusks
11.18 depenvttypes
Lookup table of Depostional Environment Types. Table is referenced by
the collectionunits
table.
depenvtid
(primary key): An arbitrary Depositional Environment Type identification number.depenvt
: Depositional Environment.depenvthigherid
: The Depositional Environment Types are hierarchical. DepEnvtHigherID is the DepEnvtID of the higher ranked Depositional Environment. The linked example provides some context:
11.18.1 SQL Example
This query gives a list of the top level depostional environment types.
SELECT dt.depenvtid, dt.depenvt
FROM ndb.depenvttypes AS dt
WHERE dt.depenvtid = dt.depenvthigherid;
depenvtid | depenvt |
---|---|
1 | Archaeological |
6 | Biological |
16 | Estuarine |
19 | Lacustrine |
51 | Marine |
59 | Palustrine |
76 | Riverine |
93 | Sampler |
99 | Spring |
103 | Terrestrial |
11.18.2 SQL Example
This query uses a recursive CTE to provide a list of all the types of Sampler in Neotoma. The array in deagg
represents the hierarchy of the sampler. So, for example, we see that the Gosling Trap is considered a form of Pollen Trap (Terrestrial), which is a form of Pollen Trap.
WITH RECURSIVE dephierarchy AS (
SELECT de.depenvtid,
de.depenvt,
ARRAY[de.depenvtid] AS deagg
FROM ndb.depenvttypes AS de
WHERE de.depenvt = 'Sampler'
UNION ALL
SELECT de.depenvtid,
de.depenvt,
dephierarchy.deagg || de.depenvtid
FROM ndb.depenvttypes AS de
INNER JOIN dephierarchy ON de.depenvthigherid = dephierarchy.depenvtid
WHERE NOT de.depenvtid = dephierarchy.depenvtid
)
SELECT * FROM dephierarchy;
depenvtid | depenvt | deagg |
---|---|---|
93 | Sampler | {93} |
94 | Air Pollen Sampler | {93,94} |
95 | Pollen Trap | {93,95} |
144 | Cattle Tank | {93,144} |
96 | Pollen Trap (Lacustrine) | {93,95,96} |
97 | Pollen Trap (Terrestrial) | {93,95,97} |
98 | Tauber Trap | {93,95,97,98} |
219 | Gosling Trap | {93,95,97,219} |
221 | Bush Trap | {93,95,97,221} |
11.18.3 SQL Example
This query uses a recursive CTE to list every depositional environment and the top level environment to which it relates:
WITH RECURSIVE dephierarchy AS
(SELECT de.depenvtid,
de.depenvt,
ARRAY[de.depenvt::varchar] AS deagg
FROM ndb.depenvttypes AS de
UNION ALL
SELECT de.depenvtid,
de.depenvt,
dephierarchy.deagg || de.depenvt
FROM ndb.depenvttypes AS de
INNER JOIN dephierarchy ON de.depenvthigherid = dephierarchy.depenvtid
WHERE NOT de.depenvtid = dephierarchy.depenvtid)
SELECT depenvtid,
depenvt,
deagg[1] FROM dephierarchy
WHERE dephierarchy.deagg[1] = ANY(SELECT depenvt FROM ndb.depenvttypes AS det WHERE det.depenvthigherid = det.depenvtid);
depenvtid | depenvt | deagg |
---|---|---|
1 | Archaeological | Archaeological |
6 | Biological | Biological |
16 | Estuarine | Estuarine |
19 | Lacustrine | Lacustrine |
51 | Marine | Marine |
59 | Palustrine | Palustrine |
76 | Riverine | Riverine |
93 | Sampler | Sampler |
99 | Spring | Spring |
103 | Terrestrial | Terrestrial |