Documentation

This part of the project documentation focuses on an information-oriented approach. Use it as a reference for the technical implementation of the DataBUS project code.

Core Data Classes

Core classes representing the fundamental data models used throughout the DataBUS project.

AnalysisUnit

AnalysisUnit(analysisunitid=None, collectionunitid=None, analysisunitname=None, depth=None, thickness=None, faciesid=None, mixed=None, igsn=None, notes=None)

An analysis unit in Neotoma.

Physical subsets of a collection unit, often with a position (depth) within a core or dig site. Samples are the intersection between analysis units and dataset types.

See the Neotoma Manual.

ATTRIBUTE DESCRIPTION
analysisunitid

Analysis unit ID (assigned after insertion).

TYPE: int | None

collectionunitid

Parent collection unit ID.

TYPE: int | None

analysisunitname

Name of the analysis unit.

TYPE: str | None

depth

Depth in the core/site (if known).

TYPE: float | None

thickness

Physical thickness of the sample.

TYPE: float | None

faciesid

Neotoma identifier for rock formation.

TYPE: int | None

mixed

Evidence of stratigraphic mixing.

TYPE: bool | None

igsn

IGSN identifier.

TYPE: str | None

notes

Additional notes.

TYPE: str | None

Examples:

>>> au = AnalysisUnit(collectionunitid=1, depth=2.5)
>>> au.depth
2.5

insert_to_db(cur)

Insert the AnalysisUnit into Neotoma.

PARAMETER DESCRIPTION
cur

A valid psycopg2 connection to the Neotoma database.

TYPE: psycopg2.connect

RETURNS DESCRIPTION
AnalysisUnit

The function inserts the AnalysisUnit into Neotoma and adds the new analysisunitid to the object.
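The insert-then-store-ID behaviour described above can be sketched without a database. This is a minimal illustration, not the DataBUS implementation: `StubCursor`, `AnalysisUnitSketch`, the table name, and the SQL text are all hypothetical stand-ins. Only the `execute()` and `fetchone()` method names mirror the DB-API cursor interface.

```python
from dataclasses import dataclass
from typing import Optional


class StubCursor:
    """Hypothetical stand-in for a DB-API cursor; hands out sequential IDs."""

    def __init__(self):
        self._next_id = 100
        self._row = None

    def execute(self, query, params=None):
        # Pretend the database assigned a new primary key for this insert.
        self._row = (self._next_id,)
        self._next_id += 1

    def fetchone(self):
        return self._row


@dataclass
class AnalysisUnitSketch:
    """Illustrative subset of the AnalysisUnit attributes."""

    collectionunitid: Optional[int] = None
    depth: Optional[float] = None
    analysisunitid: Optional[int] = None

    def insert_to_db(self, cur):
        # Placeholder SQL: the real statement is not shown in this reference.
        cur.execute(
            "INSERT INTO analysisunits_sketch (collectionunitid, depth) "
            "VALUES (%s, %s) RETURNING analysisunitid",
            (self.collectionunitid, self.depth),
        )
        # Store the ID returned by the (stub) database on the object itself.
        self.analysisunitid = cur.fetchone()[0]
        return self


au = AnalysisUnitSketch(collectionunitid=1, depth=2.5)
au.insert_to_db(StubCursor())
print(au.analysisunitid)
```

After the call, the object carries the assigned ID, so later inserts can refer to it as a foreign key.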

ChronControl

ChronControl(chroncontrolid=None, chronologyid=None, chroncontroltypeid=None, depth=None, thickness=None, age=None, agelimityounger=None, agelimitolder=None, notes=None, analysisunitid=None, agetypeid=None)

A chronological control point in Neotoma.

Provides dating constraints for a chronology, such as radiocarbon dates or other age measurements at specific depths within a stratigraphic sequence.

See the Neotoma Manual.

ATTRIBUTE DESCRIPTION
chroncontrolid

Control point ID.

TYPE: int | None

chronologyid

Chronology ID.

TYPE: int | None

chroncontroltypeid

Control type ID.

TYPE: int | None

depth

Depth value.

TYPE: float | None

thickness

Thickness value.

TYPE: float | None

age

Age value in years.

TYPE: float | None

agelimityounger

Younger age bound.

TYPE: float | None

agelimitolder

Older age bound.

TYPE: float | None

notes

Additional notes.

TYPE: str | None

analysisunitid

Analysis unit ID.

TYPE: int | None

agetypeid

Age type ID.

TYPE: int | None

Examples:

>>> chron = ChronControl(chronologyid=1, depth=5.5, age=75)
>>> chron.age
75

insert_to_db(cur)

Insert the chronological control point into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: psycopg2.cursor

RETURNS DESCRIPTION
int

The chroncontrolid assigned by the database.

Chronology

Chronology(chronologyid=None, collectionunitid=None, agetypeid=None, contactid=None, chronologyname=None, dateprepared=None, agemodel=None, ageboundyounger=None, ageboundolder=None, isdefault=None, notes=None)

A chronology (age model) for a collection unit in Neotoma.

Defines the dating framework for samples within a collection unit, including age model type, bounds, and preparation metadata.

See the Neotoma Manual.

ATTRIBUTE DESCRIPTION
chronologyid

Chronology ID.

TYPE: int | None

collectionunitid

Collection unit ID.

TYPE: int | None

agetypeid

Age type ID.

TYPE: int | None

contactid

Contact ID (first element if list provided).

TYPE: int | None

chronologyname

Chronology name.

TYPE: str | None

dateprepared

Preparation date.

TYPE: str | None

agemodel

Age model description.

TYPE: str | None

ageboundyounger

Younger age bound.

TYPE: float | None

ageboundolder

Older age bound.

TYPE: float | None

isdefault

Whether this is the default chronology for the collection unit.

TYPE: bool | None

notes

Additional notes.

TYPE: str | None

Examples:

>>> chron = Chronology(collectionunitid=1, chronologyname="Model 2023")
>>> chron.chronologyname
'Model 2023'
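Several attributes in this reference (contactid here, and notes and samplename on other classes) are documented as taking the "first element if list provided". A minimal sketch of that normalisation follows. `first_if_list` is a hypothetical helper, and returning None for an empty list is an assumption rather than documented behaviour.

```python
def first_if_list(value):
    """Return the first element of a list or tuple, otherwise the value itself."""
    if isinstance(value, (list, tuple)):
        # Assumption: an empty sequence is treated like a missing value.
        return value[0] if value else None
    return value


print(first_if_list([12, 15]))
print(first_if_list(12))
```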

insert_to_db(cur)

Insert the chronology record into the Neotoma database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The chronologyid assigned by the database.

CollectionUnit

CollectionUnit(collectionunitid=None, handle=None, siteid=None, colltypeid=None, depenvtid=None, collunitname=None, colldate=None, colldevice=None, gpsaltitude=None, gpserror=None, waterdepth=None, substrateid=None, slopeaspect=None, slopeangle=None, location=None, notes=None, geog=None)

Represents a sediment core or excavation collection in Neotoma.

A collection unit is a physical collection (e.g., a sediment core, excavation) at a specific site. It contains geographic, temporal, and physical information about the collected material.

Collection units are explained further in the Neotoma Manual.

ATTRIBUTE DESCRIPTION
collectionunitid

Collection unit identifier.

TYPE: int | None

handle

Unique handle/identifier.

TYPE: str

siteid

Associated site ID.

TYPE: int

colltypeid

Collection type ID.

TYPE: int | None

depenvtid

Depositional environment ID.

TYPE: int | None

collunitname

Collection unit name.

TYPE: str | None

colldate

Collection date.

TYPE: datetime | None

colldevice

Collection device used.

TYPE: str | None

gpsaltitude

GPS altitude in meters.

TYPE: float | None

gpserror

GPS error in meters.

TYPE: float | None

waterdepth

Water depth in meters.

TYPE: float | None

substrateid

Substrate type ID.

TYPE: int | None

slopeaspect

Slope aspect in degrees.

TYPE: int | None

slopeangle

Slope angle in degrees.

TYPE: int | None

location

Location description.

TYPE: str | None

notes

Additional notes.

TYPE: str | None

geog

Geographic coordinates.

TYPE: Geog | None

distance

Distance from reference (computed when cu.find_close_collunits() is executed).

TYPE: float | None

Examples:

>>> cu = CollectionUnit(siteid=1, handle="MCL-01")  # Mirror Lake core collection
>>> cu.handle
'MCL-01'
>>> cu = CollectionUnit(siteid=2, handle="LC-Core-1", waterdepth=25.5, collunitname="Main core")  # Lake cave site
>>> cu.waterdepth
25.5

find_close_collunits(cur, distance=10000, limit=10)

Find geographically close collection units.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

distance

Distance threshold in meters.

TYPE: float DEFAULT: 10000

limit

Maximum number to return.

TYPE: int DEFAULT: 10

RETURNS DESCRIPTION
list

Collection units within specified distance.

insert_to_db(cur)

Insert the collection unit into the database.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

RETURNS DESCRIPTION
int

The collectionunitid assigned.

Contact

Contact(contactid, contactname=None, order=None)

A person who participated in data collection or processing in Neotoma.

Manages contact information and roles in paleoenvironmental research including data processing, sample analysis, and field collection.

ATTRIBUTE DESCRIPTION
contactid

Contact ID.

TYPE: int

contactname

Contact name.

TYPE: str | None

order

Order/sequence in list of contacts.

TYPE: int | None

RAISES DESCRIPTION
ValueError

If contactid is not int or None, or order is not int or None.

Examples:

>>> contact = Contact(contactid=1, contactname="Simon Goring", order=1)
>>> contact.contactname
'Simon Goring'
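The ValueError rule listed under RAISES can be illustrated with a small type check. This is a sketch only: `check_int_or_none` is a hypothetical helper, and the error message wording is an assumption.

```python
def check_int_or_none(name, value):
    """Return value unchanged if it is an int or None; raise ValueError otherwise."""
    # Note: bool is a subclass of int in Python, so True/False pass this check.
    if value is not None and not isinstance(value, int):
        raise ValueError(
            f"{name} must be an int or None, got {type(value).__name__}"
        )
    return value


print(check_int_or_none("contactid", 1))
print(check_int_or_none("order", None))
```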

insert_pi(cur, datasetid)

Insert contact as principal investigator for a dataset.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

datasetid

Dataset identifier.

TYPE: int

RETURNS DESCRIPTION

None

insert_data_processor(cur, datasetid)

Insert contact as data processor for a dataset.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

datasetid

Dataset identifier.

TYPE: int

RETURNS DESCRIPTION

None

insert_sample_analyst(cur, sampleid)

Insert contact as sample analyst.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

sampleid

Sample identifier.

TYPE: int

RETURNS DESCRIPTION

None

insert_collector(cur, collunitid)

Insert contact as field collector for a collection unit.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

collunitid

Collection unit identifier.

TYPE: int

RETURNS DESCRIPTION

None

Dataset

Dataset(datasettypeid, datasetid=None, collectionunitid=None, datasetname=None, notes=None)

A dataset in Neotoma.

A collection of data (e.g., pollen counts, isotope values) of a specific type associated with a collection unit.

Datasets are explained further in the Neotoma Manual.

ATTRIBUTE DESCRIPTION
datasetid

Dataset ID (assigned after insertion).

TYPE: int | None

collectionunitid

Collection unit ID.

TYPE: int | None

datasettypeid

Dataset type ID (required).

TYPE: int

datasetname

Dataset name.

TYPE: str | None

notes

Additional notes.

TYPE: str | None

RAISES DESCRIPTION
ValueError

If datasettypeid is not an integer.

Examples:

>>> dataset = Dataset(datasettypeid=1, datasetname="Pollen Core")
>>> dataset.datasettypeid
1

insert_to_db(cur)

Insert the dataset record into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The datasetid assigned by the database.

DatasetDatabase

DatasetDatabase(databaseid, datasetid=None)

A link between a dataset and a constituent database in Neotoma.

Associates a dataset with a constituent database, enabling tracking of dataset provenance.

ATTRIBUTE DESCRIPTION
databaseid

Constituent database ID.

TYPE: int

datasetid

Dataset ID.

TYPE: int | None

RAISES DESCRIPTION
ValueError

If databaseid or datasetid is not int or None.

Examples:

>>> ds_db = DatasetDatabase(databaseid=1, datasetid=2)
>>> ds_db.databaseid
1

insert_to_db(cur)

Insert the dataset-database relationship into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION

None

DataUncertainty

DataUncertainty(dataid, uncertaintyvalue, uncertaintyunitid, uncertaintybasisid, notes)

Measurement uncertainty for a data value in Neotoma.

Stores uncertainty metrics including magnitude, units, and basis of uncertainty quantification.

ATTRIBUTE DESCRIPTION
dataid

Data ID.

TYPE: int

uncertaintyvalue

Uncertainty magnitude.

TYPE: float | None

uncertaintyunitid

Uncertainty units ID.

TYPE: int | None

uncertaintybasisid

Uncertainty basis ID.

TYPE: int | None

notes

Notes about uncertainty.

TYPE: str | None

Examples:

>>> uncert = DataUncertainty(dataid=1, uncertaintyvalue=5.0,
...                          uncertaintyunitid=2, uncertaintybasisid=1, notes=None)
>>> uncert.uncertaintyvalue
5.0

insert_to_db(cur)

Insert the data uncertainty record into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION

None

Datum

Datum(sampleid=None, variableid=None, value=None)

A data point measurement in the Neotoma database.

Encapsulates a single measurement or observation linking a sample to a variable with a measured value.

Data are explained further in the Neotoma Manual (https://open.neotomadb.org/manual/sample-related-tables-1.html#Data).

ATTRIBUTE DESCRIPTION
sampleid

Sample identifier.

TYPE: int | None

variableid

Variable identifier.

TYPE: int | None

value

The measured value.

TYPE: float | None

datumid

Database ID (assigned after insertion).

TYPE: int | None

Examples:

>>> datum = Datum(sampleid=1, variableid=42, value=125.3)
>>> datum.value
125.3

insert_to_db(cur)

Insert the datum record into the Neotoma database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The datumid assigned by the database.

Geochron

Geochron(sampleid=None, geochrontypeid=None, agetypeid=None, age=None, errorolder=None, erroryounger=None, infinite=None, delta13c=None, labnumber=None, materialdated=None, notes=None)

A geochronological age determination in Neotoma.

Stores age measurements from radiometric and other dating techniques, including determined age, uncertainty bounds, and dated material info.

Geochronologies are explained further in the Neotoma Manual.

ATTRIBUTE DESCRIPTION
sampleid

Sample ID.

TYPE: int

geochrontypeid

Geochron type ID.

TYPE: int

agetypeid

Age type ID.

TYPE: int

age

Age value.

TYPE: float

errorolder

Older error bound.

TYPE: float

erroryounger

Younger error bound.

TYPE: float

infinite

Infinite age flag (defaults to False).

TYPE: bool

delta13c

Delta 13C value (for radiocarbon).

TYPE: float | None

labnumber

Laboratory number.

TYPE: str | None

materialdated

Material dated.

TYPE: str | None

notes

Additional notes.

TYPE: str | None

geochronid

Geochron ID (assigned after insertion).

TYPE: int | None

Examples:

>>> geo = Geochron(sampleid=1, geochrontypeid=1, agetypeid=1, age=3250,
...                errorolder=100, erroryounger=100, infinite=False,
...                delta13c=-25.5, labnumber="UCIAMS-12345",
...                materialdated="Charcoal", notes=None)
>>> geo.age
3250

insert_to_db(cur)

Insert the geochronological date into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The geochronid assigned by the database.

GeochronControl

GeochronControl(chroncontrolid, geochronid)

A link between a chronological control and geochronological date.

Links a chronological control point with a geochronological age determination used to constrain the age model.

Geochronologies are explained further in the Neotoma Manual.

ATTRIBUTE DESCRIPTION
chroncontrolid

Chronological control ID.

TYPE: int

geochronid

Geochron ID.

TYPE: int

geochroncontrolid

Geochron control ID (assigned after insertion).

TYPE: int | None

Examples:

>>> gc = GeochronControl(chroncontrolid=1, geochronid=2)
>>> gc.chroncontrolid
1

insert_to_db(cur)

Insert the geochron-control relationship into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The geochroncontrolid assigned by the database.

Geog

WrongCoordinates

Bases: Exception

Custom exception raised when coordinates are outside valid geographic ranges.

Geog(coords)

Geographic coordinates with validation and hemisphere determination.

Stores latitude and longitude with validation to ensure values are within valid geographic ranges. Automatically determines hemisphere from coordinates.

ATTRIBUTE DESCRIPTION
longe

Longitude in decimal degrees (-180 to 180).

TYPE: float | None

latn

Latitude in decimal degrees (-90 to 90).

TYPE: float | None

longw

Longitude in decimal degrees (-180 to 180).

TYPE: float | None

lats

Latitude in decimal degrees (-90 to 90).

TYPE: float | None

hemisphere

Cardinal directions ('NE', 'NW', 'SE', 'SW').

TYPE: str | None

RAISES DESCRIPTION
TypeError

If coords is not list/tuple/None, or lat/long not numbers.

ValueError

If coords length is not 2.

WrongCoordinates

If coordinates outside valid ranges.

Examples:

>>> geog = Geog([43.3734, -71.5316])
>>> geog.hemisphere
'NW'
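The range validation and hemisphere determination described above can be sketched as a standalone function. It is not the Geog implementation. The (latitude, longitude) argument order is inferred from the Mirror Lake example elsewhere in this reference, `WrongCoordinatesSketch` is a hypothetical stand-in for WrongCoordinates, and treating 0 as north/east is an assumption.

```python
class WrongCoordinatesSketch(Exception):
    """Hypothetical stand-in for the WrongCoordinates exception."""


def hemisphere(lat, lon):
    """Validate a latitude/longitude pair and return 'NE', 'NW', 'SE' or 'SW'."""
    if not all(isinstance(v, (int, float)) for v in (lat, lon)):
        raise TypeError("Latitude and longitude must be numbers.")
    if not -90 <= lat <= 90:
        raise WrongCoordinatesSketch("Latitude must be between -90 and 90.")
    if not -180 <= lon <= 180:
        raise WrongCoordinatesSketch("Longitude must be between -180 and 180.")
    # Assumption: a value of exactly 0 is reported as north / east.
    north_south = "N" if lat >= 0 else "S"
    east_west = "E" if lon >= 0 else "W"
    return north_south + east_west


print(hemisphere(43.3734, -71.5316))
```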

Hiatus

Hiatus(hiatusid=None, analysisunitstart=None, analysisunitend=None, notes=None)

A hiatus or stratigraphic gap in a sediment sequence.

Marks a discontinuity in the stratigraphic record, representing missing time. Bounded by analysis units and can be associated with a chronology.

ATTRIBUTE DESCRIPTION
hiatusid

Hiatus ID.

TYPE: int | None

analysisunitstart

Start analysis unit ID.

TYPE: int

analysisunitend

End analysis unit ID.

TYPE: int

notes

Description (first element if list provided).

TYPE: str | None

RAISES DESCRIPTION
TypeError

If hiatusid is not int/None/"NA", analysis units not int, or notes not str/None.

Examples:

>>> hiatus = Hiatus(analysisunitstart=10, analysisunitend=15)
>>> hiatus.analysisunitstart
10

insert_to_db(cur)

Insert the hiatus record into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The hiatusid assigned by the database.

insert_hiatus_chron_to_db(chronologyid, hiatuslength, hiatusuncertainty, cur)

Insert hiatus-chronology relationship into the database.

PARAMETER DESCRIPTION
chronologyid

Associated chronology identifier.

TYPE: int

hiatuslength

Length of hiatus.

TYPE: float

hiatusuncertainty

Uncertainty in hiatus length.

TYPE: float

cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION

None

LeadModel

LeadModel(pbbasisid=None, analysisunitid=None, cumulativeinventory=None, datinghorizon=None)

A Lead-210 geochronological model in Neotoma.

Manages lead isotope data for radiometric dating of sediment cores, storing basis information and cumulative inventory values.

ATTRIBUTE DESCRIPTION
pbbasisid

Lead isotope basis ID.

TYPE: int | None

analysisunitid

Analysis unit ID.

TYPE: int | None

cumulativeinventory

Cumulative inventory (Bq/cm²).

TYPE: float | None

datinghorizon

Depth of the dating horizon (cm).

TYPE: float | None

Examples:

>>> lead_model = LeadModel(pbbasisid=1, analysisunitid=2, cumulativeinventory=145.3)
>>> lead_model.cumulativeinventory
145.3

insert_to_db(cur)

Insert the Lead model record into the Neotoma database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION

None

Response

Response()

Base response class for handling validation and messaging.

This class provides a standard structure for returning validation results and messages from database operations. It includes attributes for tracking validity, messages, and associated database IDs.

Examples:

>>> response = Response(valid=[True], message=["Pollen dataset validated successfully"])
>>> response.validAll
True
>>> response = Response(valid=[True, True, False], message=["Data validation error"])
>>> len(response.valid)
3

PARAMETER DESCRIPTION
valid

List of validation boolean values.

TYPE: list | None

message

List of message strings.

TYPE: list | None

ATTRIBUTE DESCRIPTION
valid

Validation status values.

TYPE: list

message

Message strings.

TYPE: list

validAll

Overall validation status.

TYPE: bool | None

id_int

Associated ID.

TYPE: int | None

id_list

Associated IDs.

TYPE: list

id_dict

Mapping of data identifiers.

TYPE: dict

name

Name mapping dictionary.

TYPE: dict

indices

List of indices.

TYPE: list

Initialize a Response object.

validAll property

True if valid is a non-empty list of booleans and all are True. False otherwise.
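The validAll rule stated above translates directly into a property. The class below is a minimal sketch: `ResponseSketch` is a hypothetical stand-in that models only valid, message, and validAll, not the full Response class.

```python
class ResponseSketch:
    """Hypothetical stand-in modelling only valid, message and validAll."""

    def __init__(self, valid=None, message=None):
        self.valid = list(valid) if valid is not None else []
        self.message = list(message) if message is not None else []

    @property
    def validAll(self):
        # True only for a non-empty list made up entirely of True booleans.
        return (
            len(self.valid) > 0
            and all(isinstance(v, bool) for v in self.valid)
            and all(self.valid)
        )


print(ResponseSketch(valid=[True]).validAll)
print(ResponseSketch(valid=[True, True, False]).validAll)
print(ResponseSketch().validAll)
```

Note that an empty list yields False, which a bare `all([])` would not.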

Sample

Sample(analysisunitid=None, datasetid=None, samplename=None, sampledate=None, analysisdate=None, taxonid=None, labnumber=None, prepmethod=None, notes=None)

A sample in Neotoma.

The intersection between an analysis unit and a dataset, representing physical material analyzed with associated metadata.

See the Neotoma Manual.

ATTRIBUTE DESCRIPTION
analysisunitid

Analysis unit ID.

TYPE: int | None

datasetid

Dataset ID.

TYPE: int | None

samplename

Sample name (first element if list provided).

TYPE: str | None

sampledate

Collection date.

TYPE: datetime | None

analysisdate

Analysis date.

TYPE: datetime | None

taxonid

Taxon ID.

TYPE: int | None

labnumber

Laboratory number.

TYPE: str | None

prepmethod

Preparation method.

TYPE: str | None

notes

Additional notes.

TYPE: str | None

sampleid

Sample ID (assigned after insertion).

TYPE: int | None

Examples:

>>> sample = Sample(analysisunitid=1, samplename="Pollen-2cm")
>>> sample.samplename
'Pollen-2cm'

insert_to_db(cur)

Insert the sample record into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The sampleid assigned by the database.

SampleAge

SampleAge(sampleid=None, chronologyid=None, age=None, ageyounger=None, ageolder=None)

Age information for a sample in the Neotoma database.

Stores age estimates for a sample within a specific chronology, including age bounds.

See the Neotoma Manual.

ATTRIBUTE DESCRIPTION
sampleid

Sample identifier. Assigned after insertion.

TYPE: int | None

chronologyid

Chronology identifier.

TYPE: int | None

age

Age estimate.

TYPE: float | None

ageyounger

Younger age bound.

TYPE: float | None

ageolder

Older age bound.

TYPE: float | None

Examples:

>>> sample_age = SampleAge(chronologyid=2, age=75)
>>> sample_age.age
75

insert_to_db(cur)

Insert the sample age record into the Neotoma database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The sampleage ID assigned by the database.

Site

Site(siteid=None, sitename=None, altitude=None, area=None, sitedescription=None, notes=None, geog=None)

Represents a geographic site location in Neotoma.

A site is a geographic location where paleoenvironmental data has been collected. It contains geographic coordinates and descriptive information.

See the Neotoma Manual.

ATTRIBUTE DESCRIPTION
description

Class description.

TYPE: str

siteid

Site identifier.

TYPE: int | None

sitename

Site name (required).

TYPE: str

altitude

Elevation in meters.

TYPE: int | None

area

Site area in square kilometers.

TYPE: float | None

sitedescription

Detailed description.

TYPE: str | None

notes

Additional notes.

TYPE: str | None

geog

Geographic coordinates.

TYPE: Geog | None

distance

Distance from reference (computed).

TYPE: float | None

Examples:

>>> site = Site(sitename="Mirror Lake", geog=Geog([43.3734, -71.5316]))  # Mirror Lake, NH
>>> site.sitename
'Mirror Lake'
>>> site = Site(sitename="Crater Lake", altitude=1949, geog=Geog([42.9453, -122.1103]))
>>> site.altitude
1949

insert_to_db(cur)

Insert the site into the Neotoma database.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: psycopg2.cursor

RETURNS DESCRIPTION
int

The siteid assigned by the database.

upsert_to_db(cur)

Update a site that already exists in the database.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: psycopg2.cursor

RETURNS DESCRIPTION
int

The siteid.

find_close_sites(cur, dist=10000, limit=5)

Find geographically close sites using PostGIS distance.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: psycopg2.cursor

dist

Distance threshold in meters.

TYPE: float DEFAULT: 10000

limit

Maximum number of sites to return.

TYPE: int DEFAULT: 5

RETURNS DESCRIPTION
list

Tuples of site records ordered by distance.
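The distance-threshold search can be illustrated in memory. The sketch below deliberately swaps PostGIS for a plain haversine great-circle calculation, so it shows the filter, sort, and limit logic rather than the actual query. `haversine_m`, `find_close`, and the two "Hypothetical Pond" entries are made up for the example. The reference point and the Crater Lake coordinates come from the Site examples above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_close(reference, candidates, dist=10000, limit=5):
    """Return (name, distance) pairs within dist meters, nearest first."""
    lat0, lon0 = reference
    hits = []
    for name, lat, lon in candidates:
        d = haversine_m(lat0, lon0, lat, lon)
        if d <= dist:
            hits.append((name, d))
    hits.sort(key=lambda hit: hit[1])
    return hits[:limit]


candidates = [
    ("Hypothetical Pond B", 43.4400, -71.5316),
    ("Hypothetical Pond A", 43.3800, -71.5300),
    ("Crater Lake", 42.9453, -122.1103),
]
close = find_close((43.3734, -71.5316), candidates)
print([name for name, _ in close])
```

In a database-backed version the filtering and ordering would normally be pushed into the SQL query instead of being done in Python.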

update_site(other, overwrite, siteresponse=None)

Update site attributes from another site object.

PARAMETER DESCRIPTION
other

Source site for updating.

TYPE: Site

overwrite

Dictionary specifying which attributes to overwrite.

TYPE: dict

siteresponse

Response object for tracking changes.

TYPE: SiteResponse | None DEFAULT: None

RETURNS DESCRIPTION
Site

Updated site object.

compare_site(other)

Compare site attributes with another site object.

PARAMETER DESCRIPTION
other

Site object to compare against.

TYPE: Site

RETURNS DESCRIPTION
list

List of differences found between sites.
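The compare and update steps can be sketched with a small dataclass. This is an illustration under assumptions, not the Site implementation. `SiteSketch` carries only three attributes, the difference message format is invented, and the overwrite dictionary is assumed to map attribute names to booleans. All values are placeholders.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SiteSketch:
    """Hypothetical stand-in carrying a few Site attributes."""

    sitename: Optional[str] = None
    altitude: Optional[int] = None
    area: Optional[float] = None


def compare_site(current, other):
    """Return one message per attribute whose values differ."""
    differences = []
    for f in fields(current):
        mine, theirs = getattr(current, f.name), getattr(other, f.name)
        if mine != theirs:
            differences.append(f"{f.name}: {mine!r} != {theirs!r}")
    return differences


def update_site(current, other, overwrite):
    """Copy attributes from other where overwrite[name] is truthy (assumed shape)."""
    for f in fields(current):
        if overwrite.get(f.name, False):
            setattr(current, f.name, getattr(other, f.name))
    return current


existing = SiteSketch(sitename="Example Lake", altitude=100)
incoming = SiteSketch(sitename="Example Lake", altitude=120, area=0.15)

diffs = compare_site(existing, incoming)
print(diffs)

updated = update_site(existing, incoming, {"altitude": True})
print(updated)
```

Only altitude is flagged for overwrite, so area keeps its original value even though the two objects differ on it.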

Speleothem

Speleothem(siteid=None, entityid=None, entityname=None, monitoring=None, rockageid=None, entrancedistance=None, entrancedistanceunits=None, speleothemtypeid=None)

Represents a speleothem (stalactite, stalagmite, flowstone, etc.) in a cave in Neotoma.

This class manages information about cave mineral deposits that may be sampled for paleoenvironmental reconstruction through isotopic and geochemical analysis.

ATTRIBUTE DESCRIPTION
siteid

Site ID.

TYPE: int | None

entityid

Entity ID.

TYPE: int | None

entityname

Name.

TYPE: str | None

monitoring

Monitoring flag.

TYPE: bool | None

rockageid

Rock age ID.

TYPE: int | None

entrancedistance

Distance from entrance.

TYPE: float | None

entrancedistanceunits

Distance units.

TYPE: int | None

speleothemtypeid

Speleothem type ID.

TYPE: int | None

Examples:

>>> spel = Speleothem(siteid=1, entityname="Palace Chandelier", speleothemtypeid=1)  # Stalactite from Lehman Cave
>>> spel.entityname
'Palace Chandelier'
>>> spel = Speleothem(siteid=2, entityname="Main Stalagmite", speleothemtypeid=2, entrancedistance=45.5)
>>> spel.entrancedistance
45.5

insert_to_db(cur)

Insert the speleothem record into the database.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

RETURNS DESCRIPTION
int

The speleothem ID assigned.

insert_entitygeology_to_db(cur, id, speleothemgeologyid, notes)

Insert speleothem geology information into the database.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

id

Entity identifier.

TYPE: int

speleothemgeologyid

Geology type ID.

TYPE: int

notes

Notes.

TYPE: str | None

RETURNS DESCRIPTION

None

insert_entitydripheight_to_db(cur, id, speleothemdriptypeid, entitydripheight, entitydripheightunit)

Insert drip height information for a speleothem.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

id

Entity identifier.

TYPE: int

speleothemdriptypeid

Drip type ID.

TYPE: int

entitydripheight

Drip height measurement.

TYPE: float

entitydripheightunit

Unit ID for height.

TYPE: int

RETURNS DESCRIPTION

None

insert_entitycovers_to_db(cur, id, entitycoverid, entitycoverthickness, entitycoverunits)

Insert cover information for a speleothem.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

id

Entity identifier.

TYPE: int

entitycoverid

Cover type ID.

TYPE: int

entitycoverthickness

Cover thickness.

TYPE: float

entitycoverunits

Unit ID for thickness.

TYPE: int

RETURNS DESCRIPTION

None

insert_entitylandusecovers_to_db(cur, id, landusecovertypeid, landusecoverpercent, landusecovernotes)

Insert land use cover information for a speleothem.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

id

Entity identifier.

TYPE: int

landusecovertypeid

Land use type ID.

TYPE: int

landusecoverpercent

Percentage coverage.

TYPE: float

landusecovernotes

Notes.

TYPE: str | None

RETURNS DESCRIPTION

None

insert_entityvegetationcovers_to_db(cur, id, vegetationcovertypeid, vegetationcoverpercent, vegetationcovernotes)

Insert vegetation cover information for a speleothem.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

id

Entity identifier.

TYPE: int

vegetationcovertypeid

Vegetation type ID.

TYPE: int

vegetationcoverpercent

Percentage coverage.

TYPE: float

vegetationcovernotes

Notes.

TYPE: str | None

RETURNS DESCRIPTION

None

insert_entitysamples_to_db(cur, id, organics, fluid_inclusions, mineralogy_petrology_fabric, clumped_isotopes, noble_gas_temperatures, C14, ODL)

Insert sample type information for a speleothem.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

id

Entity identifier.

TYPE: int

organics

Organic material present.

TYPE: bool | None

fluid_inclusions

Fluid inclusions present.

TYPE: bool | None

mineralogy_petrology_fabric

Mineralogy data available.

TYPE: bool | None

clumped_isotopes

Clumped isotope data available.

TYPE: bool | None

noble_gas_temperatures

Noble gas temperature data.

TYPE: bool | None

C14

Radiocarbon data available.

TYPE: bool | None

ODL

Optical dating available.

TYPE: bool | None

RETURNS DESCRIPTION

None

ExternalSpeleothem(entityid=None, externalid=None, extdatabaseid=None, externaldescription=None)

Represents an external reference to a speleothem entity.

This class manages references to speleothem entities in external databases or from other research groups.

ATTRIBUTE DESCRIPTION
entityid

Local entity identifier.

TYPE: int | None

externalid

External entity identifier.

TYPE: int | None

extdatabaseid

External database ID.

TYPE: int | None

externaldescription

Description of external reference.

TYPE: str | None

Examples:

>>> ext = ExternalSpeleothem(entityid=1, externalid="PALEODB-2847")  # Reference to external database
>>> ext.entityid
1
>>> ext = ExternalSpeleothem(entityid=5, externalid="GEOMARC-156", extdatabaseid=3)
>>> ext.externalid
'GEOMARC-156'

insert_externalspeleothem_to_db(cur)

Insert external speleothem reference into database.

PARAMETER DESCRIPTION
cur

Database cursor.

TYPE: cursor

RETURNS DESCRIPTION

None

UThSeries

UThSeries(geochronid=None, decayconstantid=None, ratio230th232th=None, ratiouncertainty230th232th=None, activity230th238u=None, activityuncertainty230th238u=None, activity234u238u=None, activityuncertainty234u238u=None, iniratio230th232th=None, iniratiouncertainty230th232th=None)

Uranium-thorium radiometric dating data in Neotoma.

Stores U/Th isotope ratios and activities measured for radiometric dating, including initial ratios and associated uncertainties.

ATTRIBUTE DESCRIPTION
geochronid

Geochronology ID.

TYPE: int | None

decayconstantid

Decay constant ID.

TYPE: int | None

ratio230th232th

²³⁰Th/²³²Th ratio.

TYPE: float | None

ratiouncertainty230th232th

Ratio uncertainty.

TYPE: float | None

activity230th238u

²³⁰Th/²³⁸U activity.

TYPE: float | None

activityuncertainty230th238u

Activity uncertainty.

TYPE: float | None

activity234u238u

²³⁴U/²³⁸U activity.

TYPE: float | None

activityuncertainty234u238u

Activity uncertainty.

TYPE: float | None

iniratio230th232th

Initial ratio.

TYPE: float | None

iniratiouncertainty230th232th

Initial ratio uncertainty.

TYPE: float | None

Examples:

>>> uth = UThSeries(geochronid=1, ratio230th232th=1.265)
>>> uth.ratio230th232th
1.265

insert_to_db(cur)

Insert U/Th series data into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION

None

insert_uraniumseriesdata(cur, dataid, geochronid)

Insert uranium series data linking data to geochronology.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

dataid

Data value identifier.

TYPE: int

geochronid

Geochronological data identifier.

TYPE: int

RETURNS DESCRIPTION

None

Variable

Variable(taxonid=None, variableelementid=None, variableunitsid=None, variablecontextid=None)

A variable (taxon or measurement) in Neotoma.

Defines what is being measured in paleoenvironmental data, including taxon, element, units, and context. Can be taxonomic (species) or physical.

See the Neotoma Manual.

ATTRIBUTE DESCRIPTION
varid

Variable ID (assigned after DB lookup/insert).

TYPE: int | None

taxonid

Taxon ID.

TYPE: int | None

variableelementid

Variable element ID.

TYPE: int | None

variableunitsid

Variable units ID.

TYPE: int | None

variablecontextid

Variable context ID.

TYPE: int | None

Examples:

>>> variable = Variable(taxonid=42, variableunitsid=3)
>>> variable.taxonid
42

insert_to_db(cur)

Insert the variable record into the database.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int

The variableid assigned by the database.

get_id_from_db(cur)

Retrieve variable ID from the database based on attributes.

PARAMETER DESCRIPTION
cur

Database cursor for executing queries.

TYPE: cursor

RETURNS DESCRIPTION
int | None

The variableid, or None if not found.
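A lookup that returns an ID or None can be sketched with the standard-library sqlite3 module, used here in place of psycopg2 and PostgreSQL so the example runs anywhere. The table `variables_sketch`, its columns, and `get_variable_id` are hypothetical and do not reflect the Neotoma schema or the DataBUS query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variables_sketch ("
    "variableid INTEGER PRIMARY KEY, taxonid INTEGER, variableunitsid INTEGER)"
)
conn.execute(
    "INSERT INTO variables_sketch (taxonid, variableunitsid) VALUES (42, 3)"
)


def get_variable_id(cur, taxonid, variableunitsid):
    """Return the matching variableid, or None when no row matches."""
    # SQLite's IS operator also matches NULL against NULL, so attributes
    # left as None still compare correctly.
    cur.execute(
        "SELECT variableid FROM variables_sketch "
        "WHERE taxonid IS ? AND variableunitsid IS ?",
        (taxonid, variableunitsid),
    )
    row = cur.fetchone()
    return row[0] if row else None


print(get_variable_id(conn.cursor(), 42, 3))
print(get_variable_id(conn.cursor(), 99, 3))
```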

DataBUS

Validation and insertion modules for the neotomaValidator package. Each function validates the corresponding Neotoma entity and, when a populated databus dict is supplied, also inserts the record into the database within the active transaction.

valid_site

valid_site(cur, yml_dict, csv_file)

Validates and inserts site information for the Neotoma database.

Validates site details including coordinates, name, altitude, and area. Checks if site exists in Neotoma, finds close/matching sites, and compares provided data with existing database records. Handles coordinate validation and hemisphere determination. Always attempts to insert new sites when no matching site is found; the caller (validation_playground.py) controls whether changes are committed via transaction management.

PARAMETER DESCRIPTION
cur

Database connection to Neotoma database.

TYPE: connection

yml_dict

Dictionary containing parameters from YAML configuration.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

RETURNS DESCRIPTION
Response

Contains validation results, site list, hemisphere info, and messages. response.id_int is set to the matched or newly inserted site ID.

Examples:

>>> valid_site(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
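The coordinate check and hemisphere determination mentioned above could look roughly like the sketch below. The function name `hemisphere`, the error type, and the choice to treat 0 as north/east are assumptions for illustration; the reference does not specify how valid_site handles these cases.

```python
def hemisphere(lat, lon):
    """Validate a coordinate pair and return its (N/S, E/W) hemispheres."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    ns = "N" if lat >= 0 else "S"  # assumption: the equator counts as north
    ew = "E" if lon >= 0 else "W"  # assumption: the prime meridian counts as east
    return ns, ew


madison = hemisphere(43.07, -89.40)
sydney = hemisphere(-33.87, 151.21)
```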

valid_geopolitical_units

valid_geopolitical_units(cur, yml_dict, csv_file, databus=None)

Validates geopolitical unit assignments and inserts site-geopolitical links when databus is provided.

Validates provided geopolitical units (national_unit, state, county, etc.) by querying the database for matching geopolitical IDs. Returns the most specific (lowest level) valid geopolitical unit found.

When databus is provided and databus["sites"].id_int is available, inserts the site-to-geopolitical-unit associations into ndb.sitegeopoliticalunits for both the national unit and the most specific subregional unit found.

PARAMETER DESCRIPTION
cur

Database connection to Neotoma database.

TYPE: connection

yml_dict

Dictionary containing parameters from YAML configuration.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. When not None, uses databus["sites"].id_int to insert site-geopolitical-unit links.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation results and messages.

Examples:

>>> valid_geopolitical_units(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
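Selecting the most specific valid unit can be pictured as walking the levels from finest to coarsest and stopping at the first one that resolved to an ID. This is a sketch under assumptions: the level names follow the units listed above, but the `LEVELS` ordering, the `resolved` dictionary, and the helper `most_specific` are hypothetical.

```python
# Levels ordered from least to most specific (assumed ordering).
LEVELS = ["national_unit", "state", "county"]


def most_specific(resolved):
    """Return (level, id) for the lowest-level unit that resolved to an ID.

    `resolved` maps a level name to a geopolitical ID, or to None when the
    database lookup found no match. Returns None if nothing resolved.
    """
    for level in reversed(LEVELS):
        unit_id = resolved.get(level)
        if unit_id is not None:
            return level, unit_id
    return None


pick = most_specific({"national_unit": 10, "state": 25, "county": None})
nothing = most_specific({})
```

Here the county lookup failed, so the state is returned as the most specific valid unit.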

valid_collunit

valid_collunit(cur, yml_dict, csv_file, databus=None)

Validates collection unit data for sample collection sites.

Validates collection unit parameters including coordinates, collection date, depositional environment, substrate, and collection device. Handles date parsing, queries database for valid ID values, and detects close/duplicate collection units.

PARAMETER DESCRIPTION
cur

Database connection to Neotoma (local or remote).

TYPE: connection

yml_dict

Dictionary containing data from YAML template.

TYPE: dict

csv_file

Path to CSV file with required data to upload.

TYPE: str

databus

Prior validation results.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object with validation results, messages, and collection unit list.

Examples:

>>> valid_collunit(cur, yml_dict, "data.csv")
Response(valid=[True], message=[...])

valid_speleothem

valid_speleothem(cur, yml_dict, csv_file, databus=None)

Validates speleothem data and inserts the record when databus is provided.

Validates speleothem parameters including entity properties, drip type, geology, cover type, and land use information. Queries the database for valid values and creates a Speleothem object with validated parameters.

When databus is provided, uses databus["sites"].id_int as the site ID and inserts the Speleothem record into ndb.speleothems via sp.insert_to_db(cur). The resulting speleothem ID is stored in response.id_int.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. When not None, uses databus["sites"].id_int for the insert. Defaults to None.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, overall status, and the inserted speleothem ID in response.id_int.

Examples:

>>> valid_speleothem(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)

valid_external_speleothem

valid_external_speleothem(cur, yml_dict, csv_file, databus=None)

Validates external speleothem data and inserts the record when databus is provided.

Validates external speleothem parameters including external database ID, external ID, and description. Queries the database for valid external database references and creates ExternalSpeleothem objects.

When databus is provided, uses databus["speleothems"].id_int as the speleothem entity ID and inserts the record via es.insert_externalspeleothem_to_db(cur).

PARAMETER DESCRIPTION
cur

Database cursor to query ndb.externaldatabases table.

TYPE: cursor

yml_dict

Dictionary of configuration parameters from YAML file.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. When not None, uses databus["speleothems"].id_int for the insert. Defaults to None.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object with validation results and messages.

Examples:

>>> valid_external_speleothem(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)

valid_analysisunit

valid_analysisunit(cur, yml_dict, csv_file, databus=None)

Validates analysis unit data and inserts into the database when databus is provided.

Validates analysis unit parameters including depth, thickness, facies ID, and other stratigraphic properties. Handles both single and multiple analysis units, creating AnalysisUnit objects with validated parameters.

When databus is provided and databus["collunits"].id_int is a real integer, each AnalysisUnit is inserted into ndb.analysisunits and the resulting IDs are appended to response.id_list.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of dictionaries representing CSV file data.

TYPE: list

databus

Prior validation results. When not None, uses databus["collunits"].id_int as the collectionunitid for inserts.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, overall status, count of created analysis units (counter), and the list of inserted analysis unit IDs (id_list).

Examples:

>>> valid_analysisunit(cur, yml_dict, csv_file)
Response(valid=[True], message=[...], validAll=True, counter=1)

valid_pbmodel

valid_pbmodel(cur, yml_dict, csv_file, databus)

Validates lead-210 dating model parameters and inserts records when databus is provided.

Validates lead model parameters including basis (dating assumption) and cumulative inventory. Creates LeadModel objects for each analysis unit with validated basis ID.

When databus["analysisunits"].id_list is available, resolves analysis unit IDs and inserts each LeadModel record into ndb.leadmodels via lm.insert_to_db(cur).

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. Uses databus["analysisunits"].id_list for AU IDs during insert.

TYPE: dict

RETURNS DESCRIPTION
Response

Response object containing validation messages and overall validity status.

Examples:

>>> valid_pbmodel(cursor, config_dict, csv_rows, databus)
Response(valid=[True], message=[...], validAll=True)

valid_dataset

valid_dataset(cur, yml_dict, csv_file, databus=None)

Validates a dataset and inserts it into the database when databus is provided.

Validates dataset name and dataset type against the Neotoma database. Attempts to resolve dataset type by querying the database if not provided. Creates a Dataset object with validated parameters.

When databus is provided, uses databus["collunits"].id_int as the collectionunitid and inserts the dataset into ndb.datasets. The resulting dataset ID is stored in response.id_int.

PARAMETER DESCRIPTION
cur

Database cursor to execute SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. When not None, uses databus["collunits"].id_int for the insert.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, overall status, and the inserted dataset ID in response.id_int.

Examples:

>>> valid_dataset(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)

valid_geochron_dataset

valid_geochron_dataset(cur, yml_dict, csv_file, databus=None)

Validates and inserts a geochronological dataset.

Creates and validates a Dataset object with the geochronological dataset type ID (fetched from ndb.datasettypes by name 'geochronologic'). When databus is provided and validation passes, inserts the dataset into the database using the real collection unit ID from databus['collunits'].id_int and stores the resulting dataset ID in response.id_int.

PARAMETER DESCRIPTION
cur

Database cursor to execute SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

Path to CSV file containing geochronological data.

TYPE: str

databus

Prior validation results supplying collectionunitid.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages and overall validity status.

Examples:

>>> valid_geochron_dataset(cursor, config_dict, "geochron_data.csv")
Response(valid=[True], message=[...], validAll=True)

valid_chronologies

valid_chronologies(cur, yml_dict, csv_file, databus=None)

Validates and inserts chronologies for geochronological data.

Validates chronology parameters including age type, contact ID, date prepared, and age bounds. Handles age model conversions (e.g., collection date to years BP) and creates Chronology objects with validated parameters. When databus is provided and all parameters are valid, inserts each chronology into the database using the real collection unit ID from databus['collunits'].id_int and stores the resulting chronology ID in response.id_list.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of dictionaries representing CSV file data.

TYPE: list

databus

Prior validation results supplying collectionunitid and (optionally) contactid overrides.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, chronology IDs, and overall status.

Examples:

>>> valid_chronologies(cursor, config_dict, csv_data)
Response(valid=[True], message=[...])
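As an illustration of the collection-date conversion mentioned above: ages in years BP ("Before Present") are counted back from 1950 CE, so a calendar year converts by subtraction. The helper `year_to_bp` is hypothetical, and whether DataBUS works with whole years or fractional dates is not stated in this reference.

```python
from datetime import date

BP_EPOCH = 1950  # "Before Present" is defined relative to 1950 CE


def year_to_bp(collection_date):
    """Convert a calendar date to whole years BP (negative after 1950)."""
    return BP_EPOCH - collection_date.year


modern_core = year_to_bp(date(2007, 6, 15))
epoch = year_to_bp(date(1950, 1, 1))
```

A core collected in 2007 therefore has a top age of -57 years BP.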

valid_chroncontrols

valid_chroncontrols(cur, yml_dict, csv_file, databus=None)

Validates and inserts chronological control points for age models.

Validates chronology control parameters including depth, age, thickness, and control type. Maps string control types to Neotoma integer IDs, verifies consistency of data dimensions, and creates ChronControl objects.

When databus is provided and all parameters are valid, inserts each control point into the database using:

- chronologyid from databus['chronologies'].id_list
- analysisunitid values from databus['analysisunits'].id_list

The resulting chroncontrol IDs are appended to response.id_list.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of dictionaries representing CSV file data.

TYPE: list

databus

Prior validation results supplying chronology and analysis unit IDs.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, chroncontrol IDs, and overall status.

Examples:

>>> valid_chroncontrols(cursor, config_dict, csv_data)
Response(valid=[True], message=[...], validAll=True)

valid_hiatus

valid_hiatus(cur, yml_dict, csv_file, databus=None)

Validates hiatus data and inserts hiatus records when databus is provided.

Identifies hiatus intervals (stratigraphic gaps) in sample analysis units. Groups consecutive analysis units with hiatus data and creates Hiatus objects spanning from start to end of each hiatus interval.

When databus is provided, resolves cluster indices to real analysis unit IDs using databus["analysisunits"].id_list and inserts each Hiatus record into ndb.hiatuses via hiatus.insert_to_db(cur).

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. When not None, uses databus["analysisunits"].id_list to resolve AU IDs for the insert.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages and overall validity status.

Examples:

>>> valid_hiatus(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
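The grouping step described above can be sketched as a single pass that collects runs of consecutive flagged analysis units and then maps the run boundaries to real IDs. How hiatus data is flagged in the CSV is not specified here, so the boolean `flags` list and the helper names are assumptions.

```python
def hiatus_intervals(flags):
    """Group consecutive True flags into (start_index, end_index) pairs."""
    intervals = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i  # a new hiatus run begins
        elif not flagged and start is not None:
            intervals.append((start, i - 1))  # the run ended on the previous row
            start = None
    if start is not None:
        intervals.append((start, len(flags) - 1))  # run reaches the last row
    return intervals


def resolve_to_ids(intervals, analysisunit_ids):
    """Map index pairs to analysis unit IDs (as from an id_list)."""
    return [(analysisunit_ids[s], analysisunit_ids[e]) for s, e in intervals]


flags = [False, True, True, False, True]
runs = hiatus_intervals(flags)
id_pairs = resolve_to_ids(runs, [501, 502, 503, 504, 505])
```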

valid_sample

valid_sample(cur, yml_dict, csv_file, databus)

Validates sample data and inserts samples into the database.

Validates sample parameters including taxon information, analysis dates, and preparation methods. Creates Sample objects for each analysis unit with validated parameters.

Uses databus["analysisunits"].id_list for analysis unit IDs and databus["datasets"].id_int for the dataset ID. When analysis unit IDs are available, each Sample is inserted into ndb.samples and the resulting sample IDs are appended to response.id_list.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. Must contain databus["analysisunits"].id_list and databus["datasets"].id_int.

TYPE: dict

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, sample counter (counter), and inserted sample IDs (id_list).

Examples:

>>> valid_sample(cursor, config_dict, csv_rows, databus)
Response(valid=[True], message=[...], validAll=True, counter=5)

valid_sample_age

valid_sample_age(cur, yml_dict, csv_file, databus=None)

Validates sample age data for paleontological samples.

Validates sample age parameters including age values, uncertainty bounds, and age type. Handles date parsing for collection dates, validates age types against database, and creates SampleAge objects for each chronology. When databus is provided and all parameters are valid, inserts each sample age into the database using the real chronology ID from databus['chronologies'] and real sample IDs from databus['samples'].id_list.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

List of dictionaries representing CSV file data.

TYPE: list

databus

Prior validation results supplying chronologyid and sample IDs for insert.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, and overall status.

Examples:

>>> valid_sample_age(cursor, config_dict, csv_data)
Response(valid=[True], message=[...], validAll=True)

valid_geochron

valid_geochron(cur, yml_dict, csv_file, databus=None)

Validates and inserts geochronological dating data.

Validates geochronology parameters including dating type, age, error bounds, and material dated. When databus is provided and validation passes, inserts each Geochron record using real sample IDs from databus['samples'].id_list and stores the resulting geochronid values in response.id_list for downstream use (e.g. valid_geochroncontrol).

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

Path to CSV file containing geochronology data.

TYPE: str

databus

Prior validation results supplying sample IDs.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, geochronid list, and overall status.

Examples:

>>> valid_geochron(cursor, config_dict, "dating_data.csv")
Response(valid=[True], message=[...], validAll=True)

valid_geochroncontrol

valid_geochroncontrol(cur, databus)

Validates and inserts geochronological control linkage records.

Links each geochron record (ndb.geochronology) to its corresponding chroncontrol record (ndb.chroncontrols) by inserting rows into ndb.geochroncontrols. IDs for both are taken from databus:

- databus['chron_controls'].id_list → chroncontrol IDs
- databus['geochron'].id_list → geochron IDs

If either list is empty or None the step is skipped gracefully. When the two lists differ in length the function tries to broadcast the shorter list; if neither is a multiple of the other it pairs them up to the shorter length.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

databus

Dictionary containing 'chron_controls' and 'geochron' Response objects (populated by prior validation steps).

TYPE: dict

RETURNS DESCRIPTION
Response

Response object containing validation messages and overall status.

Examples:

>>> valid_geochroncontrol(cursor, databus)
Response(valid=[True], message=[...], validAll=True)
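The pairing rules described above (skip when a list is empty, broadcast when one length is a multiple of the other, otherwise pair up to the shorter length) can be sketched as follows. The helper `pair_ids` is hypothetical, and broadcasting by tiling the shorter list is an assumption; the actual function may repeat elements differently.

```python
def pair_ids(chroncontrol_ids, geochron_ids):
    """Pair chroncontrol IDs with geochron IDs following the rules above."""
    if not chroncontrol_ids or not geochron_ids:
        return []  # empty or None: nothing to link
    a, b = list(chroncontrol_ids), list(geochron_ids)
    if len(a) != len(b):
        longer, shorter = max(len(a), len(b)), min(len(a), len(b))
        if longer % shorter == 0:
            reps = longer // shorter
            # Broadcast the shorter list by tiling it (assumed behaviour).
            if len(a) < len(b):
                a = a * reps
            else:
                b = b * reps
    # zip stops at the shorter list, which covers the non-multiple case.
    return list(zip(a, b))


broadcast = pair_ids([1], [10, 11, 12])
truncated = pair_ids([1, 2], [10, 11, 12])
skipped = pair_ids(None, [10])
```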

valid_uth_series

valid_uth_series(cur, yml_dict, csv_file, databus=None)

Validates and inserts uranium-thorium series data for geochronological samples.

Validates U-Th series isotope data including isotope ratios, activities, and associated decay constants. Verifies decay constants exist in the database and creates UThSeries objects with validated parameters.

When databus is provided and geochron IDs are available (databus['geochron'].id_list), replaces the placeholder geochronid values with the real inserted IDs and calls UThSeries.insert_to_db(cur) for each row.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data with U-Th parameters.

TYPE: dict

csv_file

Path to CSV file containing U-Th series data.

TYPE: str

databus

Prior validation results supplying geochron IDs.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, and overall status.

Examples:

>>> valid_uth_series(cursor, config_dict, "uth_series_data.csv")
Response(valid=[True, True], message=[...], validAll=True)

valid_contact

valid_contact(cur, yml_dict, csv_file, tables=CONTACT_TABLES, databus=None)

Validates contact information against the Neotoma Paleoecology Database.

Validates contact data (contact IDs or names) against the Neotoma database to ensure they exist and are valid. Matches contact names with database records and creates Contact objects with validated parameters. When databus is provided, inserts contacts into the appropriate relational tables:

- ndb.collectors → insert_collector(cur, collunitid)
- ndb.datasetpis → insert_pi(cur, datasetid)
- ndb.datasetprocessor → insert_data_processor(cur, datasetid)
- ndb.sampleanalysts → insert_sample_analyst(cur, sampleid) for each sample
- ndb.chronologies → contact is stored on the Chronology object (no separate insert)

PARAMETER DESCRIPTION
cur

Cursor pointing to Neotoma database.

TYPE: cursor

yml_dict

Dictionary object from template configuration.

TYPE: dict

csv_file

Path to CSV file containing contact data or user name.

TYPE: str

tables

List of table names to validate contacts for.

TYPE: list DEFAULT: CONTACT_TABLES

databus

Dictionary of prior validation results used for insert IDs.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation results and messages.

Examples:

>>> valid_contact(cursor, yml_dict, "data.csv")
Response(valid=[True, True], message=[...], validAll=True)

valid_dataset_database

valid_dataset_database(cur, yml_dict, databus=None)

Validates dataset-database associations and inserts the link when databus is provided.

Validates the database name provided in YAML configuration against the Neotoma database's constituent databases. Creates a DatasetDatabase object with the validated database ID.

When databus is provided and databus["datasets"].id_int is available, calls ts.insertdatasetdatabase to create the link between the dataset and the constituent database. The database ID is stored in response.id_int.

PARAMETER DESCRIPTION
cur

Database cursor to execute SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

databus

Prior validation results. When not None, uses databus["datasets"].id_int to insert the dataset-database link.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object with validation results, messages, and the constituent database ID in response.id_int.

Examples:

>>> valid_dataset_database(cursor, config_dict)
Response(valid=[True], message=[...], validAll=True)

valid_data

valid_data(cur, yml_dict, csv_file, databus=None)

Validates paleontological data values against the Neotoma database.

Validates data values and associated variables (taxon, units, element, context). Queries database for valid variable IDs, creates Variable and Datum objects with validated parameters. Supports both long and wide data format.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data.

TYPE: dict

csv_file

Path to CSV file containing data to validate.

TYPE: str

wide

Flag for wide format data handling. Defaults to False.

TYPE: bool

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, and overall status.

Examples:

>>> valid_data(cursor, config_dict, "data.csv")
Response(valid=[True], message=[...], validAll=True)

valid_datauncertainty

valid_datauncertainty(cur, yml_dict, csv_file, databus=None)

Validates data uncertainty values and inserts records when databus is provided.

Validates uncertainty values, units, and basis information. Queries database for valid uncertainty basis IDs and variable unit IDs, then creates DataUncertainty objects with validated parameters. Supports both long and wide data formats.

When databus is provided and databus["samples"].id_list is available, inserts each DataUncertainty record into ndb.datauncertainties via du.insert_to_db(cur).

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration parameters.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. When not None, uses databus["samples"].id_list for sample IDs during insert. Defaults to None.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, and overall status.

Examples:

>>> valid_datauncertainty(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)

valid_publication

valid_publication(cur, yml_dict, csv_file, databus=None)

Validates a publication and links it to the dataset when databus is provided.

Validates publication information by checking against the Neotoma database. Accepts publication ID, DOI, or citation and performs similarity matching when exact matches are not found. Can also validate DOIs against CrossRef API.

When databus is provided and databus["datasets"].id_int is available, calls ts.insertdatasetpublication to link the validated publication to the dataset. The publication ID is stored in response.id_int.

PARAMETER DESCRIPTION
cur

Database cursor to execute SQL commands.

TYPE: cursor

yml_dict

Dictionary containing YAML configuration data with publication parameters.

TYPE: dict

csv_file

List of row dicts from the CSV file.

TYPE: list[dict]

databus

Prior validation results. When not None, uses databus["datasets"].id_int to insert the dataset-publication link.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Response

Response object containing validation messages, validity list, overall validity status, and the publication ID in response.id_int.

Examples:

>>> valid_publication(cursor, config_dict, csv_rows)
Response(valid=[True, True], message=[...], validAll=True)

insert_final

insert_final(cur, databus)

Finalizes a dataset submission by inserting a record into ndb.datasetsubmissions.

This function should be called only after all prior validation steps have passed (validAll is True for every key in databus). It records the submission in the Neotoma database, linking the dataset, database, and contact together with a submission date and a fixed submission type of 6.

PARAMETER DESCRIPTION
cur

Database cursor for executing SQL queries.

databus

Accumulated validation results. Must contain:

- databus["datasets"].id_int: ID of the validated dataset.
- databus["database"].id_int: ID of the target database.
- databus["contacts"].id_int: ID of the submitting contact.

TYPE: dict

RETURNS DESCRIPTION
Response

Response object with validity status and a success/failure message.

Examples:

>>> if all(databus[k].validAll for k in databus):
...     result = insert_final(cur, databus=databus)
Response(valid=[True], message=["✔ Dataset submission has been finalized"])
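The precondition for calling insert_final can be checked explicitly before finalizing. The sketch below uses a stand-in `Response` dataclass with only the fields needed here; the real Response class, and the assumption that `validAll` means "every entry in `valid` is True", are not confirmed by this reference. The helper `ready_to_finalize` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    """Stand-in for the DataBUS Response; only the fields used here."""

    valid: list = field(default_factory=list)
    message: list = field(default_factory=list)
    id_int: Optional[int] = None

    @property
    def validAll(self):
        # Assumption: a step passes only if it ran checks and all passed.
        return bool(self.valid) and all(self.valid)


REQUIRED_KEYS = ("datasets", "database", "contacts")


def ready_to_finalize(databus):
    """True when every step passed and the three required IDs are present."""
    for key in REQUIRED_KEYS:
        if key not in databus or databus[key].id_int is None:
            return False
    return all(step.validAll for step in databus.values())


databus = {
    "datasets": Response(valid=[True], id_int=101),
    "database": Response(valid=[True], id_int=7),
    "contacts": Response(valid=[True, True], id_int=55),
}
ok = ready_to_finalize(databus)

# A single failed check in any step blocks finalization.
databus["samples"] = Response(valid=[True, False])
blocked = ready_to_finalize(databus)
```

A caller that manages the transaction could commit only when the check passes and roll back otherwise.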

DataBUS Helpers

Utility functions in the neotomaHelpers package used for parameter extraction, file handling, template parsing, and transaction management.

Parameter Extraction

pull_params

pull_params(params, yml_dict, csv_template, table=None)

Pull and process parameters for database insert statements.

Extracts parameters from YAML template and CSV data, performs type conversions (date, int, float, coordinates, string), handles special cases like notes and chronologies, and returns cleaned data ready for insertion.

PARAMETER DESCRIPTION
params

List of strings for columns needed to generate insert statement.

TYPE: list

yml_dict

Dictionary returned by YAML template containing 'metadata' key.

TYPE: dict

csv_template

CSV data as list of dictionaries with column data to upload.

TYPE: list[dict]

table

Name of the table(s) parameters are drawn for. If list, returns results for each table.

TYPE: str or list DEFAULT: None

name

Name field identifier. Defaults to None.

TYPE: str

values

Whether to treat columns as value columns. Defaults to False.

TYPE: bool

RETURNS DESCRIPTION

dict | list

Cleaned and formatted parameters ready for database insertion. If table is a list, returns list of dicts. If table is str, returns single dict. Returns hierarchical dicts for special tables like chronologies/sampleages.

Template & File Utilities

read_csv

read_csv(filename)

Read CSV file and return a structured list of dictionaries.

Parses a CSV file and converts each row into a dictionary with column headers as keys.

Examples:

>>> read_csv('pollen_data.csv')
[{'depth': '2.5', 'quercus': '125'}, {'depth': '5.0', 'quercus': '142'}]

PARAMETER DESCRIPTION
filename

Path to the CSV file to read.

TYPE: str

RETURNS DESCRIPTION
list

List of dictionaries where each dictionary represents a row, with column headers as keys.
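The behaviour described above matches what the standard library's `csv.DictReader` provides, so a minimal equivalent can be sketched as below. This is an assumption about the approach, not the actual read_csv source; the name `read_csv_sketch` and the temporary demo file are for illustration only.

```python
import csv
import os
import tempfile


def read_csv_sketch(filename):
    """Return the CSV rows as a list of dicts keyed by the header row."""
    with open(filename, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Write a small demo file, read it back, then clean up.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("depth,quercus\n2.5,125\n5.0,142\n")

rows = read_csv_sketch(tmp.name)
os.remove(tmp.name)
```

Note that every value comes back as a string, as in the example output above; numeric conversion has to happen in a later step.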

check_file

check_file(filename, strict=False, validation_files='data/logs/')

Checks validation log file for errors from prior validation runs.

Examines validation log files to determine if a CSV file has been successfully validated. Checks both validated and not_validated directories. Counts errors and warnings, removing log file if strict mode passes.

Examples:

>>> check_file("pollen_data.csv", strict=False)
{'pass': True, 'match': 0, 'message': ['No errors found in the last validation.']}
>>> check_file("chronology.csv", strict=True)
{'pass': False, 'match': 2, 'message': ['Errors found in the prior validation.']}

PARAMETER DESCRIPTION
filename

File path or relative path for a template CSV file.

TYPE: str

strict

If True, also count "Valid: FALSE" lines as errors. Defaults to False.

TYPE: bool DEFAULT: False

validation_files

Path to validation logs directory. Defaults to 'data/logs/'.

TYPE: str DEFAULT: 'data/logs/'

RETURNS DESCRIPTION
dict

Dictionary with 'pass' (bool), 'match' (int error count), and 'message' (list).

hash_file

hash_file(filename, validation_files='data/validation_logs/')

Calculate MD5 hash of a file and compare against validation logs.

Computes the MD5 hash of a file and compares it with previously stored hashes in validation log files to determine if the file has been validated or modified.

Examples:

>>> hash_file('pollen_data.csv')
{'pass': True, 'hash': 'abc123def456...', 'message': ['abc123def456...', "Hashes match, file hasn't changed."]}
>>> hash_file('chronology_template.xlsx')
{'pass': False, 'hash': 'xyz789abc123...', 'message': ['xyz789abc123...', 'File has changed, validating chronology_template.xlsx.']}

PARAMETER DESCRIPTION
filename

Path to the file to hash.

TYPE: str

validation_files

Path to the validation logs directory. Defaults to 'data/validation_logs/'.

TYPE: str DEFAULT: 'data/validation_logs/'

RETURNS DESCRIPTION
dict

Dictionary with keys:

- 'pass' (bool): True if file matches validation log hash.
- 'hash' (str): MD5 hash of the file in hexadecimal.
- 'message' (list): List of status messages about validation result.
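The hashing and comparison can be sketched with `hashlib` from the standard library. How hash_file reads earlier hashes out of the validation logs is not described here, so the `known_hashes` argument and both helper names are hypothetical; only the shape of the returned dictionary follows the description above.

```python
import hashlib
import os
import tempfile


def md5_of_file(filename, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_hash(filename, known_hashes):
    """Build a result dict shaped like the one hash_file returns."""
    file_hash = md5_of_file(filename)
    unchanged = file_hash in known_hashes
    if unchanged:
        note = "Hashes match, file hasn't changed."
    else:
        note = f"File has changed, validating {filename}."
    return {"pass": unchanged, "hash": file_hash, "message": [file_hash, note]}


# Demo: the first run has no stored hash; the second run sees the stored one.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"depth,quercus\n2.5,125\n")

first = compare_hash(tmp.name, known_hashes=set())
second = compare_hash(tmp.name, known_hashes={first["hash"]})
os.remove(tmp.name)
```

Reading in chunks keeps memory use flat for large files. MD5 is adequate here for change detection, not for security purposes.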

excel_to_yaml

InlineList(data)

Custom class to represent inline lists in YAML output.

Used with custom YAML representer to output lists in flow style (inline) rather than block style.

ATTRIBUTE DESCRIPTION
data

The list data to be represented inline.

Initialize InlineList with data.

PARAMETER DESCRIPTION
data

List or sequence to store.

represent_inline_list(dumper, data)

YAML representer for InlineList objects.

Tells the YAML dumper to output InlineList objects as inline sequences.

PARAMETER DESCRIPTION
dumper

YAML dumper instance.

data

InlineList object to represent.

TYPE: InlineList

RETURNS DESCRIPTION

YAML node representing the sequence in flow style.

excel_to_yaml(temp_file, file_name)

Convert Excel template file to YAML format.

Reads data mapping and metadata from Excel sheets, processes column definitions including units and uncertainty information, and writes formatted YAML output.

Examples:

>>> excel_to_yaml('template.xlsx', 'template')
# Creates template.yml file

PARAMETER DESCRIPTION
temp_file

Path to the Excel template file (.xls or .xlsx).

TYPE: str

file_name

Base filename for output YAML (without extension). Output file will be named file_name.yml

TYPE: str

RETURNS DESCRIPTION
None

Writes YAML file to disk with name file_name.yml

Logging

logging_dict

logging_response(response, logfile)

Append Response object string representation to logfile.

Validates that response is a Response object, then appends its string representation to the logfile.

Examples:

>>> logfile = []
>>> logging_response(pollen_response, logfile)
[<string representation of pollen_response>]
>>> logfile = []
>>> logging_response(chronology_response, logfile)
[<string representation of chronology_response>]

PARAMETER DESCRIPTION
response

Response object from DataBUS module.

TYPE: Response

logfile

List to append the response to.

TYPE: list

RETURNS DESCRIPTION
list

The updated logfile list with response appended.

RAISES DESCRIPTION
AssertionError

If response is not an instance of Response class.
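The documented behaviour (type check, append the string form, return the list) is small enough to sketch in full. The `Response` class below is a stand-in with an invented string format, and `logging_response_sketch` is a hypothetical name; neither is taken from the DataBUS source.

```python
class Response:
    """Stand-in for the DataBUS Response with a simple string form."""

    def __init__(self, valid, message):
        self.valid = valid
        self.message = message

    def __str__(self):
        return f"Valid: {all(self.valid)} | " + "; ".join(self.message)


def logging_response_sketch(response, logfile):
    """Append str(response) to logfile and return the updated list."""
    assert isinstance(response, Response), "response must be a Response object"
    logfile.append(str(response))
    return logfile


logfile = []
logging_response_sketch(Response([True], ["Site validated."]), logfile)
logging_response_sketch(Response([True, False], ["Depth missing."]), logfile)
```

Because the list is modified in place and also returned, the same `logfile` can be threaded through every validation step.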