Documentation
This part of the project documentation focuses on an information-oriented approach. Use it as a
reference for the technical implementation of the DataBUS project code.
Core Data Classes
Core classes representing the fundamental data models used throughout the DataBUS project.
AnalysisUnit
AnalysisUnit(analysisunitid=None, collectionunitid=None, analysisunitname=None, depth=None, thickness=None, faciesid=None, mixed=None, igsn=None, notes=None)
An analysis unit in Neotoma.
Physical subsets of a collection unit, often with a position (depth) within a core or dig site. Samples are the intersection between analysis units and dataset types.
See the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| analysisunitid | Analysis unit ID (assigned after insertion). |
| collectionunitid | Parent collection unit ID. |
| analysisunitname | Name of the analysis unit. |
| depth | Depth in the core/site (if known). |
| thickness | Physical thickness of the sample. |
| faciesid | Neotoma identifier for rock formation. |
| mixed | Evidence of stratigraphic mixing. |
| igsn | IGSN identifier. |
| notes | Additional notes. |
Examples:
>>> au = AnalysisUnit(collectionunitid=1, depth=2.5)
>>> au.depth
2.5
insert_to_db(cur)
Insert the AnalysisUnit to Neotoma.
Args:
cur (psycopg2.connect): A valid psycopg2 connection to the Neotoma Database.
Returns:
AnalysisUnit: The function inserts the AnalysisUnit to Neotoma and adds the new analysisunitid to the object.
ChronControl
ChronControl(chroncontrolid=None, chronologyid=None, chroncontroltypeid=None, depth=None, thickness=None, age=None, agelimityounger=None, agelimitolder=None, notes=None, analysisunitid=None, agetypeid=None)
A chronological control point in Neotoma.
Provides dating constraints for a chronology, such as radiocarbon dates or other age measurements at specific depths within a stratigraphic sequence. See the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| chroncontrolid | Control point ID. |
| chronologyid | Chronology ID. |
| chroncontroltypeid | Control type ID. |
| depth | Depth value. |
| thickness | Thickness value. |
| age | Age value in years. |
| agelimityounger | Younger age bound. |
| agelimitolder | Older age bound. |
| notes | Additional notes. |
| analysisunitid | Analysis unit ID. |
| agetypeid | Age type ID. |
Examples:
>>> chron = ChronControl(chronologyid=1, depth=5.5, age=75)
>>> chron.age
75
insert_to_db(cur)
Insert the chronological control point into the database.
Args:
cur (psycopg2.cursor): Database cursor for executing queries.
Returns:
int: The chroncontrolid assigned by the database.
Chronology
Chronology(chronologyid=None, collectionunitid=None, agetypeid=None, contactid=None, chronologyname=None, dateprepared=None, agemodel=None, ageboundyounger=None, ageboundolder=None, isdefault=None, notes=None)
A chronology (age model) for a collection unit in Neotoma.
Defines the dating framework for samples within a collection unit, including age model type, bounds, and preparation metadata.
See the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| chronologyid | Chronology ID. |
| collectionunitid | Collection unit ID. |
| agetypeid | Age type ID. |
| contactid | Contact ID (first element if list provided). |
| chronologyname | Chronology name. |
| dateprepared | Preparation date. |
| agemodel | Age model description. |
| ageboundyounger | Younger age bound. |
| ageboundolder | Older age bound. |
| isdefault | Whether this is the default chronology for the collection unit. |
| notes | Additional notes. |
Examples:
>>> chron = Chronology(collectionunitid=1, chronologyname="Model 2023")
>>> chron.chronologyname
'Model 2023'
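
Several constructors in this reference are documented as taking the "first element if list provided" (contactid here, notes on Hiatus, samplename on Sample). The sketch below shows one way that normalization could look. The helper name `first_if_list` is hypothetical and not part of DataBUS, and the reference does not say how an empty list is handled, so returning None for it is an assumption.

```python
def first_if_list(value):
    """Return the first element when given a list, otherwise the value itself."""
    if isinstance(value, list):
        # Assumption: an empty list collapses to None.
        return value[0] if value else None
    return value


contact_from_list = first_if_list([12, 47])
contact_from_scalar = first_if_list(12)
```

Both calls yield the same scalar, which is the behavior the attribute descriptions imply for list inputs.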
insert_to_db(cur)
Insert the chronology record into the Neotoma database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The chronologyid assigned by the database. |
CollectionUnit
CollectionUnit(collectionunitid=None, handle=None, siteid=None, colltypeid=None, depenvtid=None, collunitname=None, colldate=None, colldevice=None, gpsaltitude=None, gpserror=None, waterdepth=None, substrateid=None, slopeaspect=None, slopeangle=None, location=None, notes=None, geog=None)
Represents a sediment core or excavation collection in Neotoma.
A collection unit is a physical collection (e.g., a sediment core, excavation) at a specific site. It contains geographic, temporal, and physical information about the collected material.
Collection units are explained further in the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| collectionunitid | Collection unit identifier. |
| handle | Unique handle/identifier. |
| siteid | Associated site ID. |
| colltypeid | Collection type ID. |
| depenvtid | Depositional environment ID. |
| collunitname | Collection unit name. |
| colldate | Collection date. |
| colldevice | Collection device used. |
| gpsaltitude | GPS altitude in meters. |
| gpserror | GPS error in meters. |
| waterdepth | Water depth in meters. |
| substrateid | Substrate type ID. |
| slopeaspect | Slope aspect in degrees. |
| slopeangle | Slope angle in degrees. |
| location | Location description. |
| notes | Additional notes. |
| geog | Geographic coordinates. |
| distance | Distance from reference (computed when |
Examples:
>>> cu = CollectionUnit(siteid=1, handle="MCL-01") # Mirror Lake core collection
>>> cu.handle
'MCL-01'
>>> cu = CollectionUnit(siteid=2, handle="LC-Core-1", waterdepth=25.5, collunitname="Main core") # Lake cave site
>>> cu.waterdepth
25.5
find_close_collunits(cur, distance=10000, limit=10)
Find geographically close collection units.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| distance | Distance threshold in meters. |
| limit | Maximum number to return. |
| RETURNS | DESCRIPTION |
|---|---|
| list | Collection units within specified distance. |
insert_to_db(cur)
Insert the collection unit into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The collectionunitid assigned. |
Contact
Contact(contactid, contactname=None, order=None)
A person who participated in data collection or processing in Neotoma.
Manages contact information and roles in paleoenvironmental research including data processing, sample analysis, and field collection.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| contactid | Contact ID. |
| contactname | Contact name. |
| order | Order/sequence in list of contacts. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If contactid is not int or None, or order is not int or None. |
Examples:
>>> contact = Contact(contactid=1, contactname="Simon Goring", order=1)
>>> contact.contactname
'Simon Goring'
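
The ValueError rule above ("not int or None") can be illustrated with a small standalone check. This is a sketch only: `require_int_or_none` is a hypothetical helper, not a DataBUS function, and rejecting bool values (which subclass int in Python) is an assumption the reference does not confirm.

```python
def require_int_or_none(name, value):
    """Raise ValueError unless value is an int or None; return it unchanged."""
    # Assumption: bool is rejected even though it subclasses int.
    if value is None or (isinstance(value, int) and not isinstance(value, bool)):
        return value
    raise ValueError(f"{name} must be an int or None, got {type(value).__name__}")


checked_id = require_int_or_none("contactid", 1)

try:
    require_int_or_none("order", "first")
    rejected = False
except ValueError:
    rejected = True
```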
insert_pi(cur, datasetid)
Insert contact as principal investigator for a dataset.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| datasetid | Dataset identifier. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_data_processor(cur, datasetid)
Insert contact as data processor for a dataset.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| datasetid | Dataset identifier. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_sample_analyst(cur, sampleid)
Insert contact as sample analyst.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| sampleid | Sample identifier. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_collector(cur, collunitid)
Insert contact as field collector for a collection unit.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| collunitid | Collection unit identifier. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
Dataset
Dataset(datasettypeid, datasetid=None, collectionunitid=None, datasetname=None, notes=None)
A dataset in Neotoma.
A collection of data (e.g., pollen counts, isotope values) of a specific type associated with a collection unit.
Datasets are explained further in the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| datasetid | Dataset ID (assigned after insertion). |
| collectionunitid | Collection unit ID. |
| datasettypeid | Dataset type ID (required). |
| datasetname | Dataset name. |
| notes | Additional notes. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If datasettypeid is not an integer. |
Examples:
>>> dataset = Dataset(datasettypeid=1, datasetname="Pollen Core")
>>> dataset.datasettypeid
1
insert_to_db(cur)
Insert the dataset record into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The datasetid assigned by the database. |
DatasetDatabase
DatasetDatabase(databaseid, datasetid=None)
A link between a dataset and a constituent database in Neotoma.
Associates a dataset with a constituent database, enabling tracking of dataset provenance.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| databaseid | Constituent database ID. |
| datasetid | Dataset ID. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If databaseid or datasetid is not int or None. |
Examples:
>>> ds_db = DatasetDatabase(databaseid=1, datasetid=2)
>>> ds_db.databaseid
1
insert_to_db(cur)
Insert the dataset-database relationship into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
DataUncertainty
DataUncertainty(dataid, uncertaintyvalue, uncertaintyunitid, uncertaintybasisid, notes)
Measurement uncertainty for a data value in Neotoma.
Stores uncertainty metrics including magnitude, units, and basis of uncertainty quantification.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| dataid | Data ID. |
| uncertaintyvalue | Uncertainty magnitude. |
| uncertaintyunitid | Uncertainty units ID. |
| uncertaintybasisid | Uncertainty basis ID. |
| notes | Notes about uncertainty. |
Examples:
>>> uncert = DataUncertainty(dataid=1, uncertaintyvalue=5.0,
... uncertaintyunitid=2, uncertaintybasisid=1, notes=None)
>>> uncert.uncertaintyvalue
5.0
insert_to_db(cur)
Insert the data uncertainty record into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
Datum
Datum(sampleid=None, variableid=None, value=None)
A data point measurement in the Neotoma database.
Encapsulates a single measurement or observation linking a sample to a variable with a measured value.
Data are explained further in the [Neotoma Manual](https://open.neotomadb.org/manual/sample-related-tables-1.html#Data).
| ATTRIBUTE | DESCRIPTION |
|---|---|
| sampleid | Sample identifier. |
| variableid | Variable identifier. |
| value | The measured value. |
| datumid | Database ID (assigned after insertion). |
Examples:
>>> datum = Datum(sampleid=1, variableid=42, value=125.3)
>>> datum.value
125.3
insert_to_db(cur)
Insert the datum record into the Neotoma database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The datumid assigned by the database. |
Geochron
Geochron(sampleid=None, geochrontypeid=None, agetypeid=None, age=None, errorolder=None, erroryounger=None, infinite=None, delta13c=None, labnumber=None, materialdated=None, notes=None)
A geochronological age determination in Neotoma.
Stores age measurements from radiometric and other dating techniques, including determined age, uncertainty bounds, and dated material info.
Geochronologies are explained further in the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| sampleid | Sample ID. |
| geochrontypeid | Geochron type ID. |
| agetypeid | Age type ID. |
| age | Age value. |
| errorolder | Older error bound. |
| erroryounger | Younger error bound. |
| infinite | Infinite age flag (defaults to False). |
| delta13c | Delta 13C value (for radiocarbon). |
| labnumber | Laboratory number. |
| materialdated | Material dated. |
| notes | Additional notes. |
| geochronid | Geochron ID (assigned after insertion). |
Examples:
>>> geo = Geochron(sampleid=1, geochrontypeid=1, agetypeid=1, age=3250,
... errorolder=100, erroryounger=100, infinite=False,
... delta13c=-25.5, labnumber="UCIAMS-12345",
... materialdated="Charcoal", notes=None)
>>> geo.age
3250
insert_to_db(cur)
Insert the geochronological date into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The geochronid assigned by the database. |
GeochronControl
GeochronControl(chroncontrolid, geochronid)
A link between a chronological control and geochronological date.
Links a chronological control point with a geochronological age determination used to constrain the age model.
Geochronologies are explained further in the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| chroncontrolid | Chrono control ID. |
| geochronid | Geochron ID. |
| geochroncontrolid | Geochron control ID (assigned after insertion). |
Examples:
>>> gc = GeochronControl(chroncontrolid=1, geochronid=2)
>>> gc.chroncontrolid
1
insert_to_db(cur)
Insert the geochron-control relationship into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The geochroncontrolid assigned by the database. |
Geog
WrongCoordinates
Bases: Exception
Custom exception raised when coordinates are outside valid geographic ranges.
Geog(coords)
Geographic coordinates with validation and hemisphere determination.
Stores latitude and longitude with validation to ensure values are within valid geographic ranges. Automatically determines hemisphere from coordinates.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| longe | Longitude in decimal degrees (-180 to 180). |
| latn | Latitude in decimal degrees (-90 to 90). |
| longw | Longitude in decimal degrees (-180 to 180). |
| lats | Latitude in decimal degrees (-90 to 90). |
| hemisphere | Cardinal directions ('NE', 'NW', 'SE', 'SW'). |
| RAISES | DESCRIPTION |
|---|---|
| TypeError | If coords is not list/tuple/None, or lat/long not numbers. |
| ValueError | If coords length is not 2. |
| WrongCoordinates | If coordinates outside valid ranges. |
Examples:
>>> geog = Geog([43.3734, -71.5316, 43.3734, -71.5316])
>>> geog.hemisphere
'NW'
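
The range checks and hemisphere rule described for Geog can be sketched for a single latitude/longitude pair. This is not the Geog implementation: `hemisphere_of` and `WrongCoordinatesSketch` are hypothetical stand-ins, the sketch ignores the longe/latn/longw/lats layout, and treating zero as north/east is an assumption.

```python
class WrongCoordinatesSketch(Exception):
    """Stand-in for the WrongCoordinates exception described above."""


def hemisphere_of(lat, lon):
    """Validate one latitude/longitude pair and return 'NE', 'NW', 'SE' or 'SW'."""
    for label, value in (("latitude", lat), ("longitude", lon)):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{label} must be a number")
    if not -90 <= lat <= 90:
        raise WrongCoordinatesSketch(f"latitude {lat} is outside -90..90")
    if not -180 <= lon <= 180:
        raise WrongCoordinatesSketch(f"longitude {lon} is outside -180..180")
    # Assumption: zero counts as north / east.
    return ("N" if lat >= 0 else "S") + ("E" if lon >= 0 else "W")


mirror_lake = hemisphere_of(43.3734, -71.5316)
```

With the Mirror Lake coordinates used elsewhere on this page, the sketch gives the same 'NW' result as the Geog example.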
Hiatus
Hiatus(hiatusid=None, analysisunitstart=None, analysisunitend=None, notes=None)
A hiatus or stratigraphic gap in a sediment sequence.
Marks a discontinuity in the stratigraphic record, representing missing time. Bounded by analysis units and can be associated with a chronology.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| hiatusid | Hiatus ID. |
| analysisunitstart | Start analysis unit ID. |
| analysisunitend | End analysis unit ID. |
| notes | Description (first element if list provided). |
| RAISES | DESCRIPTION |
|---|---|
| TypeError | If hiatusid is not int/None/"NA", analysis units not int, or notes not str/None. |
Examples:
>>> hiatus = Hiatus(analysisunitstart=10, analysisunitend=15)
>>> hiatus.analysisunitstart
10
insert_to_db(cur)
Insert the hiatus record into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The hiatusid assigned by the database. |
insert_hiatus_chron_to_db(chronologyid, hiatuslength, hiatusuncertainty, cur)
Insert hiatus-chronology relationship into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| chronologyid | Associated chronology identifier. |
| hiatuslength | Length of hiatus. |
| hiatusuncertainty | Uncertainty in hiatus length. |
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
LeadModel
LeadModel(pbbasisid=None, analysisunitid=None, cumulativeinventory=None, datinghorizon=None)
A Lead-210 geochronological model in Neotoma.
Manages lead isotope data for radiometric dating of sediment cores, storing basis information and cumulative inventory values.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| pbbasisid | Lead isotope basis ID. |
| analysisunitid | Analysis unit ID. |
| cumulativeinventory | Cumulative inventory (Bq/cm²). |
| datinghorizon | Depth of the dating horizon (cm). |
Examples:
>>> lead_model = LeadModel(pbbasisid=1, analysisunitid=2, cumulativeinventory=145.3)
>>> lead_model.cumulativeinventory
145.3
insert_to_db(cur)
Insert the Lead model record into the Neotoma database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
Response
Response()
Base response class for handling validation and messaging.
This class provides a standard structure for returning validation results and messages from database operations. It includes attributes for tracking validity, messages, and associated database IDs.
Examples:
>>> response = Response(valid=[True], message=["Pollen dataset validated successfully"])
>>> response.validAll
True
>>> response = Response(valid=[True, True, False], message=["Data validation error"])
>>> len(response.valid)
3
| PARAMETER | DESCRIPTION |
|---|---|
| valid | List of validation boolean values. |
| message | List of message strings. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| valid | Validation status values. |
| message | Message strings. |
| validAll | Overall validation status. |
| id_int | Associated ID. |
| id_list | Associated IDs. |
| id_dict | Mapping of data identifiers. |
| name | Name mapping dictionary. |
| indices | List of indices. |
Initialize a Response object.
validAll
property
True if valid is a non-empty list of booleans and all are True. False otherwise.
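
The validAll rule stated above (non-empty list, every entry a boolean True) can be written out as a property. The class below is a hypothetical stand-in named `ResponseSketch`; its constructor arguments are inferred from the doctest examples and only `valid`, `message`, and `validAll` are modelled.

```python
class ResponseSketch:
    """Hypothetical stand-in for Response; not the DataBUS class."""

    def __init__(self, valid=None, message=None):
        self.valid = list(valid) if valid is not None else []
        self.message = list(message) if message is not None else []

    @property
    def validAll(self):
        # True only for a non-empty list whose entries are all boolean True.
        return (
            isinstance(self.valid, list)
            and len(self.valid) > 0
            and all(isinstance(v, bool) and v for v in self.valid)
        )


all_ok = ResponseSketch(valid=[True]).validAll
one_failed = ResponseSketch(valid=[True, True, False]).validAll
nothing_checked = ResponseSketch().validAll
```

Note that an empty `valid` list is reported as not valid, so a step that performed no checks does not pass by default.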
Sample
Sample(analysisunitid=None, datasetid=None, samplename=None, sampledate=None, analysisdate=None, taxonid=None, labnumber=None, prepmethod=None, notes=None)
A sample in Neotoma.
The intersection between an analysis unit and a dataset, representing physical material analyzed with associated metadata.
See the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| analysisunitid | Analysis unit ID. |
| datasetid | Dataset ID. |
| samplename | Sample name (first element if list provided). |
| sampledate | Collection date. |
| analysisdate | Analysis date. |
| taxonid | Taxon ID. |
| labnumber | Laboratory number. |
| prepmethod | Preparation method. |
| notes | Additional notes. |
| sampleid | Sample ID (assigned after insertion). |
Examples:
>>> sample = Sample(analysisunitid=1, samplename="Pollen-2cm")
>>> sample.samplename
'Pollen-2cm'
insert_to_db(cur)
Insert the sample record into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The sampleid assigned by the database. |
SampleAge
SampleAge(sampleid=None, chronologyid=None, age=None, ageyounger=None, ageolder=None)
Age information for a sample in the Neotoma database.
Stores age estimates for a sample within a specific chronology, including age bounds.
See the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| sampleid | Sample identifier. Assigned after insertion. |
| chronologyid | Chronology identifier. |
| age | Age estimate. |
| ageyounger | Younger age bound. |
| ageolder | Older age bound. |
Examples:
>>> sample_age = SampleAge(chronologyid=2, age=75)
>>> sample_age.age
75
insert_to_db(cur)
Insert the sample age record into the Neotoma database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The sampleage ID assigned by the database. |
Site
Site(siteid=None, sitename=None, altitude=None, area=None, sitedescription=None, notes=None, geog=None)
Represents a geographic site location in Neotoma.
A site is a geographic location where paleoenvironmental data has been collected. It contains geographic coordinates and descriptive information.
See the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| description | Class description. |
| siteid | Site identifier. |
| sitename | Site name (required). |
| altitude | Elevation in meters. |
| area | Site area in square kilometers. |
| sitedescription | Detailed description. |
| notes | Additional notes. |
| geog | Geographic coordinates. |
| distance | Distance from reference (computed). |
Examples:
>>> site = Site(sitename="Mirror Lake", geog=Geog([43.3734, -71.5316])) # Mirror Lake, NH
>>> site.sitename
'Mirror Lake'
>>> site = Site(sitename="Crater Lake", altitude=1949, geog=Geog([42.9453, -122.1103]))
>>> site.altitude
1949
insert_to_db(cur)
Insert the site into the Neotoma database.
Args:
cur (psycopg2.cursor): Database cursor.
Returns:
int: The siteid assigned by the database.
upsert_to_db(cur)
Updates a site that already exists in the database.
Args:
cur (psycopg2.cursor): Database cursor.
Returns:
int: The siteid.
find_close_sites(cur, dist=10000, limit=5)
Find geographically close sites using PostGIS distance.
Args:
cur (psycopg2.cursor): Database cursor.
dist (float): Distance threshold in meters (default 10km).
limit (int): Maximum number of sites to return (default 5).
Returns:
list: Tuples of site records ordered by distance.
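
To make the `dist` and `limit` semantics concrete, here is a pure-Python stand-in for the proximity search. It is not the query DataBUS runs: the real method delegates distance to PostGIS, while this sketch substitutes a haversine great-circle distance on a spherical Earth. The function names and the "Nearby" candidate are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_close(origin, candidates, dist=10000, limit=5):
    """Return (name, distance) pairs within dist meters, nearest first."""
    scored = [
        (name, haversine_m(origin[0], origin[1], lat, lon))
        for name, lat, lon in candidates
    ]
    near = [item for item in scored if item[1] <= dist]
    near.sort(key=lambda item: item[1])
    return near[:limit]


candidates = [
    ("Crater Lake", 42.9453, -122.1103),
    ("Nearby", 43.40, -71.53),
    ("Mirror Lake", 43.3734, -71.5316),
]
close = find_close((43.3734, -71.5316), candidates)
```

The threshold filters first and the limit truncates afterwards, so `limit` never pulls in a site that lies beyond `dist`.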
update_site(other, overwrite, siteresponse=None)
Update site attributes from another site object.
Args:
other (Site): Source site for updating.
overwrite (dict): Dictionary specifying which attributes to overwrite.
siteresponse (SiteResponse | None): Response object for tracking changes.
Returns:
Site: Updated site object.
compare_site(other)
Compare site attributes with another site object.
Args:
other (Site): Site object to compare against.
Returns:
list: List of differences found between sites.
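
An attribute-by-attribute comparison of the kind compare_site describes might look like the sketch below. The reference does not specify the format of the returned differences, so the `(attribute, left, right)` tuples are an assumption, and `SiteSketch` and `compare_sites` are hypothetical names covering only three of the Site attributes.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SiteSketch:
    """Hypothetical stand-in with a subset of the Site attributes."""
    sitename: Optional[str] = None
    altitude: Optional[float] = None
    area: Optional[float] = None


def compare_sites(left, right):
    """Return (attribute, left_value, right_value) for every attribute that differs."""
    differences = []
    for field in fields(left):
        a, b = getattr(left, field.name), getattr(right, field.name)
        if a != b:
            differences.append((field.name, a, b))
    return differences


local = SiteSketch(sitename="Mirror Lake", altitude=213)
stored = SiteSketch(sitename="Mirror Lake", altitude=215)
differences = compare_sites(local, stored)
```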
Speleothem
Speleothem(siteid=None, entityid=None, entityname=None, monitoring=None, rockageid=None, entrancedistance=None, entrancedistanceunits=None, speleothemtypeid=None)
Represents a speleothem (stalactite, stalagmite, flowstone, etc.) in a cave in Neotoma.
This class manages information about cave mineral deposits that may be sampled for paleoenvironmental reconstruction through isotopic and geochemical analysis.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| siteid | Site ID. |
| entityid | Entity ID. |
| entityname | Name. |
| monitoring | Monitoring flag. |
| rockageid | Rock age ID. |
| entrancedistance | Distance from entrance. |
| entrancedistanceunits | Distance units. |
| speleothemtypeid | Speleothem type ID. |
Examples:
>>> spel = Speleothem(siteid=1, entityname="Palace Chandelier", speleothemtypeid=1) # Stalactite from Lehman Cave
>>> spel.entityname
'Palace Chandelier'
>>> spel = Speleothem(siteid=2, entityname="Main Stalagmite", speleothemtypeid=2, entrancedistance=45.5)
>>> spel.entrancedistance
45.5
insert_to_db(cur)
Insert the speleothem record into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The speleothem ID assigned. |
insert_entitygeology_to_db(cur, id, speleothemgeologyid, notes)
Insert speleothem geology information into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| id | Entity identifier. |
| speleothemgeologyid | Geology type ID. |
| notes | Notes. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_entitydripheight_to_db(cur, id, speleothemdriptypeid, entitydripheight, entitydripheightunit)
Insert drip height information for a speleothem.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| id | Entity identifier. |
| speleothemdriptypeid | Drip type ID. |
| entitydripheight | Drip height measurement. |
| entitydripheightunit | Unit ID for height. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_entitycovers_to_db(cur, id, entitycoverid, entitycoverthickness, entitycoverunits)
Insert cover information for a speleothem.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| id | Entity identifier. |
| entitycoverid | Cover type ID. |
| entitycoverthickness | Cover thickness. |
| entitycoverunits | Unit ID for thickness. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_entitylandusecovers_to_db(cur, id, landusecovertypeid, landusecoverpercent, landusecovernotes)
Insert land use cover information for a speleothem.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| id | Entity identifier. |
| landusecovertypeid | Land use type ID. |
| landusecoverpercent | Percentage coverage. |
| landusecovernotes | Notes. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_entityvegetationcovers_to_db(cur, id, vegetationcovertypeid, vegetationcoverpercent, vegetationcovernotes)
Insert vegetation cover information for a speleothem.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| id | Entity identifier. |
| vegetationcovertypeid | Vegetation type ID. |
| vegetationcoverpercent | Percentage coverage. |
| vegetationcovernotes | Notes. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_entitysamples_to_db(cur, id, organics, fluid_inclusions, mineralogy_petrology_fabric, clumped_isotopes, noble_gas_temperatures, C14, ODL)
Insert sample type information for a speleothem.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| id | Entity identifier. |
| organics | Organic material present. |
| fluid_inclusions | Fluid inclusions present. |
| mineralogy_petrology_fabric | Mineralogy data available. |
| clumped_isotopes | Clumped isotope data available. |
| noble_gas_temperatures | Noble gas temperature data. |
| C14 | Radiocarbon data available. |
| ODL | Optical dating available. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
ExternalSpeleothem(entityid=None, externalid=None, extdatabaseid=None, externaldescription=None)
Represents an external reference to a speleothem entity.
This class manages references to speleothem entities in external databases or from other research groups.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| entityid | Local entity identifier. |
| externalid | External entity identifier. |
| extdatabaseid | External database ID. |
| externaldescription | Description of external reference. |
Examples:
>>> ext = ExternalSpeleothem(entityid=1, externalid="PALEODB-2847") # Reference to external database
>>> ext.entityid
1
>>> ext = ExternalSpeleothem(entityid=5, externalid="GEOMARC-156", extdatabaseid=3)
>>> ext.externalid
'GEOMARC-156'
insert_externalspeleothem_to_db(cur)
Insert external speleothem reference into database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
UThSeries
UThSeries(geochronid=None, decayconstantid=None, ratio230th232th=None, ratiouncertainty230th232th=None, activity230th238u=None, activityuncertainty230th238u=None, activity234u238u=None, activityuncertainty234u238u=None, iniratio230th232th=None, iniratiouncertainty230th232th=None)
Uranium-thorium radiometric dating data in Neotoma.
Stores U/Th isotope ratios and activities measured for radiometric dating, including initial ratios and associated uncertainties.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| geochronid | Geochronology ID. |
| decayconstantid | Decay constant ID. |
| ratio230th232th | ²³⁰Th/²³²Th ratio. |
| ratiouncertainty230th232th | Ratio uncertainty. |
| activity230th238u | ²³⁰Th/²³⁸U activity. |
| activityuncertainty230th238u | Activity uncertainty. |
| activity234u238u | ²³⁴U/²³⁸U activity. |
| activityuncertainty234u238u | Activity uncertainty. |
| iniratio230th232th | Initial ratio. |
| iniratiouncertainty230th232th | Initial ratio uncertainty. |
Examples:
>>> uth = UThSeries(geochronid=1, ratio230th232th=1.265)
>>> uth.ratio230th232th
1.265
insert_to_db(cur)
Insert U/Th series data into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
insert_uraniumseriesdata(cur, dataid, geochronid)
Insert uranium series data linking data to geochronology.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| dataid | Data value identifier. |
| geochronid | Geochronological data identifier. |
| RETURNS | DESCRIPTION |
|---|---|
| None | No return value. |
Variable
Variable(taxonid=None, variableelementid=None, variableunitsid=None, variablecontextid=None)
A variable (taxon or measurement) in Neotoma.
Defines what is being measured in paleoenvironmental data, including taxon, element, units, and context. Can be taxonomic (species) or physical.
See the Neotoma Manual.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| varid | Variable ID (assigned after DB lookup/insert). |
| taxonid | Taxon ID. |
| variableelementid | Variable element ID. |
| variableunitsid | Variable units ID. |
| variablecontextid | Variable context ID. |
Examples:
>>> variable = Variable(taxonid=42, variableunitsid=3)
>>> variable.taxonid
42
insert_to_db(cur)
Insert the variable record into the database.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int | The variableid assigned by the database. |
get_id_from_db(cur)
Retrieve variable ID from the database based on attributes.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database cursor for executing queries. |
| RETURNS | DESCRIPTION |
|---|---|
| int, None | The variableid, or None if not found. |
DataBUS
Validation and insertion modules for the neotomaValidator package. Each function validates the
corresponding Neotoma entity and, when a populated databus dict is supplied, also inserts the
record into the database within the active transaction.
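
The hand-off between steps can be pictured as a dict of earlier results that later steps read from. The sketch below is only an illustration of that pattern: the step functions, `ResponseSketch`, the `"speleothems"` key, and the placeholder IDs are hypothetical, and no database is touched. The `"sites"` key and the `id_int` attribute are the only names taken from this reference.

```python
class ResponseSketch:
    """Hypothetical stand-in for Response with only the fields used here."""

    def __init__(self):
        self.valid = []
        self.message = []
        self.id_int = None


def step_site():
    """Pretend site step; 101 is a placeholder for a database-assigned ID."""
    response = ResponseSketch()
    response.valid.append(True)
    response.message.append("site validated")
    response.id_int = 101
    return response


def step_speleothem(databus=None):
    """Validate always; 'insert' only when a populated databus carries a site ID."""
    response = ResponseSketch()
    response.valid.append(True)
    site = databus.get("sites") if databus else None
    if site is not None and site.id_int is not None:
        response.message.append(f"speleothem linked to site {site.id_int}")
        response.id_int = 7  # placeholder for the inserted record's ID
    else:
        response.message.append("speleothem validated only")
    return response


databus = {"sites": step_site()}
databus["speleothems"] = step_speleothem(databus)
validation_only = step_speleothem()
```

Calling a step without the dict leaves `id_int` unset, which mirrors the validate-only mode described above.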
valid_site
valid_site(cur, yml_dict, csv_file)
Validates and inserts site information for the Neotoma database.
Validates site details including coordinates, name, altitude, and area. Checks if site exists in Neotoma, finds close/matching sites, and compares provided data with existing database records. Handles coordinate validation and hemisphere determination. Always attempts to insert new sites when no matching site is found; the caller (validation_playground.py) controls whether changes are committed via transaction management.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database connection to Neotoma database. |
| yml_dict | Dictionary containing parameters from YAML configuration. |
| csv_file | List of row dicts from the CSV file. |
| RETURNS | DESCRIPTION |
|---|---|
| Response | Contains validation results, site list, hemisphere info, and messages. |
Examples:
>>> valid_site(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
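
Because valid_site leaves the commit decision to the caller, a dry run can execute the same inserts and then roll them back. The sketch below shows that pattern with the standard-library `sqlite3` module standing in for a psycopg2 connection to PostgreSQL. The `sites` table, its columns, and the `run_step` helper are hypothetical and do not reflect the Neotoma schema or validation_playground.py.

```python
import sqlite3


def run_step(conn, step, commit=False):
    """Run step(cur) in a transaction; keep the changes only when commit is True."""
    cur = conn.cursor()
    try:
        result = step(cur)
    except Exception:
        conn.rollback()
        raise
    if commit:
        conn.commit()
    else:
        conn.rollback()
    return result


def insert_site(cur):
    # Hypothetical table and column names, not the Neotoma schema.
    cur.execute("INSERT INTO sites (sitename) VALUES (?)", ("Mirror Lake",))
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (siteid INTEGER PRIMARY KEY, sitename TEXT)")
conn.commit()

run_step(conn, insert_site, commit=False)
rows_after_dry_run = conn.execute("SELECT COUNT(*) FROM sites").fetchone()[0]

run_step(conn, insert_site, commit=True)
rows_after_commit = conn.execute("SELECT COUNT(*) FROM sites").fetchone()[0]
```

The insert runs in both cases, so constraint errors surface during a dry run even though nothing is persisted.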
valid_geopolitical_units
valid_geopolitical_units(cur, yml_dict, csv_file, databus=None)
Validates geopolitical unit assignments and inserts site-geopolitical links when databus is provided.
Validates provided geopolitical units (national_unit, state, county, etc.) by querying the database for matching geopolitical IDs. Returns the most specific (lowest level) valid geopolitical unit found.
When databus is provided and databus["sites"].id_int is available,
inserts the site-to-geopolitical-unit associations into ndb.sitegeopoliticalunits
for both the national unit and the most specific subregional unit found.
| PARAMETER | DESCRIPTION |
|---|---|
| cur | Database connection to Neotoma database. |
| yml_dict | Dictionary containing parameters from YAML configuration. |
| csv_file | List of row dicts from the CSV file. |
| databus | Prior validation results. When not None, uses |
| RETURNS | DESCRIPTION |
|---|---|
| Response | Response object containing validation results and messages. |
Examples:
>>> valid_geopolitical_units(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
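Selecting the most specific valid unit can be sketched as a walk from the lowest level upward. The level names come from the description above; `most_specific_unit` and the ID values are hypothetical, and the real function resolves IDs by querying the database.

```python
# Levels ordered from least to most specific.
LEVELS = ["national_unit", "state", "county"]


def most_specific_unit(matched_ids):
    """Return (level, id) for the lowest level with a matched ID, else None."""
    for level in reversed(LEVELS):
        unit_id = matched_ids.get(level)
        if unit_id is not None:
            return level, unit_id
    return None
```

For example, `most_specific_unit({"national_unit": 10, "state": 25, "county": None})` selects the state entry because no county was matched.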
valid_collunit
valid_collunit(cur, yml_dict, csv_file, databus=None)
Validates collection unit data for sample collection sites.
Validates collection unit parameters including coordinates, collection date, depositional environment, substrate, and collection device. Handles date parsing, queries database for valid ID values, and detects close/duplicate collection units.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database connection to Neotoma (local or remote).
TYPE:
|
yml_dict
|
Dictionary containing data from YAML template.
TYPE:
|
csv_file
|
Path to CSV file with required data to upload.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object with validation results, messages, and collection unit list. |
Examples:
>>> valid_collunit(cur, yml_dict, "data.csv")
Response(valid=[True], message=[...])
valid_speleothem
valid_speleothem(cur, yml_dict, csv_file, databus=None)
Validates speleothem data and inserts the record when databus is provided.
Validates speleothem parameters including entity properties, drip type, geology, cover type, and land use information. Queries the database for valid values and creates a Speleothem object with validated parameters.
When databus is provided, uses databus["sites"].id_int as the site ID
and inserts the Speleothem record into ndb.speleothems via
sp.insert_to_db(cur). The resulting speleothem ID is stored in
response.id_int.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of row dicts from the CSV file.
TYPE:
|
databus
|
Prior validation results. When not None, uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list,
overall status, and the inserted speleothem ID in |
Examples:
>>> valid_speleothem(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
valid_external_speleothem
valid_external_speleothem(cur, yml_dict, csv_file, databus=None)
Validates external speleothem data and inserts the record when databus is provided.
Validates external speleothem parameters including external database ID, external ID, and description. Queries the database for valid external database references and creates ExternalSpeleothem objects.
When databus is provided, uses databus["speleothems"].id_int as the
speleothem entity ID and inserts the record via
es.insert_externalspeleothem_to_db(cur).
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor to query ndb.externaldatabases table.
TYPE:
|
yml_dict
|
Dictionary of configuration parameters from YAML file.
TYPE:
|
csv_file
|
List of row dicts from the CSV file.
TYPE:
|
databus
|
Prior validation results. When not None, uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object with validation results and messages. |
Examples:
>>> valid_external_speleothem(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
valid_analysisunit
valid_analysisunit(cur, yml_dict, csv_file, databus=None)
Validates analysis unit data and inserts into the database when databus is provided.
Validates analysis unit parameters including depth, thickness, facies ID, and other stratigraphic properties. Handles both single and multiple analysis units, creating AnalysisUnit objects with validated parameters.
When databus is provided and databus["collunits"].id_int is a real
integer, each AnalysisUnit is inserted into ndb.analysisunits and the
resulting IDs are appended to response.id_list.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of dictionaries representing CSV file data.
TYPE:
|
databus
|
Prior validation results. When not None, uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list,
overall status, count of created analysis units ( |
Examples:
>>> valid_analysisunit(cur, yml_dict, csv_file)
Response(valid=[True], message=[...], validAll=True, counter=1)
valid_pbmodel
valid_pbmodel(cur, yml_dict, csv_file, databus)
Validates lead-210 dating model parameters and inserts records when databus is provided.
Validates lead model parameters including basis (dating assumption) and cumulative inventory. Creates LeadModel objects for each analysis unit with validated basis ID.
When databus["analysisunits"].id_list is available, resolves analysis unit IDs
and inserts each LeadModel record into ndb.leadmodels via lm.insert_to_db(cur).
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of row dicts from the CSV file.
TYPE:
|
databus
|
Prior validation results. Uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages and overall validity status. |
Examples:
>>> valid_pbmodel(cursor, config_dict, csv_rows, databus)
Response(valid=[True], message=[...], validAll=True)
valid_dataset
valid_dataset(cur, yml_dict, csv_file, databus=None)
Validates a dataset and inserts it into the database when databus is provided.
Validates dataset name and dataset type against the Neotoma database. Attempts to resolve dataset type by querying the database if not provided. Creates a Dataset object with validated parameters.
When databus is provided, uses databus["collunits"].id_int as the
collectionunitid and inserts the dataset into ndb.datasets. The resulting
dataset ID is stored in response.id_int.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor to execute SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of row dicts from the CSV file.
TYPE:
|
databus
|
Prior validation results. When not None, uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list,
overall status, and the inserted dataset ID in |
Examples:
>>> valid_dataset(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
valid_geochron_dataset
valid_geochron_dataset(cur, yml_dict, csv_file, databus=None)
Validates and inserts a geochronological dataset.
Creates and validates a Dataset object with the geochronological dataset type ID (fetched from ndb.datasettypes by name 'geochronologic'). When databus is provided and validation passes, inserts the dataset into the database using the real collection unit ID from databus['collunits'].id_int and stores the resulting dataset ID in response.id_int.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor to execute SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
Path to CSV file containing geochronological data.
TYPE:
|
databus
|
Prior validation results supplying collectionunitid.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages and overall validity status. |
Examples:
>>> valid_geochron_dataset(cursor, config_dict, "geochron_data.csv")
Response(valid=[True], message=[...], validAll=True)
valid_chronologies
valid_chronologies(cur, yml_dict, csv_file, databus=None)
Validates and inserts chronologies for geochronological data.
Validates chronology parameters including age type, contact ID, date prepared, and age bounds. Handles age model conversions (e.g., collection date to years BP) and creates Chronology objects with validated parameters. When databus is provided and all parameters are valid, inserts each chronology into the database using the real collection unit ID from databus['collunits'].id_int and stores the resulting chronology ID in response.id_list.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of dictionaries representing CSV file data.
TYPE:
|
databus
|
Prior validation results supplying collectionunitid and (optionally) contactid overrides.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list, chronology IDs, and overall status. |
Examples:
>>> valid_chronologies(cursor, config_dict, csv_data)
Response(valid=[True], message=[...])
valid_chroncontrols
valid_chroncontrols(cur, yml_dict, csv_file, databus=None)
Validates and inserts chronological control points for age models.
Validates chronology control parameters including depth, age, thickness, and control type. Maps string control types to Neotoma integer IDs, verifies consistency of data dimensions, and creates ChronControl objects.
When databus is provided and all parameters are valid, inserts each control point into the database using:
- chronologyid from databus['chronologies'].id_list
- analysisunitid values from databus['analysisunits'].id_list
The resulting chroncontrol IDs are appended to response.id_list.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of dictionaries representing CSV file data.
TYPE:
|
databus
|
Prior validation results supplying chronology and analysis unit IDs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list, chroncontrol IDs, and overall status. |
Examples:
>>> valid_chroncontrols(cursor, config_dict, csv_data)
Response(valid=[True], message=[...], validAll=True)
valid_hiatus
valid_hiatus(cur, yml_dict, csv_file, databus=None)
Validates hiatus data and inserts hiatus records when databus is provided.
Identifies hiatus intervals (stratigraphic gaps) in sample analysis units. Groups consecutive analysis units with hiatus data and creates Hiatus objects spanning from start to end of each hiatus interval.
When databus is provided, resolves cluster indices to real analysis unit IDs
using databus["analysisunits"].id_list and inserts each Hiatus record into
ndb.hiatuses via hiatus.insert_to_db(cur).
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of row dicts from the CSV file.
TYPE:
|
databus
|
Prior validation results. When not None, uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages and overall validity status. |
Examples:
>>> valid_hiatus(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
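The grouping of consecutive analysis units into hiatus intervals, and the later mapping of cluster indices to real IDs, can be sketched as below. Both function names are hypothetical, and representing the hiatus data as one truthy or falsy flag per analysis unit is an assumption.

```python
def hiatus_clusters(flags):
    """Group consecutive truthy flags into (start_index, end_index) pairs."""
    clusters = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i  # a new run begins here
        elif not flagged and start is not None:
            clusters.append((start, i - 1))  # the run ended on the previous row
            start = None
    if start is not None:  # the final run extends to the last row
        clusters.append((start, len(flags) - 1))
    return clusters


def resolve_clusters(clusters, analysisunit_ids):
    """Map index pairs to (start_id, end_id) using a list of real IDs."""
    return [(analysisunit_ids[s], analysisunit_ids[e]) for s, e in clusters]
```

A single flagged unit yields an interval whose start and end are the same analysis unit.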
valid_sample
valid_sample(cur, yml_dict, csv_file, databus)
Validates sample data and inserts samples into the database.
Validates sample parameters including taxon information, analysis dates, and preparation methods. Creates Sample objects for each analysis unit with validated parameters.
Uses databus["analysisunits"].id_list for analysis unit IDs and
databus["datasets"].id_int for the dataset ID. When analysis unit IDs are
available, each Sample is inserted into ndb.samples and the resulting sample
IDs are appended to response.id_list.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of row dicts from the CSV file.
TYPE:
|
databus
|
Prior validation results. Must contain
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list,
sample counter ( |
Examples:
>>> valid_sample(cursor, config_dict, csv_rows, databus)
Response(valid=[True], message=[...], validAll=True, counter=5)
valid_sample_age
valid_sample_age(cur, yml_dict, csv_file, databus=None)
Validates sample age data for paleontological samples.
Validates sample age parameters including age values, uncertainty bounds, and age type. Handles date parsing for collection dates, validates age types against database, and creates SampleAge objects for each chronology. When databus is provided and all parameters are valid, inserts each sample age into the database using the real chronology ID from databus['chronologies'] and real sample IDs from databus['samples'].id_list.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
List of dictionaries representing CSV file data.
TYPE:
|
databus
|
Prior validation results supplying chronologyid and sample IDs for insert.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list, and overall status. |
Examples:
>>> valid_sample_age(cursor, config_dict, csv_data)
Response(valid=[True], message=[...], validAll=True)
valid_geochron
valid_geochron(cur, yml_dict, csv_file, databus=None)
Validates and inserts geochronological dating data.
Validates geochronology parameters including dating type, age, error bounds, and material dated. When databus is provided and validation passes, inserts each Geochron record using real sample IDs from databus['samples'].id_list and stores the resulting geochronid values in response.id_list for downstream use (e.g. valid_geochroncontrol).
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
Path to CSV file containing geochronology data.
TYPE:
|
databus
|
Prior validation results supplying sample IDs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list, geochronid list, and overall status. |
Examples:
>>> valid_geochron(cursor, config_dict, "dating_data.csv")
Response(valid=[True], message=[...], validAll=True)
valid_geochroncontrol
valid_geochroncontrol(cur, databus)
Validates and inserts geochronological control linkage records.
Links each geochron record (ndb.geochronology) to its corresponding chroncontrol record (ndb.chroncontrols) by inserting rows into ndb.geochroncontrols. IDs for both are taken from databus:
- databus['chron_controls'].id_list → chroncontrol IDs
- databus['geochron'].id_list → geochron IDs
If either list is empty or None the step is skipped gracefully. When the two lists differ in length the function tries to broadcast the shorter list; if neither is a multiple of the other it pairs them up to the shorter length.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
|
databus
|
Dictionary containing 'chron_controls' and 'geochron' Response objects (populated by prior validation steps).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages and overall status. |
Examples:
>>> valid_geochroncontrol(cursor, databus)
Response(valid=[True], message=[...], validAll=True)
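The pairing rule for lists of different lengths can be sketched as follows. `pair_ids` is a hypothetical helper, and "broadcast" is taken here to mean repeating the shorter list cyclically, which is an assumption; the real function may repeat elements differently.

```python
def pair_ids(chroncontrol_ids, geochron_ids):
    """Pair chroncontrol IDs with geochron IDs following the described rule."""
    if not chroncontrol_ids or not geochron_ids:
        return []  # empty or None: nothing to link
    a, b = list(chroncontrol_ids), list(geochron_ids)
    if len(a) != len(b):
        short, long_ = (a, b) if len(a) < len(b) else (b, a)
        if len(long_) % len(short) == 0:
            # Repeat the shorter list until it matches the longer one.
            repeated = short * (len(long_) // len(short))
            if short is a:
                a = repeated
            else:
                b = repeated
    # zip stops at the shorter list, which covers the non-multiple case.
    return list(zip(a, b))
```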
valid_uth_series
valid_uth_series(cur, yml_dict, csv_file, databus=None)
Validates and inserts uranium-thorium series data for geochronological samples.
Validates U-Th series isotope data including isotope ratios, activities, and associated decay constants. Verifies decay constants exist in the database and creates UThSeries objects with validated parameters.
When databus is provided and geochron IDs are available (databus['geochron'].id_list), replaces the placeholder geochronid values with the real inserted IDs and calls UThSeries.insert_to_db(cur) for each row.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data with U-Th parameters.
TYPE:
|
csv_file
|
Path to CSV file containing U-Th series data.
TYPE:
|
databus
|
Prior validation results supplying geochron IDs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list, and overall status. |
Examples:
>>> valid_uth_series(cursor, config_dict, "uth_series_data.csv")
Response(valid=[True, True], message=[...], validAll=True)
valid_contact
valid_contact(cur, yml_dict, csv_file, tables=CONTACT_TABLES, databus=None)
Validates contact information against the Neotoma Paleoecology Database.
Validates contact data (contact IDs or names) against the Neotoma database to ensure they exist and are valid. Matches contact names with database records and creates Contact objects with validated parameters. When databus is provided, inserts contacts into the appropriate relational tables:
- ndb.collectors → insert_collector(cur, collunitid)
- ndb.datasetpis → insert_pi(cur, datasetid)
- ndb.datasetprocessor → insert_data_processor(cur, datasetid)
- ndb.sampleanalysts → insert_sample_analyst(cur, sampleid) for each sample
- ndb.chronologies → contact is stored on the Chronology object (no separate insert)
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Cursor pointing to Neotoma database.
TYPE:
|
yml_dict
|
Dictionary object from template configuration.
TYPE:
|
csv_file
|
Path to CSV file containing contact data or user name.
TYPE:
|
tables
|
List of table names to validate contacts for.
TYPE:
|
databus
|
Dictionary of prior validation results used for insert IDs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation results and messages. |
Examples:
>>> valid_contact(cursor, yml_dict, "data.csv")
Response(valid=[True, True], message=[...], validAll=True)
valid_dataset_database
valid_dataset_database(cur, yml_dict, databus=None)
Validates dataset-database associations and inserts the link when databus is provided.
Validates the database name provided in YAML configuration against the Neotoma database's constituent databases. Creates a DatasetDatabase object with the validated database ID.
When databus is provided and databus["datasets"].id_int is available,
calls ts.insertdatasetdatabase to create the link between the dataset and the
constituent database. The database ID is stored in response.id_int.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor to execute SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
databus
|
Prior validation results. When not None, uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object with validation results, messages, and
the constituent database ID in |
Examples:
>>> valid_dataset_database(cursor, config_dict)
Response(valid=[True], message=[...], validAll=True)
valid_data
valid_data(cur, yml_dict, csv_file, databus=None)
Validates paleontological data values against the Neotoma database.
Validates data values and associated variables (taxon, units, element, context). Queries database for valid variable IDs, creates Variable and Datum objects with validated parameters. Supports both long and wide data format.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data.
TYPE:
|
csv_file
|
Path to CSV file containing data to validate.
TYPE:
|
wide
|
Flag for wide format data handling. Defaults to False.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list, and overall status. |
Examples:
>>> valid_data(cursor, config_dict, "data.csv")
Response(valid=[True], message=[...], validAll=True)
valid_datauncertainty
valid_datauncertainty(cur, yml_dict, csv_file, databus=None)
Validates data uncertainty values and inserts records when databus is provided.
Validates uncertainty values, units, and basis information. Queries database for valid uncertainty basis IDs and variable unit IDs, then creates DataUncertainty objects with validated parameters. Supports both long and wide data formats.
When databus is provided and databus["samples"].id_list is available,
inserts each DataUncertainty record into ndb.datauncertainties via
du.insert_to_db(cur).
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration parameters.
TYPE:
|
csv_file
|
List of row dicts from the CSV file.
TYPE:
|
databus
|
Prior validation results. When not None, uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list, and overall status. |
Examples:
>>> valid_datauncertainty(cursor, config_dict, csv_rows)
Response(valid=[True], message=[...], validAll=True)
valid_publication
valid_publication(cur, yml_dict, csv_file, databus=None)
Validates a publication and links it to the dataset when databus is provided.
Validates publication information by checking against the Neotoma database. Accepts publication ID, DOI, or citation and performs similarity matching when exact matches are not found. Can also validate DOIs against CrossRef API.
When databus is provided and databus["datasets"].id_int is available,
calls ts.insertdatasetpublication to link the validated publication to the
dataset. The publication ID is stored in response.id_int.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor to execute SQL commands.
TYPE:
|
yml_dict
|
Dictionary containing YAML configuration data with publication parameters.
TYPE:
|
csv_file
|
List of row dicts from the CSV file.
TYPE:
|
databus
|
Prior validation results. When not None, uses
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object containing validation messages, validity list,
overall validity status, and the publication ID in |
Examples:
>>> valid_publication(cursor, config_dict, csv_rows)
Response(valid=[True, True], message=[...], validAll=True)
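Similarity matching of a citation against known records can be illustrated with the standard library. The similarity measure used by valid_publication is not stated here, so `difflib.SequenceMatcher` is a stand-in, and both `best_citation_match` and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def best_citation_match(citation, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate, or None."""
    best = None
    for candidate in candidates:
        # Compare case-insensitively; ratio() is 1.0 for identical strings.
        score = SequenceMatcher(None, citation.lower(), candidate.lower()).ratio()
        if best is None or score > best[1]:
            best = (candidate, score)
    if best is not None and best[1] >= threshold:
        return best
    return None
```

Returning `None` below the threshold leaves the caller free to report a failed match instead of silently accepting a weak one.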
insert_final
insert_final(cur, databus)
Finalizes a dataset submission by inserting a record into ndb.datasetsubmissions.
This function should be called only after all prior validation steps have passed
(validAll is True for every key in databus). It records the submission
in the Neotoma database, linking the dataset, database, and contact together with
a submission date and a fixed submission type of 6.
| PARAMETER | DESCRIPTION |
|---|---|
cur
|
Database cursor for executing SQL queries.
|
databus
|
Accumulated validation results. Must contain:
-
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Response
|
Response object with validity status and a success/failure message. |
Examples:
>>> if all(databus[k].validAll for k in databus):
... result = insert_final(cur, databus=databus)
Response(valid=[True], message=["✔ Dataset submission has been finalized"])
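The all-or-nothing gating around the final step can be sketched with a transaction that is committed only when every step passed. The standard-library `sqlite3` module stands in for a psycopg2 connection here, and `finalize`, the `submissions` table, and the `SimpleNamespace` responses are all hypothetical.

```python
import sqlite3
from types import SimpleNamespace


def finalize(conn, databus):
    """Commit only if every step in databus passed; otherwise roll back."""
    if all(step.validAll for step in databus.values()):
        conn.commit()
        return True
    conn.rollback()
    return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (datasetid INTEGER)")
conn.commit()

# One failing step is enough to discard everything written so far.
conn.execute("INSERT INTO submissions VALUES (1)")
failed = {
    "sites": SimpleNamespace(validAll=True),
    "datasets": SimpleNamespace(validAll=False),
}
committed = finalize(conn, failed)
rows_after_failure = conn.execute("SELECT COUNT(*) FROM submissions").fetchone()[0]
```

After the rollback the table is empty again, which mirrors how an upload with any failed validation leaves the database unchanged.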
DataBUS Helpers
Utility functions in the neotomaHelpers package used for parameter extraction, file handling,
template parsing, and transaction management.
Parameter Extraction
pull_params
pull_params(params, yml_dict, csv_template, table=None)
Pull and process parameters for database insert statements.
Extracts parameters from YAML template and CSV data, performs type conversions (date, int, float, coordinates, string), handles special cases like notes and chronologies, and returns cleaned data ready for insertion.
| PARAMETER | DESCRIPTION |
|---|---|
params
|
List of strings for columns needed to generate insert statement.
TYPE:
|
yml_dict
|
Dictionary returned by YAML template containing 'metadata' key.
TYPE:
|
csv_template
|
CSV data as list of dictionaries with column data to upload.
TYPE:
|
table
|
Name of the table(s) parameters are drawn for. If list, returns results for each table.
TYPE:
|
name
|
Name field identifier. Defaults to None.
TYPE:
|
values
|
Whether to treat columns as value columns. Defaults to False.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
dict or list: Cleaned and formatted parameters ready for database insertion. If table is a list, returns list of dicts. If table is str, returns single dict. Returns hierarchical dicts for special tables like chronologies/sampleages. |
Template & File Utilities
read_csv
read_csv(filename)
Read CSV file and return a structured list of dictionaries.
Parses a CSV file and converts each row into a dictionary with column headers as keys.
Examples:
>>> read_csv('pollen_data.csv')
[{'depth': '2.5', 'quercus': '125'}, {'depth': '5.0', 'quercus': '142'}]
| PARAMETER | DESCRIPTION |
|---|---|
filename
|
Path to the CSV file to read.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list
|
List of dictionaries where each dictionary represents a row, with column headers as keys. |
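The described behavior matches what `csv.DictReader` from the standard library provides. The sketch below is an equivalent written for illustration, under the hypothetical name `read_csv_rows`; it is not necessarily how read_csv is implemented.

```python
import csv


def read_csv_rows(filename):
    """Return each CSV row as a dict keyed by the header row."""
    with open(filename, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

All values come back as strings, as in the example output above, so numeric conversion is left to later steps.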
check_file
check_file(filename, strict=False, validation_files='data/logs/')
Checks validation log file for errors from prior validation runs.
Examines validation log files to determine whether a CSV file has been successfully validated. Checks both the validated and not_validated directories. Counts errors and warnings, and removes the log file if the check passes in strict mode.
Examples:
>>> check_file("pollen_data.csv", strict=False)
{'pass': True, 'match': 0, 'message': ['No errors found in the last validation.']}
>>> check_file("chronology.csv", strict=True)
{'pass': False, 'match': 2, 'message': ['Errors found in the prior validation.']}
| PARAMETER | DESCRIPTION |
|---|---|
filename
|
File path or relative path for a template CSV file.
TYPE:
|
strict
|
If True, also count "Valid: FALSE" lines as errors. Defaults to False.
TYPE:
|
validation_files
|
Path to validation logs directory. Defaults to "data/logs/".
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with 'pass' (bool), 'match' (int error count), and 'message' (list). |
hash_file
hash_file(filename, validation_files='data/validation_logs/')
Calculate MD5 hash of a file and compare against validation logs.
Computes the MD5 hash of a file and compares it with previously stored hashes in validation log files to determine if the file has been validated or modified.
Examples:
>>> hash_file('pollen_data.csv')
{'pass': True, 'hash': 'abc123def456...', 'message': ['abc123def456...', "Hashes match, file hasn't changed."]}
>>> hash_file('chronology_template.xlsx')
{'pass': False, 'hash': 'xyz789abc123...', 'message': ['xyz789abc123...', 'File has changed, validating chronology_template.xlsx.']}
| PARAMETER | DESCRIPTION |
|---|---|
filename
|
Path to the file to hash.
TYPE:
|
validation_files
|
Path to the validation logs directory. Defaults to 'data/validation_logs/'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with keys: 'pass' (bool): True if file matches validation log hash. 'hash' (str): MD5 hash of the file in hexadecimal. 'message' (list): List of status messages about validation result. |
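Computing the MD5 digest and comparing it with stored values can be sketched with `hashlib`. The helper names `md5_of_file` and `hash_matches` are hypothetical, and passing the known hashes as a set is an assumption; the real function reads them from the validation logs.

```python
import hashlib


def md5_of_file(filename, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_matches(filename, known_hashes):
    """Report whether the file's hash appears among previously stored hashes."""
    file_hash = md5_of_file(filename)
    return {"pass": file_hash in known_hashes, "hash": file_hash}
```

Reading in chunks keeps memory use flat for large uploads. MD5 serves only as a change detector here, not as a security measure.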
excel_to_yaml
InlineList(data)
Custom class to represent inline lists in YAML output.
Used with custom YAML representer to output lists in flow style (inline) rather than block style.
| ATTRIBUTE | DESCRIPTION |
|---|---|
data |
The list data to be represented inline.
|
Initialize InlineList with data.
| PARAMETER | DESCRIPTION |
|---|---|
data
|
List or sequence to store.
|
represent_inline_list(dumper, data)
YAML representer for InlineList objects.
Tells the YAML dumper to output InlineList objects as inline sequences.
| PARAMETER | DESCRIPTION |
|---|---|
dumper
|
YAML dumper instance.
|
data
|
InlineList object to represent.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
|
YAML node representing the sequence in flow style. |
excel_to_yaml(temp_file, file_name)
Convert Excel template file to YAML format.
Reads data mapping and metadata from Excel sheets, processes column definitions including units and uncertainty information, and writes formatted YAML output.
Examples:
>>> excel_to_yaml('template.xlsx', 'template')
# Creates template.yml file
| PARAMETER | DESCRIPTION |
|---|---|
temp_file
|
Path to the Excel template file (.xls or .xlsx).
TYPE:
|
file_name
|
Base filename for output YAML (without extension). Output file will be named file_name.yml.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
None
|
Writes YAML file to disk with name file_name.yml |
Logging
logging_dict
logging_response(response, logfile)
Append Response object string representation to logfile.
Validates that response is a Response object, then appends its string representation to the logfile.
Examples:
>>> logfile = []
>>> logging_response(pollen_response, logfile)
[<string representation of pollen_response>]
>>> logfile = []
>>> logging_response(chronology_response, logfile)
[<string representation of chronology_response>]
| PARAMETER | DESCRIPTION |
|---|---|
response
|
Response object from DataBUS module.
TYPE:
|
logfile
|
List to append the response to.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list
|
The updated logfile list with response appended. |
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If response is not an instance of Response class. |
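The type check and append can be sketched as below. The `Response` class is a minimal stand-in with an assumed string representation, and `log_response` is a hypothetical name used to keep the sketch distinct from the real function.

```python
class Response:
    """Stand-in for the DataBUS Response class (assumed minimal shape)."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return f"Response(message={self.message})"


def log_response(response, logfile):
    """Append str(response) to logfile after checking its type."""
    assert isinstance(response, Response), "response must be a Response object"
    logfile.append(str(response))
    return logfile
```

Because the check is an `assert`, it is skipped when Python runs with the `-O` flag.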