Skip to content

Table: ndb.samples#

Description#

This table stores sample data. Samples belong to Analysis Units, which belong to Collection Units, which belong to Sites. Samples also belong to a Dataset, and the Dataset determines the type of sample. Thus, there could be two different samples from the same Analysis Unit, one belonging to a pollen dataset, the other to a plant macrofossil dataset.

TODO: Expand this description with: - What data does this table store? - What is the business/research purpose? - How is this data collected or generated? - Are there any important caveats or data quality issues?

Table Structure#

Visual Schema

Schema: ndb | Table Comment: This table stores sample data. Samples belong to Analysis Units, which belong to Collection Units, which belong to Sites. Samples also belong to a Dataset, and the Dataset determines the type of sample. Thus, there could be two different samples from the same Analysis Unit, one belonging to a pollen dataset, the other to a plant macrofossil dataset.

Statistics#

Metric Value
Row Count 623,878
Total Size 252 MB
Table Size 49 MB
Indexes Size 204 MB

Relationships#

Primary Key: sampleid

Foreign Keys:

Referenced By:

TODO: Document which tables reference this table (will be auto-detected in validation).

Data Dictionary#

Column Type Nullable Default Constraints Description
sampleid integer nextval('ndb.seq_samples_sa... PRIMARY KEY An arbitrary Sample identification number.
analysisunitid integer - FOREIGN KEY Analysis Unit identification number. Field links to the AnalysisUnits table.
datasetid integer - FOREIGN KEY Dataset identification number. Field links to the Datasets table.
samplename character varying(80) - - Sample name if any.
analysisdate date - - Date of analysis.
labnumber character varying(40) - - Laboratory number for the sample. A special case regards geochronologic samples, for which the LabNumber is the number, if any, assigned by the submitter, not the number assigned by the radiocarbon laboratory, which is in the Geochronology table.
preparationmethod text - - Description, notes, or comments on preparation methods. For faunal samples, notes on screening methods or screen size are stored here.
notes text - - Free form note or comments about the sample.
recdatecreated timestamp without time zone timezone('UTC'::text, now()) -
recdatemodified timestamp without time zone - -
sampledate date - -
taxonid integer - FOREIGN KEY

TODO: Review column descriptions and add comments where missing.

Usage Examples#

Example 1: Basic Selection#

-- Get recent records from samples
SELECT *
FROM samples
ORDER BY sampleid DESC
LIMIT 10;

Purpose: Retrieve the 10 most recent records from samples

Example 2: Count Records#

-- Count total records
SELECT COUNT(*) as total_records
FROM samples;

Purpose: Get the total number of records in samples

Example 3: Filter by Date Range#

-- Get records within a date range
SELECT *
FROM samples
WHERE analysisdate >= '2024-01-01'
  AND analysisdate < '2025-01-01'
ORDER BY analysisdate DESC;

Purpose: Retrieve records from samples within a specific date range

Example 4: Join with analysisunits#

-- Join with related table
SELECT 
    t1.*,
    t2.*
FROM samples t1
INNER JOIN analysisunits t2 
    ON t1.analysisunitid = t2.analysisunitid
LIMIT 100;

Purpose: Retrieve samples records with related data from analysisunits

Example 5: Aggregate Data#

-- Aggregate records by analysisunitid
SELECT 
    analysisunitid,
    COUNT(*) as count
FROM samples
GROUP BY analysisunitid
ORDER BY count DESC
LIMIT 10;

Purpose: Count records grouped by analysisunitid

TODO: Add more specific examples relevant to common research questions or operational tasks.

Data Quality Notes#

Automated Data Quality Tests#

This table is subject to the following automated quality checks:

❌ ref_001: datasets_referenced_by_samples

  • Severity: ERROR
  • Status: FAILED
  • Description: All datasets should be referenced by at least one sample

  • Suggested Remediation: - Check if samples were never entered for this dataset

  • Verify if dataset should be archived/deleted
  • Contact data owner for clarification

✅ ref_002: samples_have_valid_datasets

  • Severity: ERROR
  • Status: PASSED
  • Description: All samples must reference valid collection sites

✅ comp_004: sample_ages_for_samples

  • Severity: WARNING
  • Status: PASSED
  • Description: Samples should have sample ages, whether from a chronology or collection date.

❌ valid_006: samples_per_analysisunit

  • Severity: WARNING
  • Status: FAILED
  • Description: Although some datasets may have multiple samples per analysis unit per dataset, we should generally expect that most analysis units have only one set of samples.

  • Suggested Remediation: - Check the dataset to see if the samples are legitimately multiple samples within a single dataset.

  • Check with the original publication, or upload data steward.
  • Potentially remove duplicate or empty samples if they exist.

✅ biz_001: modern_samples_have_recent_dates

  • Severity: WARNING
  • Status: PASSED
  • Description: Samples marked as modern should have dates after 1950

See the Data Quality Report for details.

Maintenance#

  • Data Owner: TODO: Assign owner
  • Update Frequency: TODO: Document frequency
  • Last Major Schema Change: TODO: Document when schema last changed

TODO: Link to: - Related API endpoints - Data collection procedures - Analysis notebooks or reports that use this table - External ontologies or standards