Table: `ndb.samples`#

Description#

This table stores sample data. Samples belong to Analysis Units, which belong to Collection Units, which belong to Sites. Samples also belong to a Dataset, and the Dataset determines the type of sample. Thus, there could be two different samples from the same Analysis Unit, one belonging to a pollen dataset, the other to a plant macrofossil dataset.

TODO: Expand this description with: - What data does this table store? - What is the business/research purpose? - How is this data collected or generated? - Are there any important caveats or data quality issues?

Table Structure#

Schema: ndb | Table Comment: This table stores sample data. Samples belong to Analysis Units, which belong to Collection Units, which belong to Sites. Samples also belong to a Dataset, and the Dataset determines the type of sample. Thus, there could be two different samples from the same Analysis Unit, one belonging to a pollen dataset, the other to a plant macrofossil dataset.

Statistics#

Metric	Value
Row Count	623,878
Total Size	252 MB
Table Size	49 MB
Indexes Size	204 MB

Relationships#

Primary Key: sampleid

Foreign Keys:

analysisunitid → analysisunits.analysisunitid
datasetid → datasets.datasetid
taxonid → taxa.taxonid

Referenced By:

TODO: Document which tables reference this table (will be auto-detected in validation).

Data Dictionary#

Column	Type	Nullable	Default	Constraints	Description
`sampleid`	integer	✗	`nextval('ndb.seq_samples_sa...`	PRIMARY KEY	An arbitrary Sample identification number.
`analysisunitid`	integer	✗	`-`	FOREIGN KEY	Analysis Unit identification number. Field links to the AnalysisUnits table.
`datasetid`	integer	✗	`-`	FOREIGN KEY	Dataset identification number. Field links to the Datasets table.
`samplename`	character varying(80)	✓	`-`	-	Sample name if any.
`analysisdate`	date	✓	`-`	-	Date of analysis.
`labnumber`	character varying(40)	✓	`-`	-	Laboratory number for the sample. A special case regards geochronologic samples, for which the LabNumber is the number, if any, assigned by the submitter, not the number assigned by the radiocarbon laboratory, which is in the Geochronology table.
`preparationmethod`	text	✓	`-`	-	Description, notes, or comments on preparation methods. For faunal samples, notes on screening methods or screen size are stored here.
`notes`	text	✓	`-`	-	Free form note or comments about the sample.
`recdatecreated`	timestamp without time zone	✗	`timezone('UTC'::text, now())`	-
`recdatemodified`	timestamp without time zone	✗	`-`	-
`sampledate`	date	✓	`-`	-
`taxonid`	integer	✓	`-`	FOREIGN KEY

TODO: Review column descriptions and add comments where missing.

Usage Examples#

Example 1: Basic Selection#

-- Get recent records from samples
SELECT *
FROM samples
ORDER BY sampleid DESC
LIMIT 10;

Purpose: Retrieve the 10 most recent records from samples

Example 2: Count Records#

-- Count total records
SELECT COUNT(*) as total_records
FROM samples;

Purpose: Get the total number of records in samples

Example 3: Filter by Date Range#

-- Get records within a date range
SELECT *
FROM samples
WHERE analysisdate >= '2024-01-01'
  AND analysisdate < '2025-01-01'
ORDER BY analysisdate DESC;

Purpose: Retrieve records from samples within a specific date range

Example 4: Join with analysisunits#

-- Join with related table
SELECT 
    t1.*,
    t2.*
FROM samples t1
INNER JOIN analysisunits t2 
    ON t1.analysisunitid = t2.analysisunitid
LIMIT 100;

Purpose: Retrieve samples records with related data from analysisunits

Example 5: Aggregate Data#

-- Aggregate records by analysisunitid
SELECT 
    analysisunitid,
    COUNT(*) as count
FROM samples
GROUP BY analysisunitid
ORDER BY count DESC
LIMIT 10;

Purpose: Count records grouped by analysisunitid

TODO: Add more specific examples relevant to common research questions or operational tasks.

Data Quality Notes#

Automated Data Quality Tests#

This table is subject to the following automated quality checks:

❌ ref_001: datasets_referenced_by_samples

Severity: ERROR
Status: FAILED
Description: All datasets should be referenced by at least one sample
Suggested Remediation: - Check if samples were never entered for this dataset
Verify if dataset should be archived/deleted
Contact data owner for clarification

✅ ref_002: samples_have_valid_datasets

Severity: ERROR
Status: PASSED
Description: All samples must reference valid collection sites

✅ comp_004: sample_ages_for_samples

Severity: WARNING
Status: PASSED
Description: Samples should have sample ages, whether from a chronology or collection date.

❌ valid_006: samples_per_analysisunit

Severity: WARNING
Status: FAILED
Description: Although some datasets may have multiple samples per analysis unit per dataset, we should generally expect that most analysis units have only one set of samples.
Suggested Remediation: - Check the dataset to see if the samples are legitimately multiple samples within a single dataset.
Check with the original publication, or upload data steward.
Potentially remove duplicate or empty samples if they exist.

✅ biz_001: modern_samples_have_recent_dates

Severity: WARNING
Status: PASSED
Description: Samples marked as modern should have dates after 1950

See the Data Quality Report for details.

Maintenance#

Data Owner: TODO: Assign owner
Update Frequency: TODO: Document frequency
Last Major Schema Change: TODO: Document when schema last changed

Table: ndb.samples#