Table: ndb.samples#
Description#
This table stores sample data. Samples belong to Analysis Units, which belong to Collection Units, which belong to Sites. Samples also belong to a Dataset, and the Dataset determines the type of sample. Thus, there could be two different samples from the same Analysis Unit, one belonging to a pollen dataset, the other to a plant macrofossil dataset.
TODO: Expand this description with: - What data does this table store? - What is the business/research purpose? - How is this data collected or generated? - Are there any important caveats or data quality issues?
Table Structure#
Schema: ndb | Table Comment: This table stores sample data. Samples belong to Analysis Units, which belong to Collection Units, which belong to Sites. Samples also belong to a Dataset, and the Dataset determines the type of sample. Thus, there could be two different samples from the same Analysis Unit, one belonging to a pollen dataset, the other to a plant macrofossil dataset.
Statistics#
| Metric | Value |
|---|---|
| Row Count | 623,878 |
| Total Size | 252 MB |
| Table Size | 49 MB |
| Indexes Size | 204 MB |
Relationships#
Primary Key: sampleid
Foreign Keys:
analysisunitid→analysisunits.analysisunitiddatasetid→datasets.datasetidtaxonid→taxa.taxonid
Referenced By:
TODO: Document which tables reference this table (will be auto-detected in validation).
Data Dictionary#
| Column | Type | Nullable | Default | Constraints | Description |
|---|---|---|---|---|---|
sampleid |
integer | ✗ | nextval('ndb.seq_samples_sa... |
PRIMARY KEY | An arbitrary Sample identification number. |
analysisunitid |
integer | ✗ | - |
FOREIGN KEY | Analysis Unit identification number. Field links to the AnalysisUnits table. |
datasetid |
integer | ✗ | - |
FOREIGN KEY | Dataset identification number. Field links to the Datasets table. |
samplename |
character varying(80) | ✓ | - |
- | Sample name if any. |
analysisdate |
date | ✓ | - |
- | Date of analysis. |
labnumber |
character varying(40) | ✓ | - |
- | Laboratory number for the sample. A special case regards geochronologic samples, for which the LabNumber is the number, if any, assigned by the submitter, not the number assigned by the radiocarbon laboratory, which is in the Geochronology table. |
preparationmethod |
text | ✓ | - |
- | Description, notes, or comments on preparation methods. For faunal samples, notes on screening methods or screen size are stored here. |
notes |
text | ✓ | - |
- | Free form note or comments about the sample. |
recdatecreated |
timestamp without time zone | ✗ | timezone('UTC'::text, now()) |
- | |
recdatemodified |
timestamp without time zone | ✗ | - |
- | |
sampledate |
date | ✓ | - |
- | |
taxonid |
integer | ✓ | - |
FOREIGN KEY |
TODO: Review column descriptions and add comments where missing.
Usage Examples#
Example 1: Basic Selection#
Purpose: Retrieve the 10 most recent records from samples
Example 2: Count Records#
Purpose: Get the total number of records in samples
Example 3: Filter by Date Range#
-- Get records within a date range
SELECT *
FROM samples
WHERE analysisdate >= '2024-01-01'
AND analysisdate < '2025-01-01'
ORDER BY analysisdate DESC;
Purpose: Retrieve records from samples within a specific date range
Example 4: Join with analysisunits#
-- Join with related table
SELECT
t1.*,
t2.*
FROM samples t1
INNER JOIN analysisunits t2
ON t1.analysisunitid = t2.analysisunitid
LIMIT 100;
Purpose: Retrieve samples records with related data from analysisunits
Example 5: Aggregate Data#
-- Aggregate records by analysisunitid
SELECT
analysisunitid,
COUNT(*) as count
FROM samples
GROUP BY analysisunitid
ORDER BY count DESC
LIMIT 10;
Purpose: Count records grouped by analysisunitid
TODO: Add more specific examples relevant to common research questions or operational tasks.
Data Quality Notes#
Automated Data Quality Tests#
This table is subject to the following automated quality checks:
❌ ref_001: datasets_referenced_by_samples
- Severity: ERROR
- Status: FAILED
-
Description: All datasets should be referenced by at least one sample
-
Suggested Remediation: - Check if samples were never entered for this dataset
- Verify if dataset should be archived/deleted
- Contact data owner for clarification
✅ ref_002: samples_have_valid_datasets
- Severity: ERROR
- Status: PASSED
- Description: All samples must reference valid collection sites
✅ comp_004: sample_ages_for_samples
- Severity: WARNING
- Status: PASSED
- Description: Samples should have sample ages, whether from a chronology or collection date.
❌ valid_006: samples_per_analysisunit
- Severity: WARNING
- Status: FAILED
-
Description: Although some datasets may have multiple samples per analysis unit per dataset, we should generally expect that most analysis units have only one set of samples.
-
Suggested Remediation: - Check the dataset to see if the samples are legitimately multiple samples within a single dataset.
- Check with the original publication, or upload data steward.
- Potentially remove duplicate or empty samples if they exist.
✅ biz_001: modern_samples_have_recent_dates
- Severity: WARNING
- Status: PASSED
- Description: Samples marked as modern should have dates after 1950
See the Data Quality Report for details.
Maintenance#
- Data Owner: TODO: Assign owner
- Update Frequency: TODO: Document frequency
- Last Major Schema Change: TODO: Document when schema last changed
Related Documentation#
TODO: Link to: - Related API endpoints - Data collection procedures - Analysis notebooks or reports that use this table - External ontologies or standards