title: Aggregatedatasets description: Aggregate Datasets are aggregates of samples of a particular data type. Some examples:
* Plant mac status: draft last_validated: 2025-11-12

Table: `ndb.aggregatedatasets`#

Description#

Aggregate Datasets are aggregates of samples of a particular data type. Some examples:
* Plant macrofossil samples from a group of packrat middens collected from a particular valley, mountain range, or other similarly defined geographic area. Each midden is from a different Site or Collection Unit, but they are grouped into time series for that area and are published as a single dataset. * Samples collected from 32 cutbanks along several km of Roberts Creek, northeast Iowa. Each sample is from a different site, but they form a time series from 0-12,510 14C yr BP, and pollen, plant macrofossils, and beetles were published and graphed as if from a single site. * A set of pollen surface samples from a particular region or study that were published and analyzed as a single dataset and submitted to the database as a single dataset. The examples above are datasets predefined in the database. New aggregate datasets could be assembled for particular studies, for example all the pollen samples for a given time slice for a given geographic region.

TODO: Expand this description with: - What data does this table store? - What is the business/research purpose? - How is this data collected or generated? - Are there any important caveats or data quality issues?

Table Structure#

Schema: ndb | Table Comment: Aggregate Datasets are aggregates of samples of a particular data type. Some examples:
* Plant macrofossil samples from a group of packrat middens collected from a particular valley, mountain range, or other similarly defined geographic area. Each midden is from a different Site or Collection Unit, but they are grouped into time series for that area and are published as a single dataset. * Samples collected from 32 cutbanks along several km of Roberts Creek, northeast Iowa. Each sample is from a different site, but they form a time series from 0-12,510 14C yr BP, and pollen, plant macrofossils, and beetles were published and graphed as if from a single site. * A set of pollen surface samples from a particular region or study that were published and analyzed as a single dataset and submitted to the database as a single dataset. The examples above are datasets predefined in the database. New aggregate datasets could be assembled for particular studies, for example all the pollen samples for a given time slice for a given geographic region.

Statistics#

Metric	Value
Row Count	12
Total Size	32 kB
Table Size	8192 bytes
Indexes Size	16 kB

Relationships#

Primary Key: aggregatedatasetid

Foreign Keys:

aggregateordertypeid → aggregateordertypes.aggregateordertypeid

Referenced By:

TODO: Document which tables reference this table (will be auto-detected in validation).

Data Dictionary#

Column	Type	Nullable	Default	Constraints	Description
`aggregatedatasetid`	integer	✗	`nextval('ndb.seq_aggregated...`	PRIMARY KEY	An arbitrary Aggregate Dataset identification number.
`aggregatedatasetname`	character varying(255)	✗	`-`	-	Name of Aggregate Dataset.
`aggregateordertypeid`	integer	✗	`-`	FOREIGN KEY	Aggregate Order Type identification number. Field links to the AggregateOrderTypes lookup table.
`notes`	text	✓	`-`	-	Free form notes about the Aggregate Order Type.
`recdatecreated`	timestamp without time zone	✗	`timezone('UTC'::text, now())`	-
`recdatemodified`	timestamp without time zone	✗	`-`	-

TODO: Review column descriptions and add comments where missing.

Usage Examples#

Example 1: Basic Selection#

-- Get recent records from aggregatedatasets
SELECT *
FROM aggregatedatasets
ORDER BY aggregatedatasetid DESC
LIMIT 10;

Purpose: Retrieve the 10 most recent records from aggregatedatasets

Example 2: Count Records#

-- Count total records
SELECT COUNT(*) as total_records
FROM aggregatedatasets;

Purpose: Get the total number of records in aggregatedatasets

Example 3: Filter by Date Range#

-- Get records within a date range
SELECT *
FROM aggregatedatasets
WHERE recdatecreated >= '2024-01-01'
  AND recdatecreated < '2025-01-01'
ORDER BY recdatecreated DESC;

Purpose: Retrieve records from aggregatedatasets within a specific date range

Example 4: Join with aggregateordertypes#

-- Join with related table
SELECT 
    t1.*,
    t2.*
FROM aggregatedatasets t1
INNER JOIN aggregateordertypes t2 
    ON t1.aggregateordertypeid = t2.aggregateordertypeid
LIMIT 100;

Purpose: Retrieve aggregatedatasets records with related data from aggregateordertypes

Example 5: Aggregate Data#

-- Aggregate records by aggregatedatasetname
SELECT 
    aggregatedatasetname,
    COUNT(*) as count
FROM aggregatedatasets
GROUP BY aggregatedatasetname
ORDER BY count DESC
LIMIT 10;

Purpose: Count records grouped by aggregatedatasetname

TODO: Add more specific examples relevant to common research questions or operational tasks.

Data Quality Notes#

TODO: Document: - Known data quality issues - Validation rules - Expected data ranges - Update frequency and mechanisms - Any ETL processes that populate this table

Maintenance#

Data Owner: TODO: Assign owner
Update Frequency: TODO: Document frequency
Last Major Schema Change: TODO: Document when schema last changed

Table: ndb.aggregatedatasets#