Table: ndb.taxa#
Description#
This table lists all taxa in the database. Most taxa are biological taxa; however, some are biometric measures and some are physical parameters.
TODO: Expand this description with:
- What data does this table store?
- What is the business/research purpose?
- How is this data collected or generated?
- Are there any important caveats or data quality issues?
Table Structure#
Schema: ndb | Table Comment: This table lists all taxa in the database. Most taxa are biological taxa; however, some are biometric measures and some are physical parameters.
Statistics#
| Metric | Value |
|---|---|
| Row Count | 58,010 |
| Total Size | 43 MB |
| Table Size | 8520 kB |
| Indexes Size | 34 MB |
Relationships#
Primary Key: taxonid
Foreign Keys:
- highertaxonid → taxa.taxonid
- publicationid → publications.publicationid
- taxagroupid → taxagrouptypes.taxagroupid
- validatorid → contacts.contactid
Referenced By:
TODO: Document which tables reference this table (will be auto-detected in validation).
Data Dictionary#
| Column | Type | Nullable | Default | Constraints | Description |
|---|---|---|---|---|---|
| taxonid | integer | ✗ | nextval('ndb.seq_taxa_taxon... | FOREIGN KEY, PRIMARY KEY | An arbitrary Taxon identification number. |
| taxoncode | character varying(64) | ✗ | - | - | A code for the Taxon. Conventions and examples are given in the notes on taxoncode below this table. |
| taxonname | text | ✗ | - | - | Name of the taxon. Naming and synonymization rules are given in the notes on taxonname below this table. |
| author | character varying(128) | ✓ | - | - | Author(s) of the name. Neither the pollen database nor FAUNMAP stored author names, so these do not currently exist in Neotoma for plant and mammal names. These databases follow standard taxonomic references (e.g. Flora of North America, Flora Europaea, Wilson and Reeder's Mammal Species of the World), which do cite the original authors. However, for beetles, the standard practice is to cite original author names; therefore, this field was added to Neotoma. |
| valid | boolean | ✓ | - | - | |
| highertaxonid | integer | ✓ | - | - | The TaxonID of the next higher taxonomic rank. For example, the HigherTaxonID for «Bison» is the TaxonID for «Bovidae». For «cf.'s» and «-types», the next higher rank may be much higher owing to the uncertainty of the identification; the HigherTaxonID for «cf. Bison bison» is the TaxonID for «Mammalia». The HigherTaxonID implements the taxonomic hierarchy in Neotoma. |
| extinct | boolean | ✗ | - | - | True if the taxon is extinct, False if extant. |
| taxagroupid | character varying(3) | ✗ | - | FOREIGN KEY | The TaxaGroupID facilitates rapid extraction of taxa groups that are typically grouped together for analysis. Some of these groups contain taxa in different classes or phyla. For example, vascular plants include the Spermatophyta and Pteridophyta; the herps include Reptilia and Amphibia; the testate amoebae include taxa from different phyla. Field links to the TaxaGroupTypes table. |
| publicationid | integer | ✓ | - | FOREIGN KEY | Publication identification number. Field links to the Publications table. |
| validatorid | integer | ✓ | - | - | |
| validatedate | date | ✓ | - | - | |
| notes | text | ✓ | - | - | Free form notes or comments about the Taxon. |
| recdatecreated | timestamp without time zone | ✓ | timezone('UTC'::text, now()) | - | |
| recdatemodified | timestamp without time zone | ✓ | - | - | |

Notes on taxoncode:

These codes are useful for other software or output for which the complete name is too long. Because of the very large number of taxa, codes can be duplicated for different Taxa Groups. In general, the various Taxa Groups are analyzed separately, and no duplication will occur within a dataset. However, if Taxa Groups are combined, unique codes can be generated by prefixing with the TaxaGroupID. For example (code followed by taxon name):

- VPL:Cle Clethra
- MAM:Cle Clethrionomys

A set of conventions has been established for codes. In some cases conventions differ depending on whether the organism is covered by rules of botanical nomenclature (BN) or zoological nomenclature (ZN).

- Genus: Three-letter code, first letter capitalized, generally the first three letters of the name unless that code is already used.
  - Ace Acer
  - Cle Clethrionomys
- Subgenus: The genus code plus a two-letter subgenus code, first letter capitalized, separated by a period.
  - Pin.Pi Pinus subg. Pinus
  - Syn.Mi Synaptomys (Mictomys)
- Species: The genus code plus a two-letter, lower-case species code, separated by a period.
  - Ace.sa Acer saccharum
  - Ace.sc Acer saccharinum
  - Cle.ga Clethrionomys gapperi
- Subspecies or variety: The species code plus a two-letter, lower-case subspecies code, separated by a period.
  - Aln.vi.si Alnus viridis subsp. sinuata
  - Bis.bi.an Bison bison antiquus
- Family: Six-letter code, first letter capitalized, consisting of three letters followed by «eae» (BN) or «dae» (ZN).
  - Roseae Rosaceae
  - Bovdae Bovidae
- Subfamily or tribe: (BN) Family code plus two-letter subfamily code, first letter capitalized, separated by a period. (ZN) Six-letter code, first letter capitalized, consisting of three letters followed by «nae».
  - Asteae.As Asteraceae subf. Asteroideae
  - Asteae.Cy Asteraceae tribe Cynareae
  - Arvnae Arvicolinae
- Order: (BN) Six-letter code, first letter capitalized, consisting of three letters followed by «les». (ZN) Six-letter code, first letter capitalized, consisting of three letters followed by the last three letters of the order name, unless the order name is ≤6 letters long, in which case the code = the order name. Zoological orders do not have a common ending.
  - Ercles Ericales
  - Artyla Artiodactyla
  - Rodtia Rodentia
- Taxonomic levels higher than order: Six-letter code, first letter capitalized, consisting of three letters followed by the last three letters of the taxon name, unless the taxon name is ≤6 letters long, in which case the code = the taxon name.
  - Magida Magnoliopsida
  - Magyta Magnoliophyta
  - Mamlia Mammalia
- Types: The conventional taxon code followed by «-type».
  - Aln.in-t Alnus incana-type
  - Amb-t Ambrosia-type
- cf.: «cf.» is placed in the proper position.
  - Odc.cf.he Odocoileus cf. O. hemionus
  - cf.Odc.he cf. Odocoileus hemionus
  - cf.Odc cf. Odocoileus
- aff.: «aff.» is abbreviated to «af.».
  - af.Can.di aff. Canis dirus
- ?: «?» is placed in the proper position.
  - ?Pro.lo ?Procyon lotor
- Alternative names: A slash is placed between the conventional abbreviations for the alternative taxa.
  - Ost/Cpn Ostrya/Carpinus
  - Mstdae/Mepdae Mustelidae/Mephitidae
- Undifferentiated taxa: (BN) «.ud» is added to the code. (ZN) «.sp» is added to the code.
  - Aln.ud Alnus undiff.
  - Roseae.ud Rosaceae undiff.
  - Mms.sp Mammuthus sp.
  - Taydae.sp Tayassuidae sp.
- Parenthetic modifiers: The conventional taxon code with an appropriate abbreviation for the modifier, separated by periods. Multiple modifiers are also separated by periods. Abbreviations for pollen morphological modifiers follow Iversen and Troels-Smith (1950).
  - Raneae.C3 Ranunculaceae (tricolpate)
  - Raneae.Cperi Ranunculaceae (pericolpate)
  - Pineae.ves.ud Pinaceae (vesiculate) undiff.
  - Myteae.Csyn.psi Myrtaceae (syncolpate, psilate)
  - Bet.>20µ Betula (>20 µm)
- Non-biological taxa: Use appropriate abbreviations.
  - bulk.dens Bulk density
  - LOI Loss-on-ignition
  - Bet.pol.diam Betula mean pollen-grain diameter

Notes on taxonname:

Most TaxonNames are biological taxa; however, some are biometric measures and some are physical parameters. In addition, some biological taxa may have parenthetic non-Latin modifiers, e.g. «Betula (>20 µm)» for Betula pollen grains >20 µm in diameter. In general, the names used in Neotoma are those used by the original investigator. In particular, identifications are not changed, although Dataset notes can be added to the database regarding particular identifications. However, some corrections and synonymizations are made. These include:

- Misspellings are corrected.
- Nomenclatural, homotypic, or objective synonyms may be applied. Because these synonyms unambiguously refer to the same taxon, no change in identification is implied. For example, the old family name for the grasses «Gramineae» is changed to «Poaceae».
- Taxonomic, heterotypic, or subjective synonyms may be applied if the change does not effectively assign the specimen to a different taxon. Although two names may have been based on different type specimens, if further research has shown that these are in fact the same taxon, the name is changed to the accepted name. These synonymizations should not cause confusion. However, uncritical synonymization, although taxonomically correct, can result in loss of information and should be avoided. For example, although a number of recent studies have shown that the Taxodiaceae should be merged with the Cupressaceae, simply synonymizing Taxodiaceae with Cupressaceae may expand the universe of taxa beyond that implied by the original investigator. For instance, a palynologist in the southeastern United States may have used «Taxodiaceae» to imply «Taxodium», which is the only genus of the family that has occurred in the region since the Pliocene, but used the family name because, palynologically, Taxodium cannot be differentiated from other Taxodiaceae. However, well preserved Taxodium pollen grains can be differentiated from the other Cupressaceous genera in the region, Juniperus and Chamaecyparis. Thus, the appropriate synonymization for «Taxodiaceae» in this region would be «Taxodium» or «Taxodium-type», which would retain the original taxonomic precision. On the other hand, the old «TCT» shorthand for «Taxodiaceae/Cupressaceae/Taxaceae» now becomes «Cupressaceae/Taxaceae» with no loss of information.
- For alternative taxonomic designations, the order may be changed. For example, «Ostrya/Carpinus» would be substituted for «Carpinus/Ostrya».
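The taxoncode conventions in the data dictionary are mostly mechanical once the abbreviations themselves have been chosen. The following Python sketch shows how codes could be composed and how prefixing with the TaxaGroupID keeps them unique when Taxa Groups are combined. The helper names are hypothetical and are not part of Neotoma. Because the three-letter and two-letter abbreviations are assigned by convention ("unless already used"), the helpers take them as inputs rather than deriving them from names.

```python
def qualified_code(taxagroupid: str, taxoncode: str) -> str:
    """Prefix a taxon code with its TaxaGroupID so codes stay unique across groups."""
    return f"{taxagroupid}:{taxoncode}"


def species_code(genus_code: str, species_abbrev: str) -> str:
    """Genus code plus a two-letter, lower-case species code, separated by a period."""
    return f"{genus_code}.{species_abbrev.lower()}"


def undifferentiated(code: str, nomenclature: str) -> str:
    """Append '.ud' for botanical ('BN') or '.sp' for zoological ('ZN') taxa."""
    suffixes = {"BN": ".ud", "ZN": ".sp"}
    return code + suffixes[nomenclature]


def alternatives(*codes: str) -> str:
    """Join the codes of alternative taxa with a slash."""
    return "/".join(codes)


# Rows as (taxagroupid, taxoncode, taxonname), taken from the examples in the
# taxoncode description: the same code 'Cle' occurs in two Taxa Groups.
rows = [("VPL", "Cle", "Clethra"), ("MAM", "Cle", "Clethrionomys")]
unique_codes = {qualified_code(group, code): name for group, code, name in rows}
print(unique_codes)
```

Keying on the prefixed code rather than the bare taxoncode is what prevents «Clethra» and «Clethrionomys» from colliding when plant and mammal data are merged.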
TODO: Review column descriptions and add comments where missing.
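The synonymization rules given for taxonname in the data dictionary distinguish replacements that are always safe from ones that need a judgment call. As a rough illustration only, they could be sketched in Python as a lookup table plus a review list. The names `SAFE_SYNONYMS`, `NEEDS_REVIEW`, and `harmonize` are hypothetical, and this is not how Neotoma stewards actually process submissions.

```python
# Replacements that do not change the identification, taken from the
# examples in the taxonname description.
SAFE_SYNONYMS = {
    "Gramineae": "Poaceae",  # nomenclatural synonym
    "Carpinus/Ostrya": "Ostrya/Carpinus",  # reordered alternative designation
    "Taxodiaceae/Cupressaceae/Taxaceae": "Cupressaceae/Taxaceae",  # old "TCT"
}

# Names whose correct replacement depends on region and context, so a
# blanket substitution could lose information.
NEEDS_REVIEW = {"Taxodiaceae"}


def harmonize(name: str) -> tuple[str, bool]:
    """Return (name to store, needs_review flag) for a submitted taxon name."""
    if name in NEEDS_REVIEW:
        return name, True
    return SAFE_SYNONYMS.get(name, name), False


print(harmonize("Gramineae"))
print(harmonize("Taxodiaceae"))
```

Returning a review flag instead of silently rewriting «Taxodiaceae» mirrors the caution in the description: the right answer may be «Taxodium» or «Taxodium-type», depending on the region.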
Usage Examples#
Example 1: Basic Selection#
Purpose: Retrieve the 10 most recent records from taxa
Example 2: Count Records#
Purpose: Get the total number of records in taxa
Example 3: Filter by Date Range#
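```sql
-- Get taxa validated within a date range (here, calendar year 2024).
-- Rows with a NULL validatedate are excluded by the comparison.
SELECT *
FROM ndb.taxa
WHERE validatedate >= '2024-01-01'
  AND validatedate < '2025-01-01'
ORDER BY validatedate DESC;
```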
Purpose: Retrieve records from taxa within a specific date range
Example 4: Join with taxa#
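```sql
-- Pair each taxon with its parent by self-joining on highertaxonid.
-- Named columns avoid the duplicate column names that t1.*, t2.* would return.
SELECT
    child.taxonid,
    child.taxonname,
    parent.taxonid AS highertaxonid,
    parent.taxonname AS highertaxonname
FROM ndb.taxa AS child
INNER JOIN ndb.taxa AS parent
    ON child.highertaxonid = parent.taxonid
LIMIT 100;
```
Purpose: Retrieve each taxon together with its next higher taxon (self-join on highertaxonid)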
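Example 4 resolves only one level of the hierarchy. Walking from a taxon all the way up requires following highertaxonid repeatedly, which a recursive query can do. The sketch below runs against an in-memory SQLite table so that it is self-contained. The taxonid values are made up, intermediate ranks are omitted, and the guard against a root row that points at itself is a precaution, not a documented property of the table. The same `WITH RECURSIVE` form should carry over to PostgreSQL against ndb.taxa.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxa (taxonid INTEGER PRIMARY KEY, taxonname TEXT, highertaxonid INTEGER)"
)
# Toy rows with hypothetical IDs; the parent links follow the examples in
# the highertaxonid description.
conn.executemany(
    "INSERT INTO taxa VALUES (?, ?, ?)",
    [
        (1, "Mammalia", None),
        (2, "Bovidae", 1),
        (3, "Bison", 2),
        (4, "Bison bison", 3),
        (5, "cf. Bison bison", 1),  # uncertain identification: parent is much higher
    ],
)

LINEAGE_SQL = """
WITH RECURSIVE lineage(taxonid, taxonname, highertaxonid, depth) AS (
    SELECT taxonid, taxonname, highertaxonid, 0
    FROM taxa
    WHERE taxonname = ?
    UNION ALL
    SELECT t.taxonid, t.taxonname, t.highertaxonid, l.depth + 1
    FROM taxa AS t
    JOIN lineage AS l ON t.taxonid = l.highertaxonid
    WHERE t.taxonid <> l.taxonid  -- stop if a root row points at itself
)
SELECT taxonname FROM lineage ORDER BY depth;
"""


def lineage(name):
    """Return the taxon name followed by each successively higher taxon."""
    return [row[0] for row in conn.execute(LINEAGE_SQL, (name,))]


print(lineage("Bison bison"))
print(lineage("cf. Bison bison"))
```

The second call shows why «cf.» records need care in roll-ups: the lineage jumps straight to «Mammalia» and skips the genus and family.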
Example 5: Aggregate Data#
```sql
-- Count taxa per taxoncode and list the ten most frequently reused codes.
SELECT
    taxoncode,
    COUNT(*) AS n_taxa
FROM ndb.taxa
GROUP BY taxoncode
ORDER BY n_taxa DESC
LIMIT 10;
```
Purpose: Count records grouped by taxoncode
TODO: Add more specific examples relevant to common research questions or operational tasks.
Data Quality Notes#
Automated Data Quality Tests#
This table is subject to the following automated quality checks:
✅ ref_004: valid_taxa_need_highertaxonids
- Severity: WARNING
- Status: PASSED
- Description: Taxa that have been added to the database should have a valid higher taxonomic identifier, unless those taxa are no longer considered valid taxonomically.
❌ comp_003: taxa_have_been_added_by_stewards
- Severity: WARNING
- Status: FAILED
- Description: When a taxon is submitted to Neotoma there should be a person associated with that submission.
- Suggested Remediation: Confirm placement with stewards, identify those stewards as the validators.
❌ valid_003: valid_terminal_taxa_have_values
- Severity: WARNING
- Status: FAILED
- Description: Taxa that are identified as 'leaves' in the database should be associated with values in the database.
- Suggested Remediation:
  - Check with data stewards for the particular data type.
  - Ensure that the taxa are valid.
❌ bix_002: taxonnames_are_not_duplicated_within_groups
- Severity: ERROR
- Status: FAILED
- Description: Although different ecological groups may have similar taxon names (e.g., Abronia in reptiles, plants, protists and fungi), within groups the taxonomic name should be unique.
- Suggested Remediation: Identify the correct entry, remove duplicate entries.
See the Data Quality Report for details.
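To make the intent of some of these checks concrete, the sketch below expresses three of them as queries over a toy in-memory SQLite table. This is an interpretation of the check descriptions, not the project's actual test code. In particular, treating a NULL validatorid as "no person associated" for comp_003 is an assumption, and both the rows and the 'HRP' group code are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE taxa (
        taxonid INTEGER PRIMARY KEY, taxonname TEXT, taxagroupid TEXT,
        valid INTEGER, highertaxonid INTEGER, validatorid INTEGER)"""
)
# Made-up rows: 'VPL' appears on this page; 'HRP' is a hypothetical group code.
conn.executemany(
    "INSERT INTO taxa VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Abronia", "VPL", 1, 1, 10),
        (2, "Abronia", "HRP", 1, 2, 10),  # same name, different group: allowed
        (3, "Acer", "VPL", 1, 1, None),   # no validator recorded
        (4, "Acer", "VPL", 1, None, 11),  # duplicate within group, no higher taxon
    ],
)

# Each query returns the taxonid values that would fail the named check.
CHECKS = {
    "ref_004": "SELECT taxonid FROM taxa WHERE valid = 1 AND highertaxonid IS NULL",
    "comp_003": "SELECT taxonid FROM taxa WHERE validatorid IS NULL",
    "bix_002": """SELECT MIN(taxonid) FROM taxa
                  GROUP BY taxagroupid, taxonname HAVING COUNT(*) > 1""",
}

failures = {name: [row[0] for row in conn.execute(sql)] for name, sql in CHECKS.items()}
print(failures)
```

Grouping by both taxagroupid and taxonname is what lets «Abronia» appear once per group without being flagged, while the two «Acer» rows in the same group are reported.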
Maintenance#
- Data Owner: TODO: Assign owner
- Update Frequency: TODO: Document frequency
- Last Major Schema Change: TODO: Document when schema last changed
API Endpoints#
This table is accessed through the following API endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1.5/apps/TaxaInDatasets | Lists all Neotoma taxa (alphabetically) and the dataset types in which the taxa appear. |
See the API documentation for details.
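A minimal Python sketch of calling this endpoint with the standard library follows. The host `https://api.neotomadb.org` is an assumption, since this page lists only the path, and the function names are hypothetical. The response is returned as decoded JSON without assuming anything about its structure.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed host for the public Neotoma API; this page documents only the path.
BASE_URL = "https://api.neotomadb.org"
ENDPOINT = "/v1.5/apps/TaxaInDatasets"


def endpoint_url(base_url: str = BASE_URL) -> str:
    """Build the full URL for the TaxaInDatasets endpoint."""
    return urljoin(base_url, ENDPOINT)


def fetch_taxa_in_datasets(timeout: float = 30.0):
    """GET the endpoint and return the decoded JSON payload unchanged."""
    with urlopen(endpoint_url(), timeout=timeout) as response:
        return json.load(response)


print(endpoint_url())
```

Calling `fetch_taxa_in_datasets()` performs the network request. Because the endpoint lists all taxa, the response may be large, hence the generous timeout.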
Related Documentation#
TODO: Link to:
- Related API endpoints
- Data collection procedures
- Analysis notebooks or reports that use this table
- External ontologies or standards