Table: ndb.taxa#

Description#

This table lists all taxa in the database. Most taxa are biological taxa; however, some are biometric measures and some are physical parameters.

TODO: Expand this description with:

  • What data does this table store?
  • What is the business/research purpose?
  • How is this data collected or generated?
  • Are there any important caveats or data quality issues?

Table Structure#

Visual Schema

Schema: ndb | Table Comment: This table lists all taxa in the database. Most taxa are biological taxa; however, some are biometric measures and some are physical parameters.

Statistics#

Metric Value
Row Count 58,010
Total Size 43 MB
Table Size 8520 kB
Indexes Size 34 MB

Relationships#

Primary Key: taxonid

Foreign Keys:

Referenced By:

TODO: Document which tables reference this table (will be auto-detected in validation).

Data Dictionary#

Column Type Nullable Default Constraints Description
taxonid integer nextval('ndb.seq_taxa_taxon... FOREIGN KEY, PRIMARY KEY An arbitrary Taxon identification number.
taxoncode character varying(64) - - A code for the Taxon. These codes are useful for other software or output for which the complete name is too long. Because of the very large number of taxa, codes can be duplicated for different Taxa Groups. In general, these various Taxa Groups are analyzed separately, and no duplication will occur within a dataset. However, if Taxa Groups are combined, unique codes can be generated by prefixing with the TaxaGroupID. For example:
*VPL:Cle Clethra
*MAM:Cle Clethrionomys
A set of conventions has been established for codes. In some cases conventions differ depending on whether the organism is covered by rules of botanical nomenclature (BN) or zoological nomenclature (ZN).
*Genus – Three-letter code, first letter capitalized, generally the first three letters of the genus name unless already used.
**Ace Acer
**Cle Clethrionomys
*Subgenus – The genus code plus a two-letter subgenus code, first letter capitalized, separated by a period.
**Pin.Pi Pinus subg. Pinus
**Syn.Mi Synaptomys (Mictomys)
*Species – The genus code plus a two-letter, lower-case species code, separated by a period.
**Ace.sa Acer saccharum
**Ace.sc Acer saccharinum
**Cle.ga Clethrionomys gapperi
*Subspecies or variety – The species code plus a two-letter, lower-case subspecies code, separated by a period.
**Aln.vi.si Alnus viridis subsp. sinuata
**Bis.bi.an Bison bison antiquus
*Family – Six-letter code, first letter capitalized, consisting of three letters followed by «eae» (BN) or «dae» (ZN).
**Roseae Rosaceae
**Bovdae Bovidae
*Subfamily or tribe – (BN) Family code plus two-letter subfamily code, first letter capitalized, separated by a period. (ZN) Six-letter code, first letter capitalized, consisting of three letters followed by «nae».
**Asteae.As Asteraceae subf. Asteroideae
**Asteae.Cy Asteraceae tribe Cynareae
**Arvnae Arvicolinae
*Order – (BN) Six-letter code, first letter capitalized, consisting of three letters followed by «les». (ZN) Six-letter code, first letter capitalized, consisting of three letters, followed by the last three letters of the order name, unless the order name is ≤6 letters long, in which case the code = the order name. Zoological orders do not have a common ending.
**Ercles Ericales
**Artyla Artiodactyla
**Rodtia Rodentia
*Taxonomic levels higher than order – Six-letter code, first letter capitalized, consisting of three letters, followed by the last three letters of the taxon name, unless the taxon name is ≤6 letters long, in which case the code = the taxon name.
**Magida Magnoliopsida
**Magyta Magnoliophyta
**Mamlia Mammalia
*Types – The conventional taxon code followed by «-t».
**Aln.in-t Alnus incana-type
**Amb-t Ambrosia-type
*cf. – «cf. » is placed in the proper position.
**Odc.cf.he Odocoileus cf. O. hemionus
**cf.Odc.he cf. Odocoileus hemionus
**cf.Odc cf. Odocoileus
*aff. – «aff. » is abbreviated to «af. ».
**af.Can.di aff. Canis dirus
*? – «?» is placed in the proper position.
**?Pro.lo ?Procyon lotor
*Alternative names – A slash is placed between the conventional abbreviations for the alternative taxa.
**Ost/Cpn Ostrya/Carpinus
**Mstdae/Mepdae Mustelidae/Mephitidae
*Undifferentiated taxa – (BN) «.ud» is added to the code. (ZN) «.sp» is added to the code.
**Aln.ud Alnus undiff.
**Roseae.ud Rosaceae undiff.
**Mms.sp Mammuthus sp.
**Taydae.sp Tayassuidae sp.
*Parenthetic modifiers – The conventional taxon code with an appropriate abbreviation for the modifier separated by periods. Multiple modifiers are also separated by periods. Abbreviations for pollen morphological modifiers follow Iversen and Troels-Smith (1950).
**Raneae.C3 Ranunculaceae (tricolpate)
**Raneae.Cperi Ranunculaceae (pericolpate)
**Pineae.ves.ud Pinaceae (vesiculate) undiff.
**Myteae.Csyn.psi Myrtaceae (syncolpate, psilate)
**Bet.>20µ Betula (>20 µm)
*Non-biological taxa – Use appropriate abbreviations.
**bulk.dens Bulk density
**LOI Loss-on-ignition
**Bet.pol.diam Betula mean pollen-grain diameter
taxonname text - - Name of the taxon. Most TaxonNames are biological taxa; however, some are biometric measures and some are physical parameters. In addition, some biological taxa may have parenthetic non-Latin modifiers, e.g. «Betula (>20 µm)» for Betula pollen grains >20 µm in diameter. In general, the names used in Neotoma are those used by the original investigator. In particular, identifications are not changed, although Dataset notes can be added to the database regarding particular identifications. However, some corrections and synonymizations are made. These include:
*Misspellings are corrected.
*Nomenclatural, homotypic, or objective synonyms may be applied. Because these synonyms unambiguously refer to the same taxon, no change in identification is implied. For example, the old family name for the grasses «Gramineae» is changed to «Poaceae».
*Taxonomic, heterotypic, or subjective synonyms may be applied if the change does not effectively assign the specimen to a different taxon. Although two names may have been based on different type specimens, if further research has shown that these are in fact the same taxon, the name is changed to the accepted name. These synonymizations should not cause confusion. However, uncritical synonymization, although taxonomically correct, can result in loss of information, and should be avoided. For example, although a number of recent studies have shown that the Taxodiaceae should be merged with the Cupressaceae, simply synonymizing Taxodiaceae with Cupressaceae may expand the universe of taxa beyond that implied by the original investigator. For instance, a palynologist in the southeastern United States may have used «Taxodiaceae» to imply «Taxodium», which is the only genus of the family that has occurred in the region since the Pliocene, but used the family name because, palynologically, Taxodium cannot be differentiated from other Taxodiaceae. However, well preserved Taxodium pollen grains can be differentiated from the other Cupressaceous genera in the region, Juniperus and Chamaecyparis. Thus, the appropriate synonymization for «Taxodiaceae» in this region would be «Taxodium» or «Taxodium-type», which would retain the original taxonomic precision. On the other hand, the old «TCT» shorthand for «Taxodiaceae/Cupressaceae/Taxaceae» now becomes «Cupressaceae/Taxaceae» with no loss of information.
*For alternative taxonomic designations, the order may be changed. For example, «Ostrya/Carpinus» would be substituted for «Carpinus/Ostrya».
author character varying(128) - - Author(s) of the name. Neither the pollen database nor FAUNMAP stored author names, so these do not currently exist in Neotoma for plant and mammal names. These databases follow standard taxonomic references (e.g. Flora of North America, Flora Europaea, Wilson and Reeder's Mammal Species of the World), which, of course, do cite the original authors. However, for beetles, the standard practice is to cite original author names; therefore, this field was added to Neotoma.
valid boolean - -
highertaxonid integer - - The TaxonID of the next higher taxonomic rank. For example, the HigherTaxonID for «Bison» is the TaxonID for «Bovidae». For «cf.'s» and «-types», the next higher rank may be much higher owing to the uncertainty of the identification; the HigherTaxonID for «cf. Bison bison» is the TaxonID for «Mammalia». The HigherTaxonID implements the taxonomic hierarchy in Neotoma.
extinct boolean - - True if the taxon is extinct, False if extant.
taxagroupid character varying(3) - FOREIGN KEY The TaxaGroupID facilitates rapid extraction of taxa groups that are typically grouped together for analysis. Some of these groups contain taxa in different classes or phyla. For example, vascular plants include the Spermatophyta and Pteridophyta; the herps include Reptilia and Amphibia; the testate amoebae include taxa from different phyla. Field links to the TaxaGroupTypes table.
publicationid integer - FOREIGN KEY Publication identification number. Field links to the Publications table.
validatorid integer - -
validatedate date - -
notes text - - Free form notes or comments about the Taxon.
recdatecreated timestamp without time zone timezone('UTC'::text, now()) -
recdatemodified timestamp without time zone - -
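
The taxoncode conventions in the data dictionary above are mostly string composition, so a few of them can be sketched in Python. This is a minimal illustration, not Neotoma's own tooling: every function name below is hypothetical, and the three-letter stems and two-letter epithet codes are passed in rather than derived, because the conventions pick them by hand (for example, «Ercles» for Ericales does not use the first three letters).

```python
"""Illustrative helpers for composing taxon codes; all names are hypothetical."""

# Suffixes taken from the conventions: botanical (BN) vs zoological (ZN) nomenclature.
FAMILY_SUFFIX = {"BN": "eae", "ZN": "dae"}
UNDIFF_SUFFIX = {"BN": ".ud", "ZN": ".sp"}


def species_code(genus_code: str, epithet_code: str) -> str:
    """Genus code plus a two-letter, lower-case species code, joined by a period."""
    return f"{genus_code}.{epithet_code.lower()}"


def family_code(stem: str, nomenclature: str) -> str:
    """Three-letter stem, first letter capitalized, plus 'eae' (BN) or 'dae' (ZN)."""
    return stem.capitalize() + FAMILY_SUFFIX[nomenclature]


def zoological_order_code(stem: str, order_name: str) -> str:
    """Stem plus the last three letters of the name; names of <=6 letters are used whole."""
    if len(order_name) <= 6:
        return order_name
    return stem.capitalize() + order_name[-3:].lower()


def undifferentiated(code: str, nomenclature: str) -> str:
    """Append '.ud' (BN) or '.sp' (ZN) to mark an undifferentiated taxon."""
    return code + UNDIFF_SUFFIX[nomenclature]


def group_qualified(taxagroupid: str, code: str) -> str:
    """Prefix a code with its TaxaGroupID so codes stay unique across Taxa Groups."""
    return f"{taxagroupid}:{code}"


print(species_code("Ace", "sa"))                       # Ace.sa
print(family_code("Bov", "ZN"))                        # Bovdae
print(zoological_order_code("Art", "Artiodactyla"))    # Artyla
print(undifferentiated(family_code("Ros", "BN"), "BN"))  # Roseae.ud
print(group_qualified("MAM", "Cle"))                   # MAM:Cle
```

The group prefix is what resolves the «Cle» collision between Clethra and Clethrionomys when Taxa Groups are combined.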

TODO: Review column descriptions and add comments where missing.

Usage Examples#

Example 1: Basic Selection#

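-- Get the most recently created records from ndb.taxa
SELECT *
FROM ndb.taxa
ORDER BY recdatecreated DESC NULLS LAST
LIMIT 10;

Purpose: Retrieve the 10 most recently created records from ndb.taxa, ordered by recdatecreated because taxonid is an arbitrary identifier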

Example 2: Count Records#

-- Count total records
SELECT COUNT(*) AS total_records
FROM ndb.taxa;

Purpose: Get the total number of records in ndb.taxa

Example 3: Filter by Date Range#

-- Get taxa validated within a date range (here, calendar year 2024)
SELECT *
FROM ndb.taxa
WHERE validatedate >= '2024-01-01'
  AND validatedate < '2025-01-01'
ORDER BY validatedate DESC;

Purpose: Retrieve records from ndb.taxa whose validatedate falls within a specific date range

Example 4: Join with taxa#

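-- Self-join: pair each taxon with its next higher taxon
SELECT
    child.taxonid,
    child.taxonname,
    parent.taxonid AS highertaxonid,
    parent.taxonname AS highertaxonname
FROM ndb.taxa AS child
INNER JOIN ndb.taxa AS parent
    ON child.highertaxonid = parent.taxonid
LIMIT 100;

Purpose: Retrieve taxa together with the name of their next higher taxon, using the highertaxonid self-reference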
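
Because highertaxonid points at the next higher taxon, following it repeatedly yields a full lineage. The sketch below shows one way to do that with a recursive query, run against an in-memory SQLite table so it is self-contained. The toy rows, their IDs, and the Bovidae → Artiodactyla → Mammalia chain are invented for illustration and are not real Neotoma records; whether root taxa in ndb.taxa carry a NULL or self-referencing highertaxonid is also an assumption, so the query guards against both.

```python
import sqlite3

# Toy stand-in for ndb.taxa: IDs and the exact lineage are invented for illustration.
ROWS = [
    (1, "Mammalia", None),
    (2, "Artiodactyla", 1),
    (3, "Bovidae", 2),
    (4, "Bison", 3),
    (5, "cf. Bison bison", 1),  # uncertain identifications may point far up the tree
]

LINEAGE_SQL = """
WITH RECURSIVE lineage(taxonid, taxonname, highertaxonid, depth) AS (
    SELECT taxonid, taxonname, highertaxonid, 0
    FROM taxa
    WHERE taxonid = ?
    UNION ALL
    SELECT t.taxonid, t.taxonname, t.highertaxonid, l.depth + 1
    FROM taxa AS t
    JOIN lineage AS l ON t.taxonid = l.highertaxonid
    WHERE t.taxonid <> l.taxonid   -- stop if a root points at itself
      AND l.depth < 50             -- safety cap in case the data contains a cycle
)
SELECT taxonname FROM lineage ORDER BY depth;
"""


def make_toy_db() -> sqlite3.Connection:
    """Create an in-memory table with only the columns the lineage query needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxa ("
        "taxonid INTEGER PRIMARY KEY, taxonname TEXT, highertaxonid INTEGER)"
    )
    conn.executemany("INSERT INTO taxa VALUES (?, ?, ?)", ROWS)
    return conn


def lineage(conn: sqlite3.Connection, taxonid: int) -> list[str]:
    """Return taxon names from the given taxon up to the root of the hierarchy."""
    return [name for (name,) in conn.execute(LINEAGE_SQL, (taxonid,))]


conn = make_toy_db()
print(lineage(conn, 4))  # ['Bison', 'Bovidae', 'Artiodactyla', 'Mammalia']
print(lineage(conn, 5))  # ['cf. Bison bison', 'Mammalia']
```

The query uses standard WITH RECURSIVE syntax, which PostgreSQL also supports; the `?` placeholder is SQLite's parameter style and would need adapting for another driver.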

Example 5: Aggregate Data#

-- Find the taxon codes shared by the most taxa
SELECT
    taxoncode,
    COUNT(*) AS n_taxa
FROM ndb.taxa
GROUP BY taxoncode
ORDER BY n_taxa DESC
LIMIT 10;

Purpose: Count records grouped by taxoncode; counts above 1 are expected because codes can repeat across Taxa Groups

TODO: Add more specific examples relevant to common research questions or operational tasks.

Data Quality Notes#

Automated Data Quality Tests#

This table is subject to the following automated quality checks:

✅ ref_004: valid_taxa_need_highertaxonids

  • Severity: WARNING
  • Status: PASSED
  • Description: Taxa that have been added to the database should have a valid higher taxonomic identifier, unless those taxa are no longer considered valid taxonomically.

❌ comp_003: taxa_have_been_added_by_stewards

  • Severity: WARNING
  • Status: FAILED
  • Description: When a taxon is submitted to Neotoma there should be a person associated with that submission.

  • Suggested Remediation: Confirm placement with stewards, identify those stewards as the validators.

❌ valid_003: valid_terminal_taxa_have_values

  • Severity: WARNING
  • Status: FAILED
  • Description: Taxa that are identified as 'leaves' in the database should be associated with values in the database.

  • Suggested Remediation: Check with data stewards for the particular data type. Ensure that the taxa are valid.

❌ bix_002: taxonnames_are_not_duplicated_within_groups

  • Severity: ERROR
  • Status: FAILED
  • Description: Although different ecological groups may have similar taxon names (e.g., Abronia in reptiles, plants, protists and fungi), within groups the taxonomic name should be unique.

  • Suggested Remediation: Identify the correct entry, remove duplicate entries.

See the Data Quality Report for details.
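
To make the duplicate-name rule concrete, the sketch below shows one way such a check could be expressed: group by taxagroupid and taxonname and keep groups with more than one row. How the bix_002 test is actually implemented is not documented here, so treat this as an assumption. The rows are invented, the IDs are placeholders, and 'REP' is a hypothetical group code used only to show that the same name in different groups is not flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxa ("
    "taxonid INTEGER PRIMARY KEY, taxonname TEXT, taxagroupid TEXT)"
)
# Invented rows for illustration; 'REP' is a hypothetical group code.
conn.executemany(
    "INSERT INTO taxa VALUES (?, ?, ?)",
    [
        (1, "Abronia", "VPL"),
        (2, "Abronia", "REP"),  # same name, different group: allowed
        (3, "Clethra", "VPL"),
        (4, "Clethra", "VPL"),  # same name, same group: a duplicate
    ],
)

DUPLICATES_SQL = """
SELECT taxagroupid, taxonname, COUNT(*) AS n
FROM taxa
GROUP BY taxagroupid, taxonname
HAVING COUNT(*) > 1
ORDER BY taxagroupid, taxonname;
"""

duplicates = conn.execute(DUPLICATES_SQL).fetchall()
print(duplicates)  # [('VPL', 'Clethra', 2)]
```

The comparison here is exact and case-sensitive; a production check might also normalize whitespace or case before grouping.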

Maintenance#

  • Data Owner: TODO: Assign owner
  • Update Frequency: TODO: Document frequency
  • Last Major Schema Change: TODO: Document when schema last changed

API Endpoints#

This table is accessed through the following API endpoints:

Method Endpoint Description
GET /v1.5/apps/TaxaInDatasets Lists all Neotoma taxa (alphabetically) and the dataset types in which the taxa appear.

See the API documentation for details.

TODO: Link to:

  • Related API endpoints
  • Data collection procedures
  • Analysis notebooks or reports that use this table
  • External ontologies or standards