3 SQL Quickly

SQL (Sturctured Query Language) is a standard language for querying and modifying relational databases. There is an official standard for SQL, which means that statements such as SELECT and WHERE are common across most SQL implementations. While many properties are common, individual vendors do maintain proprietary formatting. Neotoma is implemented using PostgreSQL, which has extensive documentation online. It is beyond the scope of this document to teach SQL, however, we have included a large number of SQL queries along with the documentation to provide templates for queries, and to help the user understand how to directly work with the database. These queries can by typed or copied and pasted into database tools such as pgAdmin.

For the purposes of database development, the Neotoma team uses the following SQL tools:

Throughout the Neotoma Database Manual we will refer to tables using the format schema.tablename. The ndb namespace is the schema for most of the data tables within the Neotoma database, so you will see things like ndb.taxa and ndb.sites frequently. There are other schema, including apps, doi and public, but the data tables themselves are kept in the ndb schema. You can see all available data tables in the ndb schema using the Neotoma database schema website.

3.1 SQL Example

The following SQL example lists the number of sites by the geopolitical unit, where the geopolitical unit is a country. It uses three tables, ndb.sites, which contains site information, ndb.geopoliticalunits, which lists all geopolitical units (countries, provinces, towns, etc.), and a JOIN table, which helps to link the ndb.sites table to the ndb.geopoliticalunits table.

By convention, SQL first defines the output, using the SELECT statement. The FROM block is used to explain how the information is brought together, and the WHERE (and subsequent blocks) are meant to help filter and aggregate data as generated within the FROM block.

Throughput this document we will use AS statemements to help us write our SQL queries neatly, and we will use INNER JOIN calls with ON statements, rather than NATURAL INNER JOIN calls. Using AS helps us create an alias so that we don’t have to write so much in our query. As much as possible we will use the same aliases for tables throughout this guide. For example, FROM ndb.sites AS st is commonly used, where st becomes the alias for the ndb.sites table, and stands in for it elsewhere in the query. Often you will see gpu used for ndb.geopoliticalunits, or tx for ndb.taxa. In Postgres SQL it is not neccessary to use the term AS (we use it here to make the queries clearer for folks just learning how to use SQL), so it’s possible to see things like ndb.sites st in place of ndb.sites AS st.

The NATURAL INNER JOIN assumes that any columns with common names are used in a JOIN. Because Neotoma enforces naming consistency (in general) between primary and foreign keys, we ought to be able to perform NATURAL joins. However, a trigger on most Neotoma tables adds the columns recdatecreated and recdatemodified to almost all tables. For this reason we make it practice within SQL queries to be explicit about the joining columns.

The following example show the use of queries with and without AS:

3.1.1 SQL Query

SELECT
  COUNT(*) AS sites,
  gpu.geopoliticalname, 
  gpu.geopoliticalunit
FROM
  ndb.geopoliticalunits AS gpu
  INNER JOIN ndb.sitegeopolitical AS sgp ON sgp.geopoliticalid = gpu.geopoliticalid
  INNER JOIN ndb.sites st ON st.siteid = sgp.siteid
WHERE 
  gpu.geopoliticalunit = 'country'
GROUP BY
  gpu.geopoliticalid
HAVING COUNT(*) > 0
ORDER BY sites DESC;
DT::datatable(geopoliticalsort)

3.2 Table Keys

Within tables there are often Keys. A Key may be a Primary Key (PK), which acts as a unique identifier for individual records within a table, or they may be a Foreign Key (FK) which refers to a unique identifier in another table. Primary Keys and Foreign Keys are critical to join tables in a SQL query.

In relational databases such as Neotoma you will often have “entity” tables, that represent explicit things (e.g., people, sites), and then tables that express the relationship between entities using foreign keys. So, for example the table ndb.sitegeopoliticalunits expresses the relationship between ndb.sites and ndb.geopoliticalunits. Both ndb.sites and ndb.geopoliticalunits represent entities; a country is a thing, with properties, as is a site. The ndb.sitegeopoliticalunits defines the relationship betwen these things. It is a table with two columns, one is an FK that refers back to the PK, siteids, in ndb.sites, and one is an FK for the primary key of the ndb.geopoliticalunits.

Switching to the SQL Query tab in the above example will show you that we are using the PK for the geopoliticalunits, geopoliticalid. It links the complete information in the ndb.geopoliticalunits table to the ndb.sites table, through a JOIN table, that has two columns, the siteid that is the PK for ndb.sites and the geopoliticalid column, that is the PK for ndb.geopoliticalunits. In this way, siteid is the PK in ndb.sites and an FK in ndb.sitegeopoliticalunits.

3.3 Data Types

Neotoma uses several standard SQL data types. You can find out more about these individual data types in the Postgres data type documentation.

3.3.1 Query

3.3.1.1 SQL Query

3.3.1.2 Data

DT::datatable(datatypes)