6 SQL Quickly
SQL (Sturctured Query Language) is a standard language for querying and modifying relational databases. There is an official standard for SQL, which means that statements such as SELECT
and WHERE
are common across most SQL implementations. While many properties are common, individual vendors do maintain proprietary formatting. Neotoma is implemented using PostgreSQL, which has extensive documentation online. It is beyond the scope of this document to teach SQL, however, we have included a large number of SQL queries along with the documentation to provide templates for queries, and to help the user understand how to directly work with the database. These queries can by typed or copied and pasted into database tools such as pgAdmin.
For the purposes of database development, the Neotoma team uses the following SQL tools:
Throughout the Neotoma Database Manual we will refer to tables using the format schema.tablename
. The ndb
namespace is the schema for most of the data tables within the Neotoma database, so you will see things like ndb.taxa
and ndb.sites
frequently. There are other schema, including apps
, doi
and public
, but the data tables themselves are kept in the ndb
schema. You can see all available data tables in the ndb
schema using the Neotoma database schema website.
6.1 SQL Example
The following SQL example lists the number of sites by the geopolitical unit, where the geopolitical unit is a country. It uses three tables, ndb.sites
, which contains site information, ndb.geopoliticalunits
, which lists all geopolitical units (countries, provinces, towns, etc.), and a JOIN
table, which helps to link the ndb.sites
table to the ndb.geopoliticalunits
table.
By convention, SQL first defines the output, using the SELECT
statement. The FROM
block is used to explain how the information is brought together, and the WHERE
(and subsequent blocks) are meant to help filter and aggregate data as generated within the FROM
block.
Throughput this document we will use AS
statemements to help us write our SQL queries neatly, and we will use INNER JOIN
calls with ON
statements, rather than NATURAL INNER JOIN
calls. Using AS
helps us create an alias so that we don’t have to write so much in our query. As much as possible we will use the same aliases for tables throughout this guide. For example, FROM ndb.sites AS st
is commonly used, where st
becomes the alias for the ndb.sites
table, and stands in for it elsewhere in the query. Often you will see gpu
used for ndb.geopoliticalunits
, or tx
for ndb.taxa
. In Postgres SQL it is not neccessary to use the term AS
(we use it here to make the queries clearer for folks just learning how to use SQL), so it’s possible to see things like ndb.sites st
in place of ndb.sites AS st
.
The NATURAL INNER JOIN
assumes that any columns with common names are used in a JOIN
. Because Neotoma enforces naming consistency (in general) between primary and foreign keys, we ought to be able to perform NATURAL
joins. However, a trigger on most Neotoma tables adds the columns recdatecreated
and recdatemodified
to almost all tables. For this reason we make it practice within SQL queries to be explicit about the joining columns.
The following example show the use of queries with and without AS
:
6.1.1 SQL Query
SELECT
COUNT(*) AS sites,
gpu.geopoliticalname,
gpu.geopoliticalunitFROM
AS gpu
ndb.geopoliticalunits INNER JOIN ndb.sitegeopolitical AS sgp ON sgp.geopoliticalid = gpu.geopoliticalid
INNER JOIN ndb.sites st ON st.siteid = sgp.siteid
WHERE
= 'country'
gpu.geopoliticalunit GROUP BY
gpu.geopoliticalidHAVING COUNT(*) > 0
ORDER BY sites DESC;
::datatable(geopoliticalsort) DT
6.2 Table Keys
Within tables there are often Keys. A Key may be a Primary Key (PK), which acts as a unique identifier for individual records within a table, or they may be a Foreign Key (FK) which refers to a unique identifier in another table. Primary Keys and Foreign Keys are critical to join tables in a SQL query.
In relational databases such as Neotoma you will often have “entity” tables, that represent explicit things (e.g., people, sites), and then tables that express the relationship between entities using foreign keys. So, for example the table ndb.sitegeopoliticalunits
expresses the relationship between ndb.sites
and ndb.geopoliticalunits
. Both ndb.sites
and ndb.geopoliticalunits
represent entities; a country is a thing, with properties, as is a site. The ndb.sitegeopoliticalunits
defines the relationship betwen these things. It is a table with two columns, one is an FK that refers back to the PK, siteid
s, in ndb.sites
, and one is an FK for the primary key of the ndb.geopoliticalunits
.
Switching to the SQL Query tab in the above example will show you that we are using the PK for the geopoliticalunits
, geopoliticalid
. It links the complete information in the ndb.geopoliticalunits
table to the ndb.sites
table, through a JOIN
table, that has two columns, the siteid
that is the PK for ndb.sites
and the geopoliticalid
column, that is the PK for ndb.geopoliticalunits
. In this way, siteid
is the PK in ndb.sites
and an FK in ndb.sitegeopoliticalunits
.
6.3 Data Types
Neotoma uses several standard SQL data types. You can find out more about these individual data types in the Postgres data type documentation.