3 Working with the Raw Data
Neotoma is a Postgres database. The data is stored on a database server in the cloud, and most people interact with the database indirectly, through the Neotoma Explorer, the neotoma2 R package, Range Mapper or other tools. Much of this manual discusses the raw, underlying data that powers these tools. The tools all pull their data from the Neotoma API, an application that sends data from the database over the internet in response to specially constructed URLs.
An open API like Neotoma’s is useful because all you need to access the data is an internet connection and the ability to read JSON (either by scanning it visually, or by using a programming language like Python, R or JavaScript). For example:
https://api.neotomadb.org/v2.0/data/sites?sitename=Marion Lake
returns a JSON object that provides metadata about the sites in Neotoma that use the name “Marion Lake”. More details about the API can be obtained from the online help for the API at https://api.neotomadb.org.
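As a minimal sketch of consuming the API from a script, the Python snippet below builds the same request URL and parses the JSON reply using only the standard library. The helper names build_sites_url and fetch_sites are hypothetical, introduced here for illustration. Note that the space in "Marion Lake" has to be percent-encoded when the URL is assembled in code. Nothing is assumed about the layout of the returned JSON, so the parsed object is handed back unchanged.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example URL above.
SITES_ENDPOINT = "https://api.neotomadb.org/v2.0/data/sites"


def build_sites_url(sitename):
    """Return the sites URL for a site name, percent-encoding spaces and symbols."""
    query = urllib.parse.urlencode(
        {"sitename": sitename}, quote_via=urllib.parse.quote
    )
    return f"{SITES_ENDPOINT}?{query}"


def fetch_sites(sitename, timeout=30):
    """Request site metadata and return the decoded JSON (needs internet access)."""
    url = build_sites_url(sitename)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_sites("Marion Lake") sends the request shown above; the result can then be inspected with json.dumps(result, indent=2).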
3.1 Using R
The neotoma2 R package provides a set of functions to download and work with data from Neotoma within the R programming environment. The package has been designed for users who wish to work with multiple sites or datasets, and supports users who wish to add their own data into Neotoma.
The functions in the R package act as wrappers for the API calls, and also provide some secondary services to help the data work efficiently in the R environment. For example, a user who wishes to run the same search query as above in R simply uses the get_sites() function:
library(neotoma2)
marion <- get_sites(sitename = "Marion Lake")
More details about using the R package are available on the GitHub page for the package, or in one or more of our posted workshops:
- European Pollen Database Workshop (May 22, 2022 – https://open.neotomadb.org/EPD_binder/simple_workflow.html)
- AMQUA Pollen Workshop (June 2022 – https://open.neotomadb.org/Workshops/AMQUA-June2022/simple_workflow.html)
- IAL/IPA Diatom Workshop (November 2022, in Spanish – https://open.neotomadb.org/Workshops/IAL_IPA-November2022/simple_workflow_ES.html)
Other workshop materials are available within the Neotoma Workshops GitHub repository. Some workshops are highly specialized, and some contain links to cloud-based versions of RStudio so that users can work on the problems and workflows under standardized conditions.
3.2 Using the Database Locally
Users who wish to gain more experience working directly with SQL, or who need to undertake specialized analysis that is not supported by the R package or the available APIs, may choose to use the database directly. This involves installing PostgreSQL and associated add-ons (PostGIS in particular). Users should be aware that a database server does not behave like a typical desktop application. It runs in the background, and users generally “connect” to it from R, Python, or another programming language, or through a database tool such as pgAdmin or DBeaver. Postgres also comes with the command-line tool psql, which lets a user connect to the database from the terminal and type queries directly.
Using the psql command-line utility is one way of interacting directly with the database, provided you have a connection to a database server with Neotoma data loaded.
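For scripted use, a psql call can also be assembled and run from a programming language. The following Python sketch is illustrative only. The database name neotoma, the user postgres, the local host and port, and the query against an ndb.sites table with siteid and sitename columns are all assumptions about a local setup rather than details given in this section, so adjust them to match your own installation. The helper names psql_command and run_query are likewise hypothetical.

```python
import shlex
import subprocess

# Assumed example query: the schema, table, and column names are placeholders.
EXAMPLE_SQL = (
    "SELECT siteid, sitename FROM ndb.sites "
    "WHERE sitename ILIKE '%Marion Lake%';"
)


def psql_command(sql, dbname="neotoma", host="localhost", port=5432, user="postgres"):
    """Build the argument list for a single psql call (defaults are assumptions)."""
    return [
        "psql",
        "--host", host,
        "--port", str(port),
        "--dbname", dbname,
        "--username", user,
        "--command", sql,
    ]


def run_query(sql, **settings):
    """Run the query through psql and return its text output.

    Requires psql on the PATH and a reachable database server.
    """
    completed = subprocess.run(
        psql_command(sql, **settings),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def as_shell_line(sql, **settings):
    """Return the same call as a quoted line that can be pasted into a terminal."""
    return shlex.join(psql_command(sql, **settings))
```

Printing as_shell_line(EXAMPLE_SQL) shows the equivalent terminal command, while run_query(EXAMPLE_SQL) executes it and returns the table that psql prints.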