Pollen-based Climate Reconstruction

1 Introduction

Pollen-based climate reconstruction is a widely used technique for understanding past climate. Pollen-based modelling is often highly technical, and requirs a solid understanding of paleoecological processes, ecological knowledge of forest commnuity and species affinities along climate gradients, technical understanding of chronology construction and model fitting. At the same time, the people who undertake this activity are often graduate students, working in programs that do not include specific units on paleoecology (in some cases) or pollen-based climate reconstruction (in most cases).

In this vignette we will use open-source tools from the Comprehensive R Archive Network to reconstruct a single climate record from a site in Canada. To do this we will first test the calibration dataset against climate data, both from the North American Modern Pollen Database, we will then pick one target climate variable for reconstruction and perform reconstructions using three models, Weighted Averaging with monotone smooth deshrinking, Weighted Averaging Partial Least Squares and the Modern Analogue Technique.

To first evaluate the models we need to see how well the work with the underlying climate data:

2 Model Diagnostics

The Modern Pollen Database stored in analogue contains 4833 samples, representing 134 different pollen taxa. The data was obtained from the Whitmore et al (2008) North American Modern Pollen Database. The raw data can be obtained from one of two sources the Laboratory for Paleoclimatology and Climatology at the University of Ottawa and the Williams Paleoecology Lab at the University of Wisconsin. Modern pollen data is available from the Neotoma Paleoecology Database, although the complete North American Modern Pollen Database has not yet been uploaded to Neotoma.

2.1 Checking the Transfer Functions

Considerable work has been put into understanding transfer functions and challenges associated with their construction, implementation and interpretation. I suggest taking some time to understand the basic theory, starting with the excellent review by Birks et al. in the Journal of Open Ecology [link]. Telford and Birks [ref needed] and others have pointed to the need for more rigorous validation of models, and this vignette is a first step at addressing some of the issues. As this vignette is developed further I will add more robust cross-validation.

2.1.1 Checking the calibration data set

To test the models I used the palaeoSig package’s randomTS() function, which tests the models against randomly sorted data. Significance for any climate variable indicates that the model reconstruction is better than random numbers. In each case I use 70% of the training set, rather than the full training set. The model takes the proportion of variance accounted for by the actual data, and then compares it to the proportion of variance accounted for by the randomized data. This is then done for each of the different methods used for calibration.

In each case these methods are simply testing whether the modern calibration is able to detect signals in each of the climate parameters. The example here uses the entire North American Modern Pollen Database, rather than a targeted data subset. It also uses the climate variables provided with the dataset. For a given research application it may be better to specifically choose a particular climate variable (or climate subset) or obtain new climatic data from a more recent downscaled climate data product, for example WorldClim or PRISM.

2.1.1.1 WA - Monotone Deshrinking

% Explained p-value
tjan 0.17 0.01
tfeb 0.17 0.01
tmar 0.16 0.01
tapr 0.15 0.01
tmay 0.14 0.01
tjun 0.14 0.01

Weighted Averaging results using monotone deshrinking on subsets of the pollen data indicate that there is significance in reconstructions of most temperature parameters, while precipitation variables are not significant. While temperature variables are significant, this relatively naive approach indicates that there is a relatively low proportion of variance explained by the temperature reconstructions, and that the vegetation (or at least the pollen representation of the regional vegetation) is most strongly affected by summer temperatures.

2.1.1.2 WAPLS (Four components)

% Explained p-value
tjan 0.13 0.01
tfeb 0.14 0.01
tmar 0.13 0.01
tapr 0.12 0.01
tmay 0.10 0.01
tjun 0.10 0.01

For WAPLS we see the same pattern as with WA. Temperature variables all show significance, they explain low variance and winter temperatures show higher variance explained than summer variables.

2.1.1.3 MAT - ten closest

% Explained p-value
tjan 0.11 0.01
tfeb 0.12 0.01
tmar 0.11 0.01
tapr 0.10 0.01
tmay 0.09 0.01
tjun 0.08 0.01

Reconstructions for MAT show even lower variance explained (although a similar pattern) than the other methods. This is surprising, in part because MAT often shows better fit as a result of spatial autocorrelation. The very weak variance explained for some variables (pdec for example) paired with high significance (p < 0.01), also indicates the risks of choosing variables soley based on p value. It should also be noted that the pattern of variance explained here, and elsewhere represents the impact of temporal autocorrelation on temperature (and, less so) on precipitation variables.

3 Reconstruction Statistics

3.1 Reconstruction Significance

Now we test to see which of the fossil assemblage reconstructions show significant changes over the course of the reconstruction. This uses the same randomTF() function, but the degree of variance and significance is likely to change given that the test dataset, in this case Orhid, has changed and has a much more constrained ecological space than the entire European Modern Pollen Database.

3.1.1 Obtaining a record from Neotoma

Then we apply a reconstruction to a real dataset. In this case we will select a record with coverage across multiple timescales. For the sake of this example, we’ll restrict the analysis to a record from Canada.

can_sites <- neotoma::get_dataset(ageold = 12000, 
                                  ageyoung = -50, 
                                  gpid = 'Canada',
                                  datasettype = 'pollen')


datasets <- can_sites %>% map_int(function(x) x$dataset.meta$dataset.id)

This returns 435 sites. The neotoma package provides plotting capabilities, but they are rudimentary. To provide interactive plotting we plot the sites dynamically using leaflet. This isn’t something you need to do, but it does help showcase the flexibility of R:

## API call was successful. Returned record for Crawford Lake

Data record 49 within the Canadian data represents Crawford Lake, Ontario, a record published by Zicheng Yu in 1997. The record spans 14000 years, and is impressive, both for its resolution, but also for the story it tells of environmental change at the site.

analogue::Stratiplot(crawford)

Knowing that our methods, using the complete North American dataset, generate fairly weak reconstructions for precipitation, we will focus on a reconstruction of winter teperature for this record (tjan for simplicity’s sake). We also want to clean up our data sets a bit:

3.1.1.1 WA - Monotone Deshrinking

% Explained p-value
env 0.23 0.68

Weighted Averaging results change, as expected. We now see that even though the calibration dataset appears to produce a significant model for january temperature, the result is a reconstruction that shows no significant change over the last 14,000 years. To save you the trouble, changing Climate[,1] to Climate (i.e. reconstructing all variables in the hopes of \(p\)-hacking) doesn’t help. The record itself has no significant change over the time for any variable.

3.1.1.2 WAPLS (Four components)

% Explained p-value
env 0.21 0.63

With WAPLS we se no significance for tjan, the reconstructed temperature here, but, as with WA, no significance for any of the tested variable.

3.1.1.3 MAT - ten closest

% Explained p-value
tjan 0.19 0.16
tfeb 0.18 0.16

Again, no significance for any of the models. This indicates that we’re just not able to see a signal within the data, but, as I noted before, the calibration dataset used here is likely overly broad.

3.2 Reconstruction

Once we have validated the methods, we re-run the analysis using each of the three methods, across all the climate variables (just in case).

3.2.1 Model Summary

4 Saving to file

Ultimately, you would want to save values to file. Your preference for the savefile structure will vary. Here we are exporting two files, one with error and one with the raw reconstructions. This implies a folder structure that I like, which is having a project data folder with both input and output sub-directories. This helps keep your data files separated. You might want to go further and add version numbers to the output files. I’ll leave that up to the user’s discretion.

write.csv(reshape2::dcast(get_clim, age ~ model + climate, value.var = "reconst"), 
          "data/output/clim_reconst.csv")
write.csv(reshape2::dcast(get_clim, age ~ model + climate, value.var = "err"), 
          "data/output/clim_err.csv")