Hierarchical Models and Dynamic Correlation • GPEDM

library(GPEDM)
library(ggplot2)

Introduction

Many ecological time series are short relative to the timescale of the system, which presents a major obstacle to the successful application of EDM: if the time series does not adequately cover the range of possible dynamics and make a sufficient number of ‘cycles’ around the attractor, it can be difficult for EDM to make reliable predictions.

One solution is to leverage multiple (potential short) time series from replicates with similar dynamics. These could be multiple replicates in an experiment, populations of the same species from multiple locations, different life history stages of the same species, or potentially even multiple species that are suspected to have similar dynamics. The most common case is likely to be replicate time series collected for multiple populations of the same species over space, so we’ll be referring to the replicates as ‘populations’ but these replicates can represent other things as well.

If you have time series data from multiple populations, GPEDM includes several options for integrating the data into a common model, which can potentially lead to better predictions than models fit to each short time series independently. This approach requires that the delay embedding maps are similar across populations (usually a reasonable assumption for populations of the same species) but not strongly synchronized (i.e. each series provides some independent information).

The hyperparameter rho is the dynamic correlation in the hierarchical GP model, indicating similarity of dynamics across populations. In contrast to traditional correlation metrics (e.g. Pearson correlation), which quantify the similarity of population fluctuations over time (i.e. synchrony), the dynamic correlation quantifies the similarity of population responses across predictor space. The value of rho can be fixed to any value from 0.0001 to 0.9999 using rhofixed, otherwise it is estimated from the data. All populations are assumed to have the same rho value relative to one another. Alternatively, one can estimate pairwise values of rho between populations and supply a fixed matrix of rho values (rhomatrix) to use, rather than a single value.

Scaling considerations

Before fitting any models, you should think about how you want to standardize the data. When you have multiple populations, scaling the data within populations (scaling=local) vs. across populations (scaling=global) can lead to very different model fits and performance, so it is worth putting some thought into this. Generally speaking, if the variable has the same units across populations, you will probably want to use global scaling. If the variable has different units, you will probably want to use local scaling. However, you may have valid reasons to deviate from this, and trying both might be worthwhile if in doubt. For instance, if the species of interest has different carrying capacities in different regions, local scaling of dynamics might make more sense, even if the units are the same. If you have covariates, you might choose to use different scalings for different variables. (Note scaling=global and scaling=local apply to all predictors, whether they’re lags, covariates, or lags of covariates, and they all are scaled independently of each other.) To use a mix of scalings, you need to scale the data yourself beforehand and set scaling="none". You might get warnings that values are not scaled properly, which is because generating lags elimiates some data, so the mean and sd of each predictor may deviate somewhat from 0 and 1. They should not be too far off though, so this warning can usually be ignored. Note that if unspecified, scaling in fitGP defaults to "global". See the documentation for fitGP for more information about scaling.

Embedding considerations

For a single population, the maximum value of $E$ considered should not be much larger than the square root of the time series length. When combining data from multiple populations, you have more data at your disposal to reconstruct dynamics, and can use potentially higher values of $E$ than you could for a single population. However, $E\tau$ cannot be greater than the shortest within-population time series length, so you may wish to constrain $E$ in some way to ensure sufficient training data across populations. Note than when using hierarchical models, all populations will use the same $E$ and $\tau$ .

Model evaluation considerations

When using multiple populations, there are a few additional things you need to consider when evaluating model performance. The built-in leave-one-out routine, predictmethod="loo", leaves out each individual datapoint, so datapoints taken at the same time but in a different location might be included in the training set. If this is of concern, there is another routine, predictmethod="lto", which leaves out all datapoints taken at the same time across all locations. If there is only one population, these are obviously equivalent. The routine predictmethod="sequential" leaves out all future timepoints across all locations. If you decide to use a training/test split, take care to split each population, not the stacked series. Predictions cannot be made for populations which do not appear in the training data.

In addition to comparing different hierarchical structures (described below), you might also wish to compare the results from models fit to each population separately. Note that as with all hierarchical models, the hierarchical structure is a constraint that forces some similarity of dynamics across populations. Populations in the same hierarchical model will share the same embedding parameters ( $E$ and $\tau$ ) and the same inverse length scale parameters (this includes models with rho=0). This means the hierarchical model is inherently less flexible than if you were to fit separate models to each population individually. If each population has a lot of data, the hierarchical fit might be poorer than fitting a model to each population individually, due to this decreased flexibility. On the flip side, if each population does not have much data, or the data are very noisy, the constraints and pooled information in a hierarchical model can lead to better performance than individual fits.

Beware of local minima. If the relative performance of models seems off, or better results are obtained using a fixed rho than a fitted one, then the model might be getting stuck in a local minima. One could try using different values of initpars.

It might also be that some populations have very different dynamics than the others, and you might get better performance by spliting the populations up into different hierarchical models. This situation might also contribute to local minima. The pairwise rhomatrix might be informative here.

If the populations have very different local means (or variances), note that the total (across-population) $R^2=1-MSE/Var(y)$ can be potentially misleading. This is because the across-population variance will be much larger than the within population variance, increasing $R^2$ even if MSE is constant. Put another way, simply getting the local means right can explain a lot of the across-population variance even if there is little prediction accuracy within a population. In this case, we would recommend looking at the within-population $R^2$ values, or at one of the alternative $R^2$ values provided in the fitstats: R2centered (total $R^2$ with local means removed) or R2scaled (total $R^2$ with local centering and scaling), which may be more accurate measures of total performance than total $R^2$ if the populations have very different local means and/or variances.

Example dataset

In our example, we will use log-transformed catch-per-unit-effort (CPUE) time series for brown shrimp from 9 different regions (statistical zones) in the Gulf of Mexico. These are annual time series from the SEAMAP Summer Groundfish Trawl Survey, spanning 33 years. Plotting the time series, fluctuations in shrimp abundance in each region appear roughly similar, but have different means, so we decided to use local scaling in this case. Trying both scalings also showed that local scaling led to better performance.

In this example, we only use lags of CPUE, but we note that in the publication from which these data were taken (Tsai et al. 2022), better predictions were obtained by using environmental and fishery catch data as covariates. The results here deviate from the results in the paper because of the absence of these covariates.

When using the hierarchical models, the data must be in long format with a column for population identifier. Note that populations do not need to have the same number of observations, or overlapping time indices, as in this example.

E = 5
scaling = "local"

#get log abundance
data(shrimp)
shrimp$splog=log(shrimp$cpue)

#generate lags
splags = makelags(shrimp, y = "splog", pop = "zone", E = E, tau = 1, append = T)

# ggplot(shrimp) +
#   facet_wrap(zone~., scales = "free") +
#   geom_point(aes(y=cpue, x=year)) + geom_line(aes(y=cpue, x=year)) +
#   theme_bw() + labs(y="Brown shrimp CPUE")

baseplot=ggplot(shrimp) +
  facet_wrap(zone~., scales = "free") +
  geom_point(aes(y=splog, x=year)) + geom_line(aes(y=splog, x=year)) +
  theme_bw() + labs(y="log Brown Shrimp CPUE")
baseplot

Individual models

For comparison, we will fit models to each population individually.

zones=unique(splags$zone)
spmodind<-spmodindout<-spmodindin<-list()
for(i in seq_along(zones)) {
  dsub=subset(splags, zone==zones[i])
  spmodind[[i]] = fitGP(dsub, y = "splog", x = paste0("splog_",1:E), 
                        pop = "zone", time = "year", scaling = scaling, predictmethod = "lto")
  spmodindin[[i]] = spmodind[[i]]$insampresults
  spmodindout[[i]] = spmodind[[i]]$outsampresults
}
spmodindin=do.call(rbind, spmodindin)
spmodindout=do.call(rbind, spmodindout)
(R2indivin=getR2(spmodindin$obs,spmodindin$predmean))
#> [1] 0.8165698
(R2indivout=getR2(spmodindout$obs,spmodindout$predmean))
#> [1] 0.6121564
colnames(spmodindout)[2]="zone"

baseplot + 
  ggtitle("Individual") +
  geom_point(data=spmodindout, aes(y=predmean, x=timestep), color="red") +
  geom_line(data=spmodindout, aes(y=predmean, x=timestep), color="red") +
  geom_ribbon(data=spmodindout,
              aes(x=timestep, ymin=predmean-predsd,ymax=predmean+predsd),fill="red", alpha = 0.3)
#> Warning: Removed 45 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_line()`).

Identical dynamics

Including pop and setting rhofixed = 1 fits a multi-population model in which the dynamics of each population are forced to be identical. In other words, we assume that all delay vectors come from the same attractor.

spmodrho1 = fitGP(splags, y = "splog", x = paste0("splog_",1:E), 
                  pop = "zone", rhofixed = 1,
                  time = "year", scaling = scaling, predictmethod = "lto")
#> rhofixed must between 0.0001 and 0.9999, setting to 0.9999
summary(spmodrho1)
#> Number of predictors: 5 
#> Length scale parameters:
#>      predictor posteriormode
#> phi1   splog_1       0.06700
#> phi2   splog_2       0.00120
#> phi3   splog_3       0.02591
#> phi4   splog_4       0.10732
#> phi5   splog_5       0.02019
#> Process variance (ve): 0.6568221
#> Pointwise prior variance (sigma2): 0.5213629
#> Number of populations: 9
#> Dynamic correlation (rho): 0.9999
#> In-sample R-squared: 0.6027854 
#> In-sample R-squared by population:
#>                 R2
#> zone11  0.18816256
#> zone14  0.51847339
#> zone15  0.25033220
#> zone16  0.44688137
#> zone17  0.60331705
#> zone18  0.52130162
#> zone19  0.21999381
#> zone20 -0.06904110
#> zone21 -0.09802088
#> Out-of-sample R-squared: 0.5555569
#> Out-of-sample R-squared by population:
#>                R2
#> zone11  0.1335001
#> zone14  0.4542791
#> zone15  0.1573202
#> zone16  0.3737063
#> zone17  0.5488948
#> zone18  0.4557023
#> zone19  0.1193898
#> zone20 -0.1936677
#> zone21 -0.2132316

spout=spmodrho1$outsampresults
colnames(spout)[2]="zone"

baseplot + 
  ggtitle("Identical") +
  geom_point(data=spout, aes(y=predmean, x=timestep), color="red") +
  geom_line(data=spout, aes(y=predmean, x=timestep), color="red") +
  geom_ribbon(data=spout,
              aes(x=timestep, ymin=predmean-predsd,ymax=predmean+predsd),fill="red", alpha = 0.3)
#> Warning: Removed 45 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_line()`).

As with other EDM approaches (Simplex, S-map), the same results can be obtained by simply concatenating the delay vectors for multiple populations and fitting a standard model, as if the data all came from one population (i.e. omit the pop argument even though there are multiple populations). Note that automatic local scaling cannot be done this way (because only one population is assumed), and thus any local scaling must be done beforehand. You will also not get fit stats for each population. To ensure there isn’t crossover between the end of one time series and the beginning of another when constructing the delay vectors, this also requires making lags beforehand and including the pop argument in makelags.

spmod = fitGP(splags, y = "splog", x = paste0("splog_",1:E), 
              time = "year", scaling = scaling, predictmethod = "lto")
summary(spmod)

#rejoin the zone information (not considered during model fitting)
spout = spmod$outsampresults
spout$zone = splags$zone

Independent dynamics

You can force the populations to have independent dynamics by setting rhofixed = 0. Note that the populations in this case will still share the same inverse length scale parameters, so due to this decreased flexibility, the fit might be poorer than fitting a model to each population independently.

spmodrho0 = fitGP(splags, y = "splog", x = paste0("splog_",1:E), 
                  pop = "zone", rhofixed = 0,
                  time = "year", scaling = scaling, predictmethod = "lto")
#> rhofixed must between 0.0001 and 0.9999, setting to 0.0001
summary(spmodrho0)
#> Number of predictors: 5 
#> Length scale parameters:
#>      predictor posteriormode
#> phi1   splog_1       0.10242
#> phi2   splog_2       0.30842
#> phi3   splog_3       0.11504
#> phi4   splog_4       0.16779
#> phi5   splog_5       0.17300
#> Process variance (ve): 0.5720322
#> Pointwise prior variance (sigma2): 0.3471291
#> Number of populations: 9
#> Dynamic correlation (rho): 1e-04
#> In-sample R-squared: 0.7190166 
#> In-sample R-squared by population:
#>               R2
#> zone11 0.3566795
#> zone14 0.6152239
#> zone15 0.4512542
#> zone16 0.6093629
#> zone17 0.6780450
#> zone18 0.6445697
#> zone19 0.4516270
#> zone20 0.3520710
#> zone21 0.3915480
#> Out-of-sample R-squared: 0.5287102
#> Out-of-sample R-squared by population:
#>                 R2
#> zone11 -0.03022679
#> zone14  0.37914299
#> zone15  0.05440260
#> zone16  0.33830490
#> zone17  0.51555075
#> zone18  0.40588994
#> zone19  0.02939522
#> zone20 -0.10142623
#> zone21 -0.13070247

Hierarchical dynamics (single rho)

To allow for information sharing across populations without assuming the dynamics are identical, we can fit a hierarchical model with a free rho parameter. Simply omit the rhofixed argument, and rho will be estimated.

spmodrho = fitGP(splags, y = "splog", x = paste0("splog_",1:E), 
                 pop = "zone",
                 time = "year", scaling = scaling, predictmethod = "lto")
summary(spmodrho)
#> Number of predictors: 5 
#> Length scale parameters:
#>      predictor posteriormode
#> phi1   splog_1       0.13002
#> phi2   splog_2       0.00000
#> phi3   splog_3       0.06131
#> phi4   splog_4       0.17783
#> phi5   splog_5       0.03767
#> Process variance (ve): 0.6467111
#> Pointwise prior variance (sigma2): 0.2658343
#> Number of populations: 9
#> Dynamic correlation (rho): 0.9169996
#> In-sample R-squared: 0.6193695 
#> In-sample R-squared by population:
#>                 R2
#> zone11  0.20436617
#> zone14  0.53323690
#> zone15  0.25772910
#> zone16  0.46958275
#> zone17  0.63364240
#> zone18  0.53115126
#> zone19  0.26551303
#> zone20 -0.01321866
#> zone21 -0.03786475
#> Out-of-sample R-squared: 0.5532995
#> Out-of-sample R-squared by population:
#>                R2
#> zone11  0.1029215
#> zone14  0.4470370
#> zone15  0.1252973
#> zone16  0.3732042
#> zone17  0.5673172
#> zone18  0.4386862
#> zone19  0.1240376
#> zone20 -0.1858376
#> zone21 -0.2057786

spout=spmodrho$outsampresults
colnames(spout)[2]="zone"

baseplot +  
  ggtitle("Hierarchical") +
  geom_point(data=spout, aes(y=predmean, x=timestep), color="red") +
  geom_line(data=spout, aes(y=predmean, x=timestep), color="red") +
  geom_ribbon(data=spout,
              aes(x=timestep, ymin=predmean-predsd,ymax=predmean+predsd),fill="red", alpha = 0.3)
#> Warning: Removed 45 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_line()`).

Hierarchical dynamics (rho matrix)

Instead of using a single rho value, a matrix of fixed pairwise rho values can be supplied using rhomatrix. In this case, the single rho parameter will not be used and will revert to the mode of its prior (~0.5). A pairwise rho matrix can be generated using getrhomatrix, or you can create a custom one (e.g. based on geographical distance). All getrhomatrix does is fit a hierarchical model to each pair of populations and put the resulting rho values in a matrix that can be passed to fitGP. The rhomatrix itself can be informative about dynamic similarity among populations, and reveal potential clustering and spatial structure.

If you get an error that the matrix is not positive definite (which can happen), you can use Matrix::nearPD() with corr=TRUE to obtain a positive definite matrix close to rhomatrix. The function GPEDM::posdef() is a quick wrapper for this.

Rhomat = getrhomatrix(splags, y = "splog", x = paste0("splog_",1:E), 
                      pop = "zone",
                      time = "year", scaling = scaling)
Rhomat
#>           zone11    zone14    zone15    zone16    zone17    zone18    zone19
#> zone11 1.0000000 0.6976526 0.6244876 0.6422882 0.7827404 0.8218578 0.7974962
#> zone14 0.6976526 1.0000000 0.8102220 0.8576069 0.8259549 0.8253132 0.8244647
#> zone15 0.6244876 0.8102220 1.0000000 0.7847154 0.8313152 0.8528488 0.7430921
#> zone16 0.6422882 0.8576069 0.7847154 1.0000000 0.8312329 0.5110798 0.7345495
#> zone17 0.7827404 0.8259549 0.8313152 0.8312329 1.0000000 0.7535192 0.6125335
#> zone18 0.8218578 0.8253132 0.8528488 0.5110798 0.7535192 1.0000000 0.8547646
#> zone19 0.7974962 0.8244647 0.7430921 0.7345495 0.6125335 0.8547646 1.0000000
#> zone20 0.6253667 0.6009822 0.3716549 0.6315166 0.4731135 0.2439610 0.5750741
#> zone21 0.4844173 0.4282931 0.3838596 0.2655707 0.3700907 0.5366677 0.3783972
#>           zone20    zone21
#> zone11 0.6253667 0.4844173
#> zone14 0.6009822 0.4282931
#> zone15 0.3716549 0.3838596
#> zone16 0.6315166 0.2655707
#> zone17 0.4731135 0.3700907
#> zone18 0.2439610 0.5366677
#> zone19 0.5750741 0.3783972
#> zone20 1.0000000 0.5239022
#> zone21 0.5239022 1.0000000

#plot the rho matrix
Rmatlong=as.data.frame(as.table(Rhomat)) #for plotting purposes only
colnames(Rmatlong)=c("Region1", "Region2", "rho")
ggplot(Rmatlong, aes(x=Region1, y=Region2, fill=rho)) +
  geom_tile() +
  scale_y_discrete(expand = c(0, 0)) +
  scale_x_discrete(expand = c(0, 0)) +
  theme(panel.background = element_rect(fill="gray"), 
        panel.border = element_rect(color="black", fill="transparent"))


Rhomatpd=posdef(Rhomat)

spmodrhomat = fitGP(splags, y = "splog", x = paste0("splog_",1:E), 
                    pop = "zone", rhomatrix = Rhomatpd,
                    time = "year", scaling = scaling, predictmethod = "lto")
summary(spmodrhomat)
#> Number of predictors: 5 
#> Length scale parameters:
#>      predictor posteriormode
#> phi1   splog_1       0.20552
#> phi2   splog_2       0.01286
#> phi3   splog_3       0.06446
#> phi4   splog_4       0.22136
#> phi5   splog_5       0.07911
#> Process variance (ve): 0.5900952
#> Pointwise prior variance (sigma2): 0.2670027
#> Number of populations: 9
#> Dynamic correlation (rho): rhomatrix
#> In-sample R-squared: 0.6712945 
#> In-sample R-squared by population:
#>               R2
#> zone11 0.2387228
#> zone14 0.5907084
#> zone15 0.3096061
#> zone16 0.5500649
#> zone17 0.6919789
#> zone18 0.6146147
#> zone19 0.3616301
#> zone20 0.1190174
#> zone21 0.1880890
#> Out-of-sample R-squared: 0.5642167
#> Out-of-sample R-squared by population:
#>                 R2
#> zone11  0.05478630
#> zone14  0.47564667
#> zone15  0.09787071
#> zone16  0.39897111
#> zone17  0.60659794
#> zone18  0.47556592
#> zone19  0.14990686
#> zone20 -0.20415259
#> zone21 -0.15334904

spout=spmodrhomat$outsampresults
colnames(spout)[2]="zone"

baseplot + 
  ggtitle("Hierarchical Matrix") +
  geom_point(data=spout, aes(y=predmean, x=timestep), color="red") +
  geom_line(data=spout, aes(y=predmean, x=timestep), color="red") +
  geom_ribbon(data=spout,
              aes(x=timestep, ymin=predmean-predsd,ymax=predmean+predsd),fill="red", alpha = 0.3)
#> Warning: Removed 45 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_line()`).

Compare models

comparison=data.frame(Model=c("Individual", "Independent", "Identical", "Single rho", "Rho matrix"),
                      InSampR2=c(R2indivin,
                                 spmodrho0$insampfitstats["R2"],
                                 spmodrho1$insampfitstats["R2"],
                                 spmodrho$insampfitstats["R2"],
                                 spmodrhomat$insampfitstats["R2"]),
                      OutSampR2=c(R2indivout,
                                  spmodrho0$outsampfitstats["R2"],
                                  spmodrho1$outsampfitstats["R2"],
                                  spmodrho$outsampfitstats["R2"],
                                  spmodrhomat$outsampfitstats["R2"]))
comparison
#>         Model  InSampR2 OutSampR2
#> 1  Individual 0.8165698 0.6121564
#> 2 Independent 0.7190166 0.5287102
#> 3   Identical 0.6027854 0.5555569
#> 4  Single rho 0.6193695 0.5532995
#> 5  Rho matrix 0.6712945 0.5642167

popcomparison=data.frame(Individual=sapply(spmodind, function(x) x$outsampfitstats["R2"]),
                         Independent=spmodrho0$outsampfitstatspop$R2pop,
                         Identical=spmodrho1$outsampfitstatspop$R2pop,
                         Single_rho=spmodrho$outsampfitstatspop$R2pop,
                         Rho_matrix=spmodrhomat$outsampfitstatspop$R2pop)
popcomparison
#>         Individual Independent  Identical Single_rho  Rho_matrix
#> zone11 0.087608203 -0.03022679  0.1335001  0.1029215  0.05478630
#> zone14 0.412489699  0.37914299  0.4542791  0.4470370  0.47564667
#> zone15 0.193900661  0.05440260  0.1573202  0.1252973  0.09787071
#> zone16 0.479309505  0.33830490  0.3737063  0.3732042  0.39897111
#> zone17 0.605466183  0.51555075  0.5488948  0.5673172  0.60659794
#> zone18 0.603964931  0.40588994  0.4557023  0.4386862  0.47556592
#> zone19 0.341658106  0.02939522  0.1193898  0.1240376  0.14990686
#> zone20 0.086590821 -0.10142623 -0.1936677 -0.1858376 -0.20415259
#> zone21 0.006953901 -0.13070247 -0.2132316 -0.2057786 -0.15334904

References

Munch, S. B., Poynor, V., and Arriaza, J. L. 2017. Circumventing structural uncertainty: a Bayesian perspective on nonlinear forecasting for ecology. Ecological Complexity, 32:134.

Rogers, T. L., and Munch, S. B. 2020. Hidden similarities in the dynamics of a weakly synchronous marine metapopulation. Proceedings in the National Academy of Science, 117(1), 479–485.

Tsai, C. H., Munch, S. B., Masi, M. D., and Pollack, A. G. 2022. Predicting nonlinear dynamics of short-lived penaeid shrimp species in the Gulf of Mexico. Canadian Journal of Fisheries and Aquatic Sciences.