Fits a GP model with separable length scales and automatic relevance determination (ARD). Also fits a hierarchical version of the GP model if distinct populations are indicated using pop. There are several ways to specify the training data:

  • A. supply data frame data, plus column names or indices for y, x, pop, and time.

  • B. supply vectors for y, pop, and time, and a matrix or vector for x.

For each of the above 2 options, there are 3 options for specifying the predictors.

  1. supplying y and x (omitting E and tau) will use the columns of x as predictors. This allows for the most customization.

  2. supplying y, E, and tau (omitting x) will use E lags of y (with spacing tau) as predictors. This is equivalent to option 3 with x=y.

  3. supplying y, x, E, and tau will use E lags of each column of x (with spacing tau) as predictors. Do not use this option if x already contains lags, in that case use option 1.

Arguments pop and time are optional in all of the above cases. If pop and time are supplied, they should NOT contain missing values. This function will also, optionally, produce predictions. See Details for more information.

fitGP(
  data = NULL,
  y,
  x = NULL,
  pop = NULL,
  time = NULL,
  E = NULL,
  tau = NULL,
  scaling = c("global", "local", "none"),
  initpars = NULL,
  modeprior = 1,
  fixedpars = NULL,
  rhofixed = NULL,
  rhomatrix = NULL,
  augdata = NULL,
  predictmethod = NULL,
  newdata = NULL,
  xnew = NULL,
  popnew = NULL,
  timenew = NULL,
  ynew = NULL,
  returnGPgrad = FALSE,
  exclradius = 0,
  linprior = c("none", "local", "global")
)

Arguments

data

A data frame, or matrix with named columns.

y

The response variable (required). If data is supplied, a column name (character) or index (numeric). If data is not supplied, a numeric vector.

x

Predictor variables. If data is supplied, column names (character vector) or indices (numeric vector). If data is not supplied, a numeric matrix (or vector, if there is only one predictor variable). If x is not supplied, values for E and tau must be provided to construct it internally.

pop

Identifies separate populations (optional, if not supplied, defaults to 1 population). Population values can be either numeric, character, or factor. If data is supplied, a column name (character) or index (numeric). If data is not supplied, a vector (numeric, character, or factor).

time

A time index (optional, if not supplied, defaults to a numeric index). Important: The time index is not used for model fitting (timesteps are assumed to be evenly spaced) but supplying time will be add these values to the output table, which may be useful for later plotting purposes. If data is supplied, a column name (character) or index (numeric). If data is not supplied, a numeric vector.

E

Embedding dimension. If supplied, will be used to constuct lags of x (or lags of y if x is not supplied).

tau

Time delay. If supplied, will be used to constuct lags of x (or lags of y if x is not supplied).

scaling

How the variables should be scaled (see Details). Scaling can be "global" (default), "local" (only applicable if more than 1 pop), or "none". Scaling is applied to y and each x. For a mix of scalings, do it manually beforehand and set scaling to "none". All outputs are backtransformed to original scale.

initpars

Starting values for hyperparameters (see Details) in the order c(phi,ve,sigma2,rho), in constrained (non-transformed) space. Should be a numeric vector with length ncol(x)+3. Defaults used if not supplied: c(rep(0.1,ncol(x)), 0.001, 1-0.001, 0.5)

modeprior

This value is used by the phi prior and sets the expected number of modes over the unit interval. Defaults to 1.

fixedpars

Fixes values of the hyperparameters phi, ve, and sigma2 (if desired). Should be a numeric vector with length ncol(x)+2 in the order c(phi,ve,sigma2) in constrained (non-transformed) space. Hyperparameters to estimate should be input as NA. The value of rho is fixed separately using argument rhofixed.

rhofixed

Fixes the rho parameter, if desired (see Details).

rhomatrix

Symmetrical square matrix of fixed pairwise rho values to use, with 1's on the diagonal. The rows and columns should be named with the population identifier. The output of getrhomatrix can be used here. All populations in the dataset must appear in rhomatrix. Populations in rhomatrix that are not in the dataset are allowed and will not be used.

augdata

A data frame with augmentation data (see Details).

predictmethod

Using the training data, loo does leave-one-out prediction, lto does leave-timepoint-out prediction, and sequential does sequential (leave-future-out) prediction.

newdata

Data frame containing the same columns supplied in the original model.

xnew

New predictor matrix or vector. Not required if newdata is supplied.

popnew

New population vector. Not required if newdata is supplied.

timenew

New time vector. Not required if newdata is supplied.

ynew

New response vector. Optional, unless E and tau were supplied in lieu of x. Not required if newdata is supplied.

returnGPgrad

Return the gradient (derivative) of the GP model at each time point with respect to each input. This is only computed for out-of-sample predictions using newdata, loo, or lto. If you want the in-sample gradient, pass the original dataset as newdata. Defaults to FALSE.

exclradius

For predictmethod="loo" and predictmethod="lto", the number of points on either side of the focal point to leave out of the training data. Defaults to 0.

linprior

Fit GP model to the residuals of a linear relationship between y and the first variable of x (prior to scaling) and backtransforms as appropriate. If y is growth rate, acts like a Ricker prior. Defaults to "none". Option "local" fits a separate regression for each population, "global" fits a single regression.

Value

A list (class GP and GPpred) with the following elements:

pars

Posterior mode of hyperparameters (named vector).

parst

Posterior mode of hyperparameters (named vector), transformed to real line (used for optimization).

grad

Likelihood gradients of hyperparameters at posterior mode. Can be used to assess convergence.

covm

List containing various covariance matrices and the inverse covariance matrix.

iter

Number of iterations until convergence was reached. Currently fixed at a max of 200.

inputs

Inputs and scaled inputs. Note that if you use E and tau, the names of the predictors in the input data frame will be stored under x_names, and the names of the lagged predictors corresponding to the inverse length scales will be stored under x_names2, provided these names exist.

scaling

Scaling information.

linprior

If linprior is not "none", linear regression information. linprior_mod is an lm object.

insampresults

Data frame with in-sample predictions. predfsd is the standard deviation of the GP function, predsd includes process error.

insampfitstats

Fit statistics for in-sample predictions. Includes R2, rmse, ln_post (log posterior likelihood evaluated at the mode), lnL_LOO (generalized cross-validation approximate leave-one-out negative log likelihood), and df (estimated degrees of freedom, equal to the trace of the smoother matrix). lnL_LOO does not include the prior, so is not directly comparable to ln_post. For diagnostics, also includes likelihood components ln_prior, SS, logdet. If there are multiple populations, also includes R2centered and R2scaled.

insampfitstatspop

If >1 population, fit statistics (R2 and rmse) for in-sample predictions by population.

outsampresults

Data frame with out-of-sample predictions (if requested). predfsd is the standard deviation of the GP function, predsd includes process error.

outsampfitstats

Fit statistics for out-of-sample predictions (if requested). Only computed if using "loo" or "sequential", if y is found in newdata, or if ynew supplied (i.e. if the observed values are known).

outsampfitstatspop

If >1 population, fit statistics for out-of-sample predictions (if requested) by population.

GPgrad

If returnGPgrad=T, a data frame with the partial derivatives of the function with respect to each input.

call

Function call. Allows use of update.

Details

The data are assumed to be in time order. This is particularly important if E and tau are supplied or predictmethod="sequential" is used.

Missing values:

Missing values for y and x are allowed. Any rows containing missing values will be excluded from the model fitting procedure (reinserted as NAs in the output). If pop and time are supplied, they should NOT contain missing values.

Hyperparameters:

The model uses a squared exponential covariance function. See the "Extended introduction" vignette for mathematical details.

There is one inverse length scale phi estimated for each predictor variable (input dimension). Values near 0 indicate that the predictor has little influence on the response variable, and it was likely dropped by ARD. Higher values of phi indicate greater nonlinearity.

The estimated variance terms are ve (process variance) and sigma2 (pointwise prior variance).

Hyperparameter rho is the dynamic correlation in the hierarchical GP model, indicating similarity of dynamics across populations. If there is only 1 population (e.g. if pop is not supplied), rho is irrelevant (not used by the model) and will revert to the mode of its prior (~0.5). The value of rho can be fixed to any value from 0.0001 to 0.9999 using rhofixed, otherwise it is estimated from the data. Alternatively, a matrix of fixed pairwise rho values can be supplied using rhomatrix. In this case, the single rho parameter will also not be used and will revert to the mode of the prior (~0.5). A pairwise rho matrix can be generated using getrhomatrix, or you can create a custom one (e.g. based on geographical distance).

Scaling:

The model priors assume the response variable and all predictor variables have approximately zero mean and unit variance, therefore it is important to scale the data properly when fitting these models. For convenience, this function will scale the input data automatically by default, and backtransform output to the original scale. Automatic scaling can be "global" (default), or "local". The latter will scale variables within (as opposed to across) populations, so is obviously only applicable if there is more than 1 population. You can also scale the data yourself beforehand in whatever manner you wish and set scaling="none". In this case, you will obviously have to do the backtransformation yourself.

Predictions:

There are several options for producing predictions:

  1. Use predictmethod="loo" for leave-one-out prediction using the training data.

  2. Use predictmethod="lto" for leave-timepoint-out prediction using the training data. This will leave out values with the same time index across multiple populations, rather than each individual datapoint. If there is only one population, "lto" will be equivalent to "loo".

  3. Use predictmethod="sequential" for sequential (leave-future-out) prediction using the training data.

  4. If data frame data was supplied, supply data frame newdata containing same column names. Column for y is optional, unless E and tau were supplied in lieu of x.

  5. If vectors/matrices were supplied for y, x, etc, equivalent vector/matrices xnew, popnew (if pop was supplied), and timenew (optional). ynew is optional, unless E and tau were supplied in lieu of x.

After fitting, predictions can also be made using predict.GP. Predictions can plotted using plot.GPpred. Get conditional responses using getconditionals.

It should be noted that "loo" is not a "true" leave-one-out, for although each prediction is made by removing one of the training points, the hyperparameters are fit using all of the training data. The same goes for "sequential".

For the out-of-sample prediction methods "loo", "lto", and newdata, the partial derivatives of the fitted GP function at each time point with respect to each input can be obtained by setting returnGPgrad = T. If you want the in-sample gradient, pass the original (training) data as back in as newdata. It automatic scaling was used, the gradient will also be backtransformed to the original units.

Fit statistics:

The R2 and other fit statistics are always computed for y in the units in which it was supplied to the function. The R2 is specifically computed as: $$R^2=1-SS_{err}/SS_{y}$$ which is equivalent to $$R^2=1-MSE/Var_{y}$$ As a result, the returned R-squared may be negative, particularly for out-of-sample predictions, which are not guaranteed to pass through the mean. A negative R2 indicates that the model predictions are worse than just using the mean (MSE is larger than the variance).

If there are multiple populations, there will be a total R2, which is the R2 across populations, as well as within-population R2 values. For the total R2, the denominator is the variance across all datapoints, ignoring population. Note that if the populations have very different local means, the total R2 can be potentially misleading because the across-population variance will be much larger than the within population variance, increasing R2 even if MSE is constant. Put another way, simply getting the local means right can explain a lot of the across-population variance even if there is little prediction accuracy within a population. Very different local variances can cause similar issues. In this case, we would recommend looking at the within-population R2 values, or at one of the alternative R2 values provided in the fit stats: R2centered (total R2 with local means removed) or R2scaled (total R2 with local centering and scaling), which may be more accurate measures of performance than total R2 if the populations have very different local means and/or variances.

References

Munch, S. B., Poynor, V., and Arriaza, J. L. 2017. Circumventing structural uncertainty: a Bayesian perspective on nonlinear forecasting for ecology. Ecological Complexity, 32: 134.

Examples

yrand=rnorm(20)
testgp=fitGP(y=yrand,E=2,tau=1,predictmethod = "loo")