fitGP.Rd
Fits a GP model with separable length scales and automatic relevance determination (ARD).
Also fits a hierarchical version of the GP model if distinct populations are indicated
using pop
.
There are several ways to specify the training data:
A. supply data frame data
, plus column names or indices for y
, x
, pop
, and time
.
B. supply vectors for y
, pop
, and time
, and a matrix or vector for x
.
For each of the above 2 options, there are 3 options for specifying the predictors.
supplying y
and x
(omitting E
and tau
) will use the columns of x
as predictors. This allows
for the most customization.
supplying y
, E
, and tau
(omitting x
) will use E
lags of y
(with spacing tau
) as predictors.
This is equivalent to option 3 with x
=y
.
supplying y
, x
, E
, and tau
will use E
lags of each column of x
(with spacing tau
) as predictors.
Do not use this option if x
already contains lags, in that case use option 1.
Arguments pop
and time
are optional in all of the above cases.
If pop
and time
are supplied, they should NOT contain missing values.
This function will also, optionally, produce predictions.
See Details for more information.
fitGP(
data = NULL,
y,
x = NULL,
pop = NULL,
time = NULL,
E = NULL,
tau = NULL,
scaling = c("global", "local", "none"),
initpars = NULL,
modeprior = 1,
fixedpars = NULL,
rhofixed = NULL,
rhomatrix = NULL,
augdata = NULL,
predictmethod = NULL,
newdata = NULL,
xnew = NULL,
popnew = NULL,
timenew = NULL,
ynew = NULL,
returnGPgrad = FALSE,
exclradius = 0,
linprior = c("none", "local", "global")
)
A data frame, or matrix with named columns.
The response variable (required). If data
is supplied, a column name
(character) or index (numeric). If data
is not supplied, a numeric vector.
Predictor variables. If data
is supplied, column names
(character vector) or indices (numeric vector). If data
is not supplied, a numeric matrix
(or vector, if there is only one predictor variable). If x
is not supplied,
values for E
and tau
must be provided to construct it internally.
Identifies separate populations (optional, if not supplied, defaults to 1
population). Population values can be either numeric, character, or factor.
If data
is supplied, a column name (character) or index (numeric).
If data
is not supplied, a vector (numeric, character, or factor).
A time index (optional, if not supplied, defaults to a numeric index).
Important: The time index is not used for model fitting (timesteps are
assumed to be evenly spaced) but supplying time
will be add these values to the output table,
which may be useful for later plotting purposes. If data
is supplied, a column name
(character) or index (numeric). If data
is not supplied, a numeric vector.
Embedding dimension. If supplied, will be used to constuct lags of x
(or
lags of y
if x
is not supplied).
Time delay. If supplied, will be used to constuct lags of x
(or
lags of y
if x
is not supplied).
How the variables should be scaled (see Details). Scaling can be "global"
(default), "local"
(only applicable if more than 1 pop), or "none"
. Scaling is applied
to y
and each x
. For a mix of scalings, do it manually beforehand and
set scaling to "none"
. All outputs are backtransformed to original scale.
Starting values for hyperparameters (see Details) in the order
c(phi,ve,sigma2,rho)
, in constrained (non-transformed) space. Should be a numeric
vector with length ncol(x)+3
. Defaults used if not supplied: c(rep(0.1,ncol(x)), 0.001, 1-0.001, 0.5)
This value is used by the phi prior and sets the expected number of modes over the unit interval. Defaults to 1.
Fixes values of the hyperparameters phi, ve, and sigma2 (if desired). Should be a numeric
vector with length ncol(x)+2
in the order c(phi,ve,sigma2)
in constrained
(non-transformed) space. Hyperparameters to estimate should be input as NA. The value of rho is
fixed separately using argument rhofixed
.
Fixes the rho parameter, if desired (see Details).
Symmetrical square matrix of fixed pairwise rho values to use, with 1's on the diagonal.
The rows and columns should be named with the population identifier.
The output of getrhomatrix
can be used here.
All populations in the dataset must appear in rhomatrix
. Populations in
rhomatrix
that are not in the dataset are allowed and will not be used.
A data frame with augmentation data (see Details).
Using the training data, loo
does leave-one-out prediction, lto
does
leave-timepoint-out prediction, and sequential
does sequential (leave-future-out)
prediction.
Data frame containing the same columns supplied in the original model.
New predictor matrix or vector. Not required if newdata
is supplied.
New population vector. Not required if newdata
is supplied.
New time vector. Not required if newdata
is supplied.
New response vector. Optional, unless E
and tau
were supplied in
lieu of x
. Not required if newdata
is supplied.
Return the gradient (derivative) of the GP model at each time point with
respect to each input. This is only computed for out-of-sample predictions using newdata
,
loo
, or lto
. If you want the in-sample gradient, pass the original dataset as
newdata
. Defaults to FALSE.
For predictmethod="loo"
and predictmethod="lto"
, the number of points
on either side of the focal point to leave out of the training data. Defaults to 0.
Fit GP model to the residuals of a linear relationship
between y
and the first variable of x
(prior to scaling) and
backtransforms as appropriate. If y
is growth rate, acts like a
Ricker prior. Defaults to "none"
. Option "local"
fits a separate
regression for each population, "global"
fits a single regression.
A list (class GP and GPpred) with the following elements:
Posterior mode of hyperparameters (named vector).
Posterior mode of hyperparameters (named vector), transformed to real line (used for optimization).
Likelihood gradients of hyperparameters at posterior mode. Can be used to assess convergence.
List containing various covariance matrices and the inverse covariance matrix.
Number of iterations until convergence was reached. Currently fixed at a max of 200.
Inputs and scaled inputs. Note that if you use E
and tau
,
the names of the predictors in the input data frame will be stored under
x_names
, and the names of the lagged predictors corresponding to the
inverse length scales will be stored under x_names2
, provided these names exist.
Scaling information.
If linprior
is not "none"
, linear regression information.
linprior_mod
is an lm
object.
Data frame with in-sample predictions. predfsd
is the standard
deviation of the GP function, predsd
includes process error.
Fit statistics for in-sample predictions. Includes R2, rmse, ln_post (log posterior likelihood evaluated at the mode), lnL_LOO (generalized cross-validation approximate leave-one-out negative log likelihood), and df (estimated degrees of freedom, equal to the trace of the smoother matrix). lnL_LOO does not include the prior, so is not directly comparable to ln_post. For diagnostics, also includes likelihood components ln_prior, SS, logdet. If there are multiple populations, also includes R2centered and R2scaled.
If >1 population, fit statistics (R2 and rmse) for in-sample predictions by population.
Data frame with out-of-sample predictions (if requested). predfsd
is the standard
deviation of the GP function, predsd
includes process error.
Fit statistics for out-of-sample predictions (if requested).
Only computed if using "loo"
or "sequential"
, if y
is found in newdata
,
or if ynew
supplied (i.e. if the observed values are known).
If >1 population, fit statistics for out-of-sample predictions (if requested) by population.
If returnGPgrad=T
, a data frame with the partial derivatives of the
function with respect to each input.
Function call. Allows use of update
.
The data are assumed to be in time order. This is particularly important if E
and tau
are supplied or predictmethod="sequential"
is used.
Missing values:
Missing values for y
and x
are allowed. Any rows containing
missing values will be excluded from the model fitting procedure (reinserted
as NAs in the output). If pop
and time
are supplied, they
should NOT contain missing values.
Hyperparameters:
The model uses a squared exponential covariance function. See the "Extended introduction" vignette for mathematical details.
There is one inverse length scale phi
estimated for each predictor
variable (input dimension). Values near 0 indicate that the predictor has
little influence on the response variable, and it was likely dropped by ARD.
Higher values of phi
indicate greater nonlinearity.
The estimated variance terms are ve
(process variance) and
sigma2
(pointwise prior variance).
Hyperparameter rho
is the dynamic correlation in the hierarchical GP
model, indicating similarity of dynamics across populations. If there is only
1 population (e.g. if pop
is not supplied), rho
is irrelevant
(not used by the model) and will revert to the mode of its prior (~0.5). The
value of rho
can be fixed to any value from 0.0001 to 0.9999 using
rhofixed
, otherwise it is estimated from the data. Alternatively, a matrix
of fixed pairwise rho values can be supplied using rhomatrix
. In this case,
the single rho parameter will also not be used and will revert to the mode of the
prior (~0.5). A pairwise rho matrix can be generated using getrhomatrix
,
or you can create a custom one (e.g. based on geographical distance).
Scaling:
The model priors assume the response variable and all predictor variables
have approximately zero mean and unit variance, therefore it is important to
scale the data properly when fitting these models. For convenience, this
function will scale the input data automatically by default, and
backtransform output to the original scale. Automatic scaling can be
"global"
(default), or "local"
. The latter will scale variables
within (as opposed to across) populations, so is obviously only applicable if
there is more than 1 population. You can also scale the data yourself
beforehand in whatever manner you wish and set scaling="none"
. In this
case, you will obviously have to do the backtransformation yourself.
Predictions:
There are several options for producing predictions:
Use predictmethod="loo"
for leave-one-out prediction using the training data.
Use predictmethod="lto"
for leave-timepoint-out prediction using the training data. This will leave
out values with the same time index across multiple populations, rather than each individual datapoint.
If there is only one population, "lto"
will be equivalent to "loo"
.
Use predictmethod="sequential"
for sequential (leave-future-out) prediction using the training data.
If data frame data
was supplied, supply data frame newdata
containing same column names.
Column for y
is optional, unless E
and tau
were supplied in lieu of x
.
If vectors/matrices were supplied for y
, x
, etc, equivalent vector/matrices xnew
,
popnew
(if pop
was supplied), and timenew
(optional).
ynew
is optional, unless E
and tau
were supplied in lieu of x
.
After fitting, predictions can also be made using predict.GP
.
Predictions can plotted using plot.GPpred
.
Get conditional responses using getconditionals
.
It should be noted that "loo"
is not a "true" leave-one-out, for although each prediction is
made by removing one of the training points, the hyperparameters are fit using all of the training data.
The same goes for "sequential"
.
For the out-of-sample prediction methods "loo"
, "lto"
, and
newdata
, the partial derivatives of the fitted GP function at each
time point with respect to each input can be obtained by setting
returnGPgrad = T
. If you want the in-sample gradient, pass the
original (training) data as back in as newdata
. It automatic scaling
was used, the gradient will also be backtransformed to the original units.
Fit statistics:
The R2 and other fit statistics are always computed for y
in the units in which it was
supplied to the function. The R2 is specifically computed as:
$$R^2=1-SS_{err}/SS_{y}$$
which is equivalent to
$$R^2=1-MSE/Var_{y}$$
As a result, the returned R-squared may be negative, particularly for out-of-sample
predictions, which are not guaranteed to pass through the mean. A negative R2 indicates
that the model predictions are worse than just using the mean (MSE is larger than the variance).
If there are multiple populations, there will be a total R2, which is the R2 across populations, as well as within-population R2 values. For the total R2, the denominator is the variance across all datapoints, ignoring population. Note that if the populations have very different local means, the total R2 can be potentially misleading because the across-population variance will be much larger than the within population variance, increasing R2 even if MSE is constant. Put another way, simply getting the local means right can explain a lot of the across-population variance even if there is little prediction accuracy within a population. Very different local variances can cause similar issues. In this case, we would recommend looking at the within-population R2 values, or at one of the alternative R2 values provided in the fit stats: R2centered (total R2 with local means removed) or R2scaled (total R2 with local centering and scaling), which may be more accurate measures of performance than total R2 if the populations have very different local means and/or variances.
Munch, S. B., Poynor, V., and Arriaza, J. L. 2017. Circumventing structural uncertainty: a Bayesian perspective on nonlinear forecasting for ecology. Ecological Complexity, 32: 134.
yrand=rnorm(20)
testgp=fitGP(y=yrand,E=2,tau=1,predictmethod = "loo")