--- title: "Getting started with cxr" output: rmarkdown::html_vignette author: David Garcia-Callejas and cxr team vignette: > %\VignetteIndexEntry{Getting started} \usepackage[utf8]{inputenc} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup,echo=FALSE} knitr::opts_chunk$set(message = FALSE, warning = FALSE) ``` **Introduction** The `cxr` package provides a general interface to obtain estimates of species vital rates and interaction coefficients between species pairs from empirical data. These estimations are critical to parameterize population models describing the dynamics of interacting species. They also allow computing a series of metrics associated with modern coexistence theory that inform about the likelihood of species to coexist. These metrics are 1) *niche differences* that stabilize coexistence between competing species and 2) *average fitness differences* that drive competitive dominance and, in the absence of niche differences, determine the superior competitor. The package also allows exploring how environmental variation modifies both the intrinsic ability of species to produce offspring and the interaction coefficients between pairs of species (including with itself). This feature opens the possibility of exploring how stabilizing niche differences and average fitness differences vary across environmental gradients, and therefore, it allows analyzing whether the likelihood of species to coexist changes across environmental conditions. Here we demonstrate the basic functionality of the package using a published observational dataset (see Lanuza et al. (2018) and vignette 2 (Data formats) for a description of the dataset). With this example, we will specifically estimate seed production in the absence of neighbors (lambda) and the strength and sign of species interactions between species pairs (alpha matrix). These values are the basis for estimating the degree of niche overlap (i.e 1- niche differences) and average fitness differences between species pairs, which are covered in vignette 3 (Coexistence metrics). Finally, these estimations of lambda and alpha parameters are the basis for analyzing more complex dynamics such as the stability of the dynamics of multispecies communities (see Godoy et al. 2017) and multitrophic coexistence (see Godoy et al. 2018). **Fitting a single species** First, we load the package and the associated data. The included dataset contains, for each individual, its reproductive success and the number of neighbors per species in a 7.5 cm buffer (see vignette 2). ```{r} library(cxr) data("neigh_list") ``` First, we draw the values of a single focal species. ```{r} my.sp <- "HOMA" # get data from the list obs_homa <- neigh_list[[my.sp]] # no need for ID column obs_homa <- subset(obs_homa,select = -c(obs_ID)) # For each observation, we need the individual plant fitness # and the number of neighbours per species (in columns). head(obs_homa) #There are a couple of outliers, eliminate from analyses obs_homa<-subset(obs_homa, fitness<500) ``` Next, we estimate both the reproduction success in the absence of neighbors (lambda) and the competition matrix (alpha). This is done by fitting a model that mathematically relates the reproductive success to the number of neighbors observed. In this first example, we fit the selected species with a Ricker model ('RK' model family, see vignette 4) and fairly standard initial values. The default optimization method (`Nelder-Mead`) does not allow for lower or upper bounds in model parameters, so these arguments are commented out. In ecological terms, this optimization process allows estimating the strength of both competitive and facilitative interactions, yet bounded optimization algorithms can be used to restrict the analysis to either competition or facilitation (i.e. positive or negative alpha values). We can also specify whether we want to compute standard errors numerically, by setting the argument `bootstrap_samples` to the number of bootstrap replications for the calculation. ```{r} #?cxr_pm_fit #check the help file for a description of the arguments fit_homa <- cxr_pm_fit(data = obs_homa, focal_column = my.sp, model_family = "BH", covariates = NULL, optimization_method = "Nelder-Mead", alpha_form = "pairwise", lambda_cov_form = "none", alpha_cov_form = "none", initial_values = list(lambda = 45, alpha_intra = .1, alpha_inter = .1), #not aplicable to this optimazation method # lower_bounds = list(lambda = 0, # alpha_intra = 0, # alpha_inter = 0), # upper_bounds = list(lambda = 10, # alpha_intra = 1, # alpha_inter = 1), fixed_terms = NULL, # a low number of bootstrap samples # for demonstration purposes, # increase it for robust results. bootstrap_samples = 3) ``` For a quick summary of the fit, we can run a summary on the resulting object. ```{r} summary(fit_homa) ``` This object is actually a list with several elements. We can thus access these elements as usual: ```{r} names(fit_homa) #list of all available elements. #reproduction success in the absence of neighbors fit_homa$lambda # intraspecific interaction fit_homa$alpha_intra # interspecific interactions fit_homa$alpha_inter ``` Note that some interaction coefficients are set to NA because species do not co-occur but are nevertheless listed as neighbours with densities equal to zero in all focal observations. To check whether model fit reproduces well observed data, represent the fit according to obtained values. ```{r} require(ggplot2) ggplot(obs_homa, aes(HOMA , fitness)) + geom_point() + stat_function(fun = function(x) fit_homa$lambda/(1+fit_homa$alpha_intra*x), lwd = 1.5, colour = "blue") ``` **Fitting several species at once** Most likely users will want to fit model parameters to data from two or more focal species. In order to do that with a single call, we provide the function `cxr_pm_multifit`, which has a very similar interface to `cxr_pm_fit`. Here we show how multiple species can be fit using this function. For this multispecies case, rows in the alpha element of the returning list correspond to species i and columns to species j for each $\alpha_{ij}$ coefficient. The diagonal corresponds to intraspecific coefficients. In order to showcase other capabilities of the package, we include in this example the effect of a covariate over the fitted lambda and alpha parameters. This covariate, soil salinity, is also included as a dataset in the package (see vignette 2). We consider that the covariate has a linear effect in both the modification of lambda and alpha parameters. ```{r} my.sp <- c("BEMA","CETE","LEMA") obs_3sp <- neigh_list[my.sp] # discard ID column for(i in 1:length(obs_3sp)){ obs_3sp[[i]] <- obs_3sp[[i]][,2:length(obs_3sp[[i]])] } # load covariates: salinity data("salinity_list") salinity <- salinity_list[my.sp] # keep only salinity column for(i in 1:length(salinity)){ salinity[[i]] <- as.matrix(salinity[[i]][,2:length(salinity[[i]])]) colnames(salinity[[i]]) <- "salinity" } ``` Note how the data is passed in a list with as many elements as focal species. Each element is a dataframe with observations of the corresponding focal species. Same for the covariates data, it must be a list with as many elements as focal species. Each element is a dataframe (or a matrix) with a column for each covariate (one, in this case) and the same number of observations as its associated species data. ```{r} names(obs_3sp) # observation data head(obs_3sp[[1]]) # number of fitness observations nrow(obs_3sp[[1]]) # salinity data head(salinity[[1]]) # number of covariate observations nrow(salinity[[1]]) ``` We fit the model as above, but using the `cxr_pm_multifit` function. To allow proper convergence we fit each species with different starting lambda values ```{r} fit_3sp <- list() all_sp <- names(obs_3sp) for (i in 1:length(obs_3sp)){ obs <- obs_3sp[i] my.sp <- all_sp[i] salinity.sp <- salinity[i] lambda <- mean(obs[[1]]$fitness) upper_lambda <- as.numeric(max(obs[[1]]$fitness)) fit <- cxr_pm_multifit(data = obs, focal_column = my.sp, model_family = "BH", # here we use a bounded method for demonstration purposes optimization_method = "bobyqa", covariates = salinity.sp, alpha_form = "pairwise", lambda_cov_form = "global", # effect of covariates over lambda alpha_cov_form = "global", # effect of covariates over alpha initial_values = list(lambda = lambda, alpha_intra = 1, alpha_inter = 1, lambda_cov = 0.1, alpha_cov = 0.1), lower_bounds = list(lambda = 0, alpha_intra = -1, alpha_inter = -1, lambda_cov = -2, alpha_cov = -2), upper_bounds = list(lambda = upper_lambda, alpha_intra = 3, alpha_inter = 3, lambda_cov = 2, alpha_cov = 2), # no standard errors bootstrap_samples = 0) fit_3sp[[i]] <- fit } names(fit_3sp) <- all_sp ``` We can also have a glimpse of this multispecies fit for each species separately. This is an example for the first species. ```{r} fit_3sp$BEMA ``` The numerical estimation of parameters depends on the model with which to estimate fitness values, the optimization method, and the underlying data. In our example dataset, some species are better represented than others, and the function will raise warnings if the estimation can be improved or, for example, if any fitted parameter is equal to the lower or upper bounds provided. The `cxr_pm_multifit` function will behave conservatively and return `NULL` if parameter fitting fails for *any* of the focal species passed as arguments, printing an informative message about which species failed to fit. In such cases, users may either fit each species separately or call `cxr_pm_multifit` without the problematic species. Importantly, bounded methods can be very sensitive to the initial values and bounds, so, as in any numerical optimization problem, you should double-check the values obtained, at least by computing standard errors on your parameters with an adequate number of boostrap samples, and preferably by performing sensitivity analyses on the different parameters (in this and other vignettes, we have not included any of these checks, as our examples are merely for demonstration purposes). Aside from these recommendations, in the `cxr` objects returned from our functions, the negative log-likelihood of the fit is also included, which may be useful in helping users choose a certain optimization algorithm for a particular dataset. Remember that the convention is to present negative log-likelihood, and more negative values are better. In general, it is recommended to test different optimization algorithms, as they may produce fairly different results (Mullen 2014). In this example, the method used ("bobyqa", a well-established bounded optimization algorithm) returns the following *negative* log-likelihood values: ```{r} fit_3sp$BEMA$log_likelihood ``` **Including environmental variability** In the above example for fitting multiple species at once, we have already offered a glimpse on how to include the effect of environmental covariates over lambda and alpha values. The relevant arguments in `cxr_pm_fit` and `cxr_pm_multifit` are 'lambda_cov_form' and 'alpha_cov_form'. If these are set to 'none', no effect of covariates is considered. Otherwise, there are a couple options to model this effect (all the options consider linear effects, for now). First, the effect of covariates over lambda can be 'global', meaning that each covariate has a global parameter affecting lambda. This is formulated as $\lambda (1 + \sum_{k=1}^{s} \theta_k c_k)$ where $s$ is the number of environmental covariates, $c_k$ is the observed value of the i-th covariate and $\theta_k$ is the 'lambda_cov' parameter of the `cxr` functions. The effect over alpha values can likewise be 'global', so that, for focal species $i$, each covariate affects the alpha values $\alpha_{ij}$ equally through a global parameter: $\alpha_* + \sum_{k=1}^{s} \psi_k c_k$ where $*$ represents the set of all pairwise interaction for a given focal species, and $\psi_k$ is the global parameter relating covariate $k$ to the alpha values. The last possibility included in `cxr` is for the effect of covariates over alpha values to be interaction-specific, which is coded by specifying 'alpha_cov_form' as 'pairwise': $\alpha_{ij} + \sum_{k=1}^{s} \psi_{ijk} c_k$ In this case, each covariate $c_k$ will have a specific parameter $\psi_{ijk}$ for its effect over each interaction $\alpha_{ij}$. We can retrieve the fitted values for 'lambda_cov' ($\theta$) and 'alpha_cov' ($\psi$) simply from the output of the function: ```{r} fit_3sp$BEMA$lambda_cov fit_3sp$BEMA$alpha_cov ``` **Parameter boundaries** `cxr` includes the possibility of providing boundaries for your parameters, so that the fitted values fall inside a certain interval. This is done via the arguments `lower_bounds` and `upper_bounds` of the fitting functions. These arguments accept a list whose elements must be named and correspond to one or more of the function parameters, i.e. `lambda`, `alpha_intra`, `alpha_inter`, `lambda_cov`, and `alpha_cov`. If these boundaries are of length 1, as in this example: ```{r} lower_bounds = list(lambda = 0, alpha_intra = 0, alpha_inter = -1, lambda_cov = 0, alpha_cov = 0) upper_bounds = list(lambda = 100, alpha_intra = 1, alpha_inter = 1, lambda_cov = 1, alpha_cov = 1) ``` the same boundaries will be used for all values in the appropriate set, e.g. each interspecific alpha term (`alpha_inter`) will have the same boundaries. Otherwise, if you are interested in fitting parameters for more than one covariate, you may want to specify varying boundaries for `lambda_cov` and `alpha_cov` depending on the different covariates. In that case, the functions accept, for these elements, vectors with as many values as covariates. For example, if you have two covariates for which you want to calculate associated `lambda_cov` and `alpha_cov` values, but these covariates have very different magnitudes, you may provide two `lambda_cov` and `alpha_cov` boundaries. ```{r} # fit three species, as in the previous example data <- neigh_list[1:3] # keep only fitness and neighbours columns for(i in 1:length(data)){ data[[i]] <- data[[i]][,2:length(data[[i]])] } focal_column <- names(data) # covariates # in this example, we use salinity # and two more covariates, randomly generated data("salinity_list") # observations for the first three species cov_list <- salinity_list[1:3] for(i in 1:length(cov_list)){ # keep only salinity column cov_list[[i]] <- as.matrix(cov_list[[i]][,2:length(cov_list[[i]])]) # add two random covariates cov_list[[i]] <- cbind(cov_list[[i]], runif(nrow(cov_list[[i]]),1,10), runif(nrow(cov_list[[i]]),10,100)) colnames(cov_list[[i]]) <- c("salinity","cov2","cov3") } # this is how each element of the covariates list looks like head(cov_list[[1]]) # function parameters model_family <- "BH" covariates <- cov_list # bobyqa is generally more robust than other bounded methods optimization_method <- "bobyqa" alpha_form <- "pairwise" lambda_cov_form <- "global" alpha_cov_form <- "pairwise" # note how lambda_cov and alpha_cov # have different initial values for each covariate effect # the commented assignations are also possible, # giving equal initial values to all parameters initial_values = list(lambda = 1, alpha_intra = 0.1, alpha_inter = 0.1, lambda_cov = c(0.1,0.2,0.1), alpha_cov = c(0.1,0.2,0.1)) # lambda_cov = c(0.1), # alpha_cov = c(0.1)) # same with boundaries lower_bounds = list(lambda = 0, alpha_intra = 0, alpha_inter = -1, lambda_cov = c(-1,0,-1), alpha_cov = c(-1,0,-1)) # lambda_cov = c(-1), # alpha_cov = c(-1)) upper_bounds = list(lambda = 100, alpha_intra = 1, alpha_inter = 1, lambda_cov = c(1,2,1), alpha_cov = c(1,2,1)) # lambda_cov = c(1), # alpha_cov = c(1)) fixed_terms <- NULL bootstrap_samples <- 3 ``` ```{r eval=FALSE} # this fit is fairly complex, it may take a while, # it also raises warnings, suggesting that either the data, # model, or initial values/boundaries can be improved. # This is consistent with having observational data # and, furthermore, random covariates fit_multi_cov <- cxr_pm_multifit(data = data, focal_column = focal_column, model_family = model_family, covariates = covariates, optimization_method = optimization_method, alpha_form = alpha_form, lambda_cov_form = lambda_cov_form, alpha_cov_form = alpha_cov_form, initial_values = initial_values, lower_bounds = lower_bounds, upper_bounds = upper_bounds, fixed_terms = fixed_terms, bootstrap_samples = bootstrap_samples) ``` **Keeping parameters fixed** There is the option to provide fixed values for model parameters, using the `fixed_terms` argument in the functions. As an example, if you want to keep the `lambda` parameter fixed, for a single species fit, you would modify the function arguments accordingly: ```{r} fixed_terms <- list(lambda = 1) # now lambda does not appear in 'initial_values' initial_values <- list(alpha_intra = 0, alpha_inter = 0, lambda_cov = 0, alpha_cov = 0) # lower and upper bounds should, likewise, # not contain lambda ``` This is extended to the `cxr_pm_multifit` function, by having `fixed_terms` be a list with as many elements as focal species. For example, having three focal species, where we want to keep `lambda` fixed: ```{r} fixed_terms <- list(list(lambda = 1), # focal sp 1 list(lambda = 1.2), # focal sp 2 list(lambda = 1.3)) # focal sp 3 ``` In this release of `cxr`, you must specify the same initial values for every focal species when using the `cxr_pm_multifit` option. This means, in practice, that if you want to fit species with different sets of fixed parameters, (e.g. only `lambda` for one species, but `lambda` and `alpha_intra` for another), or with different initial values, you should fit them separately, using `cxr_pm_fit`. **References** Godoy, O., Stouffer, D. B., Kraft, N. J., & Levine, J. M. (2017). Intransitivity is infrequent and fails to promote annual plant coexistence without pairwise niche differences. Ecology, 98(5), 1193-1200. Lanuza, J. B., Bartomeus, I., & Godoy, O. (2018). Opposing effects of floral visitors and soil conditions on the determinants of competitive outcomes maintain species diversity in heterogeneous landscapes. Ecology letters, 21(6), 865-874. Godoy, O., Bartomeus, I., Rohr, R. P., & Saavedra, S. (2018). Towards the integration of niche and network theories. Trends in ecology & evolution, 33(4), 287-300. Mullen, K. M. (2014). Continuous global optimization in R. Journal of Statistical Software, 60(6), 1-45.