4 LandR Biomass_speciesData Module

This documentation is a work in progress. Please report any discrepancies or omissions at https://github.com/PredictiveEcology/Biomass_speciesData/issues.

4.0.0.1 Authors:

Eliot J B McIntire [aut, cre], Alex M. Chubaty [aut], Ceres Barros [aut]

4.1 Module Overview

4.1.1 Module summary

This module downloads and pre-processes species % cover data layers to be passed to other LandR data modules (e.g., Biomass_borealDataPrep) or to the LandR forest simulation module Biomass_core.

4.1.2 Module inputs and parameters at a glance

Below is the full list of input objects (Table 4.1) and parameters (Table 4.2) that Biomass_speciesData expects. Of these, the only input that must be provided (i.e., for which Biomass_speciesData has no default) is studyAreaLarge.

Raw data layers downloaded by the module are saved in dataPath(sim), which can be controlled via options(reproducible.destinationPath = ...).
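
As a minimal sketch of the option mentioned above, the download location could be set before running the module. The folder name used here is a hypothetical placeholder, not a path the module requires.

```r
# Hypothetical folder for raw downloads; any writable directory would do.
rawDataDir <- file.path(tempdir(), "LandR_rawData")
dir.create(rawDataDir, recursive = TRUE, showWarnings = FALSE)

# Set the option named in the text. options() returns the previous values,
# so they can be restored later with options(oldOpts).
oldOpts <- options(reproducible.destinationPath = rawDataDir)

# Check what is currently set
getOption("reproducible.destinationPath")
```

Keeping the previous values in oldOpts makes it easy to undo the change once the module has run.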

Table 4.1: List of Biomass_speciesData input objects and their description.
objectName | desc
rasterToMatchLarge | A raster of studyAreaLarge with the same resolution and projection as the simulation's. Defaults to the Canadian Forest Service, National Forest Inventory, kNN-derived stand biomass map.
rawBiomassMap | Total biomass raster layer in the study area. Only used to create rasterToMatchLarge if necessary. Defaults to the Canadian Forest Service, National Forest Inventory, kNN-derived total aboveground biomass map from 2001 (in tonnes/ha), unless dataYear != 2001. See https://open.canada.ca/data/en/dataset/ec9e2659-1c29-4ddb-87a2-6aced147a990 for metadata.
sppColorVect | A named vector of colors to use for plotting. The names must be in sim$sppEquiv[[sim$sppEquivCol]], and the vector should also contain a color for 'Mixed'.
sppEquiv | Table of species equivalencies. See LandR::sppEquivalencies_CA.
sppNameVector | An optional vector of species names to be pulled from sppEquiv. Species names must match the P(sim)$sppEquivCol column in sppEquiv. If not provided, species will be taken from the entire P(sim)$sppEquivCol column in sppEquiv. See LandR::sppEquivalencies_CA.
studyAreaLarge | Polygon to use as the parametrisation study area. Must be provided by the user. Note that studyAreaLarge is only used for parameter estimation, and can be larger than the actual study area used for LandR simulations (e.g., larger than studyArea in LandR Biomass_core).
studyAreaReporting | Multipolygon (typically smaller/unbuffered compared with studyAreaLarge and studyArea in LandR Biomass_core) to use for plotting/reporting. If not provided, will default to studyAreaLarge.
Table 4.2: List of Biomass_speciesData parameters and their description.
paramName | paramDesc
coverThresh | The minimum % cover a species needs to have (per pixel) in the study area to be considered present.
dataYear | Passed to the paste0('prepSpeciesLayers_', types) function to fetch data from that year (if applicable). Defaults to 2001, the default kNN year.
sppEquivCol | The column in the sim$sppEquiv data.table to group species by and use as a naming convention. If different species in, e.g., the kNN data have the same name in the chosen column, their data are merged into one species by summing their % cover in each raster cell.
types | The possible data sources. These must correspond to a function named paste0('prepSpeciesLayers_', types). Defaults to 'KNN' to get the Canadian Forest Service, National Forest Inventory, kNN-derived species cover maps from year dataYear, using the LandR::prepSpeciesLayers_KNN function (see https://open.canada.ca/data/en/dataset/ec9e2659-1c29-4ddb-87a2-6aced147a990 for details on these data). Other currently available options are 'ONFRI', 'CASFRI', 'Pickell' and 'ForestInventory', which attempt to get proprietary data; the user must be granted access first. A custom function can be used to retrieve any data, as long as it is accessible by the module (e.g., in the global environment) and is named as paste0('prepSpeciesLayers_', types).
vegLeadingProportion | A number that defines whether a species is leading for a given pixel. Only used for plotting.
.plotInitialTime | The simulation time at which the first plot event should occur.
.plotInterval | The simulation time interval between plot events.
.plots | Passed to types in Plots (see ?Plots). There are a few plots that are made within this module, if set. Note that saving of plots (or their data) will ONLY occur at end(sim). If NA, plotting is turned off completely (this includes plot saving).
.saveInitialTime | The simulation time at which the first save event should occur.
.saveInterval | The simulation time interval between save events.
.sslVerify | Passed to httr::config(ssl_verifypeer = P(sim)$.sslVerify) when downloading KNN (NFI) datasets. Set to 0L if necessary to bypass checking the SSL certificate (this may be necessary when the NFI website's SSL certificate is not correctly configured).
.studyAreaName | Human-readable name for the study area used. If NA, a hash of studyAreaLarge will be used.
.useCache | Controls caching; caches the init event by default.
.useParallel | Used when reading csv files with fread. Will be passed to data.table::setDTthreads.

4.1.3 Events

Biomass_speciesData only runs two events:

  • Module “initiation” (init event), during which all species % cover layers are downloaded and processed.
  • Plotting of the processed species cover layers (initPlot event).

4.1.4 Module outputs

The module produces the following outputs (Table 4.3):

Table 4.3: List of Biomass_speciesData output objects and their description.
objectName | desc
speciesLayers | Species % cover raster layers (one layer per species), from the Canada species maps.
treed | Table with one logical column for each species, indicating whether there were non-zero cover values in each pixel.
numTreed | A named vector with the number of pixels with non-zero cover values for each species.
nonZeroCover | A single value indicating how many pixels have non-zero cover.

The module also automatically saves the processed species cover layers in the output path defined in getPaths(sim)$outputPath.
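
To make the summary outputs in Table 4.3 more concrete, here is a minimal base R sketch that derives similar objects from a toy matrix of % cover values. It is an illustration only, not the module's code: the matrix, the species codes and the reading of nonZeroCover as "pixels with cover for any species" are assumptions.

```r
# Toy % cover values: 5 pixels (rows) x 3 species (columns).
# Species codes are used for illustration only.
cover <- cbind(
  Pice_Gla = c(0, 20, 35, 0, 0),
  Popu_Tre = c(0, 0, 60, 10, 0),
  Pinu_Con = c(0, 0, 0, 0, 0)
)

# 'treed'-like table: one logical column per species
treed <- as.data.frame(cover > 0)

# 'numTreed'-like named vector: pixels with non-zero cover per species
numTreed <- colSums(cover > 0)

# 'nonZeroCover'-like single value (assumed here to count pixels with
# non-zero cover for at least one species)
nonZeroCover <- sum(rowSums(cover) > 0)
```

In this toy case Pinu_Con has no pixels with cover, which is the kind of pattern these summaries are meant to reveal.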

4.2 Module manual

4.2.1 Detailed description

This module accesses and processes species percent cover (% cover) data for the parametrisation and initialization of LandR Biomass_core. It 1) ensures that all data use the same geospatial geometries, 2) ensures that these are correctly re-projected to studyAreaLarge, and 3) attempts to sequentially fill in and replace the lowest quality data with higher quality data when several data sources are used. Its primary output is a RasterStack of species % cover, with each layer corresponding to a species.

Currently, the module can access the Canadian National Forest Inventory (NFI) forest attributes kNN dataset [the default; Beaudoin et al. (2017)], the Common Attribute Schema for Forest Resource Inventories [CASFRI; Cosco (2011)] dataset, the Ontario Forest Resource Inventory (ONFRI), a dataset specific to Alberta compiled by Paul Pickell, and other Alberta forest inventory datasets. However, only the NFI kNN data are freely available; access to the other datasets must be granted by module developers and data owners, and a Google account is required. Nevertheless, the module is flexible enough that any user can use it to process additional datasets, provided that an adequate R function is passed to the module (see types parameter details in Parameters).

When multiple data sources are used, the module will replace lower quality data with higher quality data following the order specified by the parameter types (see Parameters).

When multiple species of a given data source are to be grouped, % cover is summed across species of the same group within each pixel. Please see the sppEquiv input in Input objects for information on how species groups are defined.

The module can also exclude species % cover layers if they don’t have a minimum % cover value in at least one pixel. This means that the user should still inspect in how many pixels the species is deemed present, as it is possible that some data have only a few pixels with high % cover for a given species. In this case, the user may choose to exclude these species a posteriori. The summary plot automatically shown by Biomass_speciesData can help diagnose whether certain species are present in very few pixels (see Fig. 4.1).
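
The exclusion rule and the follow-up check described above can be sketched in base R as follows. This is a simplified illustration on a toy matrix rather than on raster layers; the data and species codes are made up, and only the threshold value (10, the documented default of coverThresh) comes from the module.

```r
coverThresh <- 10L  # documented default of the coverThresh parameter

# Toy % cover values: 6 pixels (rows) x 3 species (columns)
cover <- cbind(
  Pice_Gla = c(0, 20, 35, 0, 0, 0),
  Popu_Tre = c(0, 5, 8, 3, 0, 0),   # never reaches the threshold
  Betu_Pap = c(0, 0, 0, 0, 0, 85)   # reaches it, but in a single pixel
)

# Keep a species if at least one pixel has cover >= coverThresh
keep <- apply(cover, 2, function(x) any(x >= coverThresh, na.rm = TRUE))
coverKept <- cover[, keep, drop = FALSE]

# Number of pixels at or above the threshold for each retained species
nPresent <- colSums(coverKept >= coverThresh, na.rm = TRUE)
```

Here Betu_Pap passes the filter although it is present in only one pixel, which is the situation where a user might decide to drop the species afterwards.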

4.2.2 Initialization, inputs and parameters

Biomass_speciesData initializes itself and prepares all inputs provided that it has internet access to download the raw data layers (or that these layers have been previously downloaded and stored in the folder specified by options("reproducible.destinationPath")).

The module defaults to processing cover data for all species listed in the Boreal column of the default sppEquiv input data.table object, for which there are available % cover layers in the kNN dataset (Table 4.4; see ?LandR::sppEquivalencies_CA for more information):

Table 4.4: List of species cover data downloaded by default by Biomass_speciesData.
Species | Generic name
Abies balsamea | Balsam Fir
Abies lasiocarpa | Fir
Acer negundo | Boxelder maple
Acer pensylvanicum | Striped maple
Acer saccharinum | Silver maple
Acer saccharum | Sugar maple
Acer spicatum | Mountain maple
Acer spp. | Maple
Alnus spp. | Alder
Betula alleghaniensis | Swamp birch
Betula papyrifera | Paper birch
Betula populifolia | Gray birch
Betula spp. | Birch
Fagus grandifolia | American beech
Fraxinus americana | American ash
Fraxinus nigra | Black ash
Fraxinus spp. | Ash
Larix laricina | Tamarack
Larix lyallii | Alpine larch
Larix occidentalis | Western larch
Larix spp. | Larch
Picea engelmannii x glauca | Engelmann’s spruce
Picea engelmannii | Engelmann’s spruce
Picea glauca | White spruce
Picea mariana | Black spruce
Picea spp. | Spruce
Pinus albicaulis | Whitebark pine
Pinus banksiana | Jack pine
Pinus contorta | Lodgepole pine
Pinus monticola | Western white pine
Pinus resinosa | Red pine
Pinus spp. | Pine
Populus balsamifera | Balsam poplar
Populus balsamifera v. balsamifera | Balsam poplar
Populus trichocarpa | Black cottonwood
Populus grandidentata | White poplar
Populus spp. | Poplar
Populus tremuloides | Trembling poplar
Tsuga canadensis | Eastern hemlock
Tsuga spp. | Hemlock

4.2.2.1 Input objects

Biomass_speciesData requires the following input data layers (Table 4.5):

Table 4.5: List of Biomass_speciesData input objects and their description.
objectName | objectClass | desc | sourceURL
rasterToMatchLarge | RasterLayer | A raster of studyAreaLarge with the same resolution and projection as the simulation's. Defaults to the Canadian Forest Service, National Forest Inventory, kNN-derived stand biomass map. |
rawBiomassMap | RasterLayer | Total biomass raster layer in the study area. Only used to create rasterToMatchLarge if necessary. Defaults to the Canadian Forest Service, National Forest Inventory, kNN-derived total aboveground biomass map from 2001 (in tonnes/ha), unless dataYear != 2001. See https://open.canada.ca/data/en/dataset/ec9e2659-1c29-4ddb-87a2-6aced147a990 for metadata. |
sppColorVect | character | A named vector of colors to use for plotting. The names must be in sim$sppEquiv[[sim$sppEquivCol]], and the vector should also contain a color for 'Mixed'. | NA
sppEquiv | data.table | Table of species equivalencies. See LandR::sppEquivalencies_CA. |
sppNameVector | character | An optional vector of species names to be pulled from sppEquiv. Species names must match the P(sim)$sppEquivCol column in sppEquiv. If not provided, species will be taken from the entire P(sim)$sppEquivCol column in sppEquiv. See LandR::sppEquivalencies_CA. | NA
studyAreaLarge | SpatialPolygonsDataFrame | Polygon to use as the parametrisation study area. Must be provided by the user. Note that studyAreaLarge is only used for parameter estimation, and can be larger than the actual study area used for LandR simulations (e.g., larger than studyArea in LandR Biomass_core). | NA
studyAreaReporting | SpatialPolygonsDataFrame | Multipolygon (typically smaller/unbuffered compared with studyAreaLarge and studyArea in LandR Biomass_core) to use for plotting/reporting. If not provided, will default to studyAreaLarge. | NA

Of the inputs in Table 4.5, the following are particularly important and deserve special attention:

  • studyAreaLarge – the polygon defining the area for which species cover data are desired. It can be larger (but never smaller) than the study area used in the simulation of forest dynamics (i.e., the studyArea object in Biomass_core).

  • sppEquiv – a table of correspondences between different species naming conventions. This table is used across several LandR modules, including Biomass_core. It is particularly important here because it determines whether and how species (and their cover layers) are merged, if this is desired by the user. For instance, if the user wishes to simulate a generic Picea spp. that includes Picea glauca, Picea mariana and Picea engelmannii, they will need to provide these three species names in the data column (e.g., KNN if obtaining forest attribute kNN data layers from the Canadian Forest Inventory), but the same name (e.g., “Pice_Spp”) in the column chosen for the naming convention used throughout the simulation (the sppEquivCol parameter); see Table 4.6 for an example.

Table 4.6: Example of species merging for simulation. Here the user wants to model Abies balsamea, A. lasiocarpa and Pinus contorta as separate species, but all Picea spp. as a genus-level group. For this, all six species are identified in the 'KNN' column, so that their % cover layers can be obtained, but in the 'Boreal' column (which defines the naming convention used in the simulation in this example) all Picea spp. have the same name. Biomass_speciesData will merge their % cover data into a single layer by summing their cover per pixel.
Species | KNN | Boreal | Modelled as
Abies balsamea | Abie_Bal | Abie_Bal | Abies balsamea
Abies lasiocarpa | Abie_Las | Abie_Las | Abies lasiocarpa
Picea engelmannii x glauca | Pice_Eng_Gla | Pice_Spp | Picea spp.
Picea engelmannii | Pice_Eng | Pice_Spp | Picea spp.
Picea glauca | Pice_Gla | Pice_Spp | Picea spp.
Picea mariana | Pice_Mar | Pice_Spp | Picea spp.
Pinus contorta | Pinu_Con | Pinu_Con | Pinus contorta
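
The merging shown in Table 4.6 can be mimicked with a small base R sketch. It uses a plain data.frame and a toy matrix instead of the sppEquiv data.table and raster layers, so it only illustrates the per-pixel summing described above; the cover values are invented and this is not the module's implementation.

```r
# Subset of Table 4.6: data-source name ('KNN') and simulation name ('Boreal')
sppEquivToy <- data.frame(
  KNN    = c("Abie_Bal", "Pice_Eng", "Pice_Gla", "Pice_Mar", "Pinu_Con"),
  Boreal = c("Abie_Bal", "Pice_Spp", "Pice_Spp", "Pice_Spp", "Pinu_Con"),
  stringsAsFactors = FALSE
)

# Toy % cover: 3 pixels (rows) x 5 KNN layers (columns)
coverKNN <- cbind(
  Abie_Bal = c(10, 0, 0),
  Pice_Eng = c(5, 0, 0),
  Pice_Gla = c(20, 30, 0),
  Pice_Mar = c(15, 40, 0),
  Pinu_Con = c(0, 10, 50)
)

# Find the 'Boreal' name of each KNN layer, then sum layers sharing a name.
# rowsum() sums rows by group, so the matrix is transposed before and after.
groups <- sppEquivToy$Boreal[match(colnames(coverKNN), sppEquivToy$KNN)]
coverMerged <- t(rowsum(t(coverKNN), group = groups))
```

The result has one column per name in the 'Boreal' column, with the three Picea layers collapsed into a single Pice_Spp column.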

4.2.2.2 Parameters

Table 4.7 lists all parameters used in Biomass_speciesData and their detailed information.

Table 4.7: List of Biomass_speciesData parameters and their description.
paramName | paramClass | default | min | max | paramDesc
coverThresh | integer | 10 | NA | NA | The minimum % cover a species needs to have (per pixel) in the study area to be considered present.
dataYear | numeric | 2001 | NA | NA | Passed to the paste0('prepSpeciesLayers_', types) function to fetch data from that year (if applicable). Defaults to 2001, the default kNN year.
sppEquivCol | character | Boreal | NA | NA | The column in the sim$sppEquiv data.table to group species by and use as a naming convention. If different species in, e.g., the kNN data have the same name in the chosen column, their data are merged into one species by summing their % cover in each raster cell.
types | character | KNN | NA | NA | The possible data sources. These must correspond to a function named paste0('prepSpeciesLayers_', types). Defaults to 'KNN' to get the Canadian Forest Service, National Forest Inventory, kNN-derived species cover maps from year dataYear, using the LandR::prepSpeciesLayers_KNN function (see https://open.canada.ca/data/en/dataset/ec9e2659-1c29-4ddb-87a2-6aced147a990 for details on these data). Other currently available options are 'ONFRI', 'CASFRI', 'Pickell' and 'ForestInventory', which attempt to get proprietary data; the user must be granted access first. A custom function can be used to retrieve any data, as long as it is accessible by the module (e.g., in the global environment) and is named as paste0('prepSpeciesLayers_', types).
vegLeadingProportion | numeric | 0.8 | 0 | 1 | A number that defines whether a species is leading for a given pixel. Only used for plotting.
.plotInitialTime | numeric | NA | NA | NA | The simulation time at which the first plot event should occur.
.plotInterval | numeric | NA | NA | NA | The simulation time interval between plot events.
.plots | character | screen | NA | NA | Passed to types in Plots (see ?Plots). There are a few plots that are made within this module, if set. Note that saving of plots (or their data) will ONLY occur at end(sim). If NA, plotting is turned off completely (this includes plot saving).
.saveInitialTime | numeric | NA | NA | NA | The simulation time at which the first save event should occur.
.saveInterval | numeric | NA | NA | NA | The simulation time interval between save events.
.sslVerify | integer | 64 | NA | NA | Passed to httr::config(ssl_verifypeer = P(sim)$.sslVerify) when downloading KNN (NFI) datasets. Set to 0L if necessary to bypass checking the SSL certificate (this may be necessary when the NFI website's SSL certificate is not correctly configured).
.studyAreaName | character | NA | NA | NA | Human-readable name for the study area used. If NA, a hash of studyAreaLarge will be used.
.useCache | character | init | NA | NA | Controls caching; caches the init event by default.
.useParallel | numeric | 2 | NA | NA | Used when reading csv files with fread. Will be passed to data.table::setDTthreads.

Of the parameters listed in Table 4.7, the following are particularly important:

  • coverThresh – integer. Defines a minimum % cover value (from 0 to 100) that the species must have in at least one pixel to be considered present in the study area; otherwise it is excluded from the final stack of species layers. Note that this affects which species have data for an eventual simulation, and the user will need to adjust simulation parameters accordingly (e.g., species in trait tables will need to match the species in the cover layers).

  • types – character. Which % cover data sources are to be used (see Detailed description). Several data sources can be passed, in which case the module will overlay the lower quality layers with higher quality ones following the order of data sources specified by types – i.e., if types == c("KNN", "CASFRI", "ForestInventory"), KNN is assumed to be the lowest quality data set and ForestInventory the highest: values in KNN layers are replaced with overlapping values from CASFRI layers and values from KNN and CASFRI layers are replaced with overlapping values of ForestInventory layers.
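
The naming convention for custom data sources can be illustrated with a short sketch. The function prepSpeciesLayers_MyInventory, its arguments and its return value are hypothetical placeholders; the sketch only shows how a name built with paste0('prepSpeciesLayers_', types) can be resolved to a function, and it does not reproduce the arguments the module actually passes.

```r
# Hypothetical custom data-source function. Only its name follows the
# paste0("prepSpeciesLayers_", types) pattern described in the text; the
# arguments and the returned list are placeholders, not the module's API.
prepSpeciesLayers_MyInventory <- function(...) {
  list(source = "MyInventory", args = list(...))
}

types <- c("MyInventory")

# Build the expected function names and look each one up by name
fnNames <- paste0("prepSpeciesLayers_", types)
prepFuns <- lapply(fnNames, function(f) get(f, mode = "function"))

# Call the resolved function with a placeholder argument
res <- prepFuns[[1]](studyArea = "placeholder")
res$source
```

If the function were missing or misnamed, get() would fail with an error, which is why the name must match the value given in types exactly.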

4.2.3 Simulation flow

The general flow of Biomass_speciesData processes is:

  1. Download (if necessary) and spatial processing of species cover layers from the first data source listed in the types parameter. Spatial processing consists of sub-setting the data to the area defined by studyAreaLarge and ensuring that the spatial projection and resolution match those of rasterToMatchLarge. After spatial processing, species layers that have no pixels with values >= the coverThresh parameter are excluded.

  2. If more than one data source is listed in types, the second set of species cover layers is downloaded and processed as above.

  3. The second set of layers is assumed to be the higher quality dataset and is used to replace overlapping pixel values in the first (including for species whose layers may have been initially excluded after applying the coverThresh filter).

  4. Steps 2 and 3 are repeated for remaining data sources listed in types.

  5. Final layers are saved to disk and plotted. A summary of the number of pixels with forest cover is calculated (treed and numTreed output objects; see Module outputs).
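
The sequential replacement in steps 2 to 4 can be sketched in base R for a single species. This is a conceptual illustration on plain vectors, not the module's raster code; the values are invented, and treating NA as "no data from this source" is an assumption of the sketch.

```r
# Toy layers for one species over 6 pixels; NA marks pixels with no data.
# The list order mimics types = c("KNN", "CASFRI"): lowest quality first.
layers <- list(
  KNN    = c(10, 20, 30, 40, 50, NA),
  CASFRI = c(NA, 25, NA, 0, NA, 70)
)

# Wherever the higher quality source has a value, it replaces the value
# accumulated so far; elsewhere the lower quality value is kept.
overlay <- function(low, high) ifelse(is.na(high), low, high)

# Reduce() applies the overlay from the first to the last data source
merged <- Reduce(overlay, layers)
```

Because Reduce() works through the list in order, adding a third, higher quality source at the end of the list would overlay it on the result of the first two.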

4.3 Usage example

4.3.1 Load SpaDES and other packages

if (!require(Require)) {
    install.packages("Require")
    library(Require)
}

Require(c("PredictiveEcology/SpaDES.install", "SpaDES", "PredictiveEcology/SpaDES.core@development",
    "PredictiveEcology/LandR"), install_githubArgs = list(dependencies = TRUE))

4.3.2 Get module, necessary packages and set up folder directories

tempDir <- tempdir()
paths <- list(
    inputPath = normPath(file.path(tempDir, "inputs")),
    cachePath = normPath(file.path(tempDir, "cache")),
    modulePath = normPath(file.path(tempDir, "modules")),
    outputPath = normPath(file.path(tempDir, "outputs"))
)

getModule("PredictiveEcology/Biomass_speciesData", modulePath = paths$modulePath,
    overwrite = TRUE)

## make sure all necessary packages are installed:
makeSureAllPackagesInstalled(paths$modulePath)

4.3.3 Setup simulation

For this demonstration we are using all default parameter values, except coverThresh, which is lowered to 5%. The species layers (the major output of interest) are saved automatically, so there is no need to tell spades what to save using the outputs argument (see ?SpaDES.core::outputs).

We pass the global parameter .plotInitialTime = 1 in the simInitAndSpades function to activate plotting.

# User may want to set some options -- see ?reproducibleOptions --
# e.g., often the path to the 'inputs' folder will be set outside of
# the project by the user, to re-use datasets across projects:
# options(reproducible.inputPaths = 'E:/Data/LandR_related/')

# Cache this so that a random study area is created only once on a machine
studyAreaLarge <- Cache(randomStudyArea, size = 1e+07, cacheRepo = paths$cachePath)

# Pick the species you want to work with -- here we use the
# naming convention in 'Boreal' column of
# LandR::sppEquivalencies_CA (default)
speciesNameConvention <- "Boreal"
speciesToUse <- c("Pice_Gla", "Popu_Tre", "Pinu_Con")

sppEquiv <- LandR::sppEquivalencies_CA[get(speciesNameConvention) %in%
    speciesToUse]
# Assign a colour convention for graphics for each species
sppColorVect <- LandR::sppColors(sppEquiv, speciesNameConvention,
    newVals = "Mixed", palette = "Set1")

## Set up the module, input objects and parameters
modules <- list("Biomass_speciesData")
objects <- list(studyAreaLarge = studyAreaLarge, sppEquiv = sppEquiv,
    sppColorVect = sppColorVect)
params <- list(Biomass_speciesData = list(coverThresh = 5L))

4.3.4 Run module

Note that because this is a data module (i.e., only attempts to prepare data for the simulation) we are not iterating it and so both the start and end times are set to 1 here.

opts <- options(reproducible.useCache = TRUE, reproducible.inputPaths = paths$inputPath)

mySimOut <- simInitAndSpades(times = list(start = 1, end = 1),
    modules = modules, parameters = params, objects = objects,
    paths = paths, .plotInitialTime = 1)
options(opts)

Here are some of the outputs of Biomass_speciesData (dominant species) in a randomly generated study area within Canada.


Figure 4.1: Biomass_speciesData automatically generates a plot of species dominance and number of presences in the study area when .plotInitialTime=1 is passed as an argument.

4.4 References