sapfluxnetr
R package provides tools for a tidy data
analysis for the first sap flow measurements global database (SAPFLUXNET Project). In this
vignette you will learn how to install the package, download the data
and get started with some data metrics.
sapfluxnetr
is in the CRAN, so the installation is
straightforward:
install.packages('sapfluxnetr')
Development versions of the package reside in github. If you want the
latest updates, and also the latest bugs ( be advised
;) ), please install the package from the devel branch of the
github
repository with the remotes
package:
# if (!require(remotes)) {install.packages('remotes')}
::install_github(
remotes'sapfluxnet/sapfluxnetr', ref = 'devel',
build_opts = c("--no-resave-data", "--no-manual", "--build-vignettes")
)
Now you can load the package, along with the
tidyverse
-related packages, as they will be needed
later:
library(sapfluxnetr)
# tidyverse
library(dplyr)
library(ggplot2)
.zip file is about 3GB. Unzipped folders use about 24GB of disk space. Please check thet you have enough space available before downloading and unzipping the files.
Data is publicly available on Zenodo to download. You can manually download the data or use a programmatic approach:
# download the data
download.file(
url = "https://zenodo.org/record/3971689/files/0.1.5.zip?download=1",
destfile = '0.1.5.zip'
)# unzip the data
# BE SURE YOU HAVE AT LEAST 24GB OF DISK SPACE
unzip("0.1.5.zip")
# check if files are present
list.files(file.path('0.1.5', 'RData', 'plant'))
list.files(file.path('0.1.5', 'csv', 'plant'))
SAPFLUXNET database structure is as follows:
0.1.5 (database version number)
|
|-- RData
| |
| |-- plant
| |-- sapwood
| |-- leaf
|
|-- csv
| |
| |-- plant
| |-- sapwood
| |-- leaf
RData
folder contains the RData files for each site
divided by sap flow units level:
plant
sites with sap flow plant level units
available.sapwood
sites with sap flow sapwood level units
available.leaf
sites with sap flow leaf level units
available.csv
folder contains the csv files (9 files, 5 of
metadata, 2 of data and 2 more for the data flags) for each units level
available and site. We do not provide scripts or functions to work with
the csv files, only the RData objects.
To start working with the data, you have two options:
If working in RStudio, create a project in the data root folder
(0.1.5
in this example) and follow the examples in this
vignette to get started.
If not, set the data root folder (0.1.5
in this
example) as the working directory with setwd
and follow the
examples in this vignette to get started.
DISCLAIMER: In order to be able to build the vignette in the CRAN tests the following examples will be based on a small subset of SAPFLUXNET Data, composed by
ARG_TRE
,ARG_MAZ
andAUS_CAN_ST2_MIX
sites. Outputs will vary if you follow the vignette examples with the complete database.
First, let’s get used to the data structure thet SAPFLUXNET provides, and for thet we will choose a site and start playing with it.
In this example we will use the ARG_MAZ
site, as it is
small and it will be fast seeing the package capabilities. There are
sites like FRA_PUE
100 times bigger then this, but as one
can imagine, the time is also increasing when analyzing those bigger
datasets.
So, let’s read the data in the environment:
# read the data
<- read_sfn_data('ARG_MAZ', folder = 'RData/plant')
arg_maz
# see a brief summary of the site:
arg_maz
#> sfn_data object
#> Data from ARG_MAZ site
#>
#> Data kindly provided by Sebastian Pfautsch from University of Sydney
#> and Pablo Peri from Instituto Nacional de Tecnología Agropecuaria
#>
#> Site related literature: DOI 10.1007/s00468-013-0935-4
#>
#> Sapflow data: 288 observations of 5 trees/plants
#> Species present: Nothofagus pumilio
#>
#> Environmental data: 288 observations.
#> Variables present:
#> ta rh vpd sw_in ws precip swc_shallow ppfd_in ext_rad
#>
#> Biome: Woodland/Shrubland
#>
#> TIMESTAMP span: 2009-11-19 -03--2009-11-30 23:00:00 -03
#>
#> Solar TIMESTAMP span: 2009-11-18 22:24:18 UTC--2009-11-30 21:20:24 UTC
#>
#> Sapflow data flags:
#> No flags present
#>
#> Environmental data flags:
#> RANGE_WARN OUT_WARN CALCULATED
#> 9 17 576
At first glance, we know by this summary thet is an Argentinian
forest (first three letters of the site code are always the country
code), contributed by Sebastian Pfautsch and Pablo Peri with 5
Nothofagus pumilio trees measured along 15 days in 2009. Also
we can see the environmental variables measured (ta
,
rh
, vpd
, sw_in
, ws
,
precip
, swc_shallow
, ppfd_in
and
ext_rad
) and the biome classification. Finally, we can see
thet the environmental data has some flags (more on thet later).
sfn_data
objects have different slots containing the
different data, each of one has a getter function (see
?sfn_get_methods
and
vignette('sfn-data-classes', package = 'sapfluxnetr'
for
detailed info):
# sapf data with original site timestamp
<- get_sapf_data(arg_maz, solar = FALSE)
arg_maz_sapf
arg_maz_sapf#> # A tibble: 288 × 6
#> TIMESTAMP ARG_MAZ_Npu_Jt_1 ARG_MAZ_Npu_Jt_2 ARG_M…¹ ARG_M…² ARG_M…³
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-19 00:00:00 1955. 754. 742. 2679. 1494.
#> 2 2009-11-19 01:00:00 1833. 752. 631. 2527. 1504.
#> 3 2009-11-19 02:00:00 1817. 795. 597. 2514. 1446.
#> 4 2009-11-19 03:00:00 1598. 788. 579. 1938. 1265.
#> 5 2009-11-19 04:00:00 1419. 780. 518. 2061. 1181.
#> 6 2009-11-19 05:00:00 1506. 735. 430. 1977. 1231.
#> 7 2009-11-19 06:00:00 1649. 720. 539. 1984. 1310.
#> 8 2009-11-19 07:00:00 1630. 882. 542. 2371. 1506.
#> 9 2009-11-19 08:00:00 1909. 1015. 655. 2926. 1690.
#> 10 2009-11-19 09:00:00 2849. 1459. 930. 3802. 2265.
#> # … with 278 more rows, and abbreviated variable names ¹ARG_MAZ_Npu_Jt_3,
#> # ²ARG_MAZ_Npu_Jt_4, ³ARG_MAZ_Npu_Jt_5
# env_data with calculated aparent solar time
<- get_env_data(arg_maz, solar = TRUE)
arg_maz_env
arg_maz_env#> # A tibble: 288 × 10
#> TIMESTAMP ta rh vpd sw_in ws precip swc_shallow ppfd_in
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-18 22:24:18 4.99 55.6 0.387 0 12.9 0 0.359 0
#> 2 2009-11-18 23:24:18 4.57 60 0.339 0 9.7 0 0.359 0
#> 3 2009-11-19 00:24:02 4.99 60.4 0.345 0 11.3 0 0.358 0
#> 4 2009-11-19 01:24:02 3.31 66 0.263 0 9.7 0.2 0.357 0
#> 5 2009-11-19 02:24:02 2.03 68.7 0.221 0 12.9 0 0.357 0
#> 6 2009-11-19 03:24:02 2.46 58.8 0.300 0 17.7 0 0.356 0
#> 7 2009-11-19 04:24:02 2.46 55.2 0.327 0 17.7 0 0.356 0
#> 8 2009-11-19 05:24:02 2.89 53.7 0.348 3 14.5 0 0.356 6.34
#> 9 2009-11-19 06:24:02 4.15 52.8 0.388 78 9.7 0 0.356 165.
#> 10 2009-11-19 07:24:02 5.4 43.2 0.509 275 11.3 0 0.355 581.
#> # … with 278 more rows, and 1 more variable: ext_rad <dbl>
You can see thet the TIMESTAMP variable changes between both kinds of
data. That is because the TIMESTAMP returned is controlled by the
solar
parameter (see ?sfn_get_methods
).
Metadata can be accessed in the same way:
<- get_site_md(arg_maz)
arg_maz_site_md
arg_maz_site_md#> # A tibble: 1 × 24
#> si_addcontr_…¹ si_ad…² si_ad…³ si_ad…⁴ si_code si_co…⁵ si_co…⁶ si_co…⁷ si_co…⁸
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> Pablo Instit… Peri ARG_MAZ <NA> Sebast… Univer… Pfauts…
#> # … with 15 more variables: si_country <chr>, si_dendro_network <lgl>,
#> # si_dist_mgmt <chr>, si_elev <int>, si_flux_network <lgl>, si_igbp <chr>,
#> # si_lat <dbl>, si_long <dbl>, si_name <chr>, si_paper <chr>,
#> # si_remarks <lgl>, is_inside_country <lgl>, si_mat <dbl>, si_map <dbl>,
#> # si_biome <chr>, and abbreviated variable names ¹si_addcontr_email,
#> # ²si_addcontr_firstname, ³si_addcontr_institution, ⁴si_addcontr_lastname,
#> # ⁵si_contact_email, ⁶si_contact_firstname, ⁷si_contact_institution, …
<- get_stand_md(arg_maz)
arg_maz_stand_md
arg_maz_stand_md#> # A tibble: 1 × 18
#> st_age st_asp…¹ st_ba…² st_cl…³ st_de…⁴ st_gr…⁵ st_he…⁶ st_lai st_name st_re…⁷
#> <int> <chr> <dbl> <lgl> <int> <chr> <int> <lgl> <lgl> <lgl>
#> 1 180 E 59.1 NA 518 Natura… 20 NA NA NA
#> # … with 8 more variables: st_sand_perc <lgl>, st_silt_perc <lgl>,
#> # st_soil_depth <lgl>, st_soil_texture <chr>, st_terrain <chr>,
#> # st_treatment <lgl>, si_code <chr>, st_USDA_soil_texture <chr>, and
#> # abbreviated variable names ¹st_aspect, ²st_basal_area, ³st_clay_perc,
#> # ⁴st_density, ⁵st_growth_condition, ⁶st_height, ⁷st_remarks
<- get_species_md(arg_maz)
arg_maz_species_md
arg_maz_species_md#> # A tibble: 1 × 5
#> sp_basal_area_perc sp_leaf_habit sp_name sp_ntrees si_code
#> <int> <chr> <chr> <int> <chr>
#> 1 100 cold deciduous Nothofagus pumilio 5 ARG_MAZ
<- get_plant_md(arg_maz)
arg_maz_plant_md
arg_maz_plant_md#> # A tibble: 5 × 26
#> pl_age pl_azi…¹ pl_ba…² pl_code pl_dbh pl_he…³ pl_le…⁴ pl_name pl_ra…⁵ pl_re…⁶
#> <int> <chr> <dbl> <chr> <dbl> <lgl> <dbl> <chr> <chr> <lgl>
#> 1 180 Correct… 14.7 ARG_MA… 41.1 NA 109. Np-1 Correc… NA
#> 2 180 No azim… 13 ARG_MA… 33.2 NA 58.4 Np-2 Correc… NA
#> 3 180 No azim… 6 ARG_MA… 23.2 NA 35.0 Np-3 Correc… NA
#> 4 180 No azim… 21 ARG_MA… 55.6 NA 174. Np-4 Correc… NA
#> 5 180 No azim… 10 ARG_MA… 38 NA 88 Np-5 Correc… NA
#> # … with 16 more variables: pl_sap_units <chr>, pl_sapw_area <dbl>,
#> # pl_sapw_depth <dbl>, pl_sens_calib <lgl>, pl_sens_cor_grad <chr>,
#> # pl_sens_cor_zero <chr>, pl_sens_hgt <dbl>, pl_sens_length <int>,
#> # pl_sens_man <chr>, pl_sens_meth <chr>, pl_sens_timestep <int>,
#> # pl_social <chr>, pl_species <chr>, pl_treatment <lgl>, si_code <chr>,
#> # pl_sap_units_orig <chr>, and abbreviated variable names ¹pl_azimut_int,
#> # ²pl_bark_thick, ³pl_height, ⁴pl_leaf_area, ⁵pl_radial_int, ⁶pl_remarks
<- get_env_md(arg_maz)
arg_maz_env_md
arg_maz_env_md#> # A tibble: 1 × 17
#> env_l…¹ env_n…² env_p…³ env_p…⁴ env_p…⁵ env_r…⁶ env_rh env_s…⁷ env_s…⁸ env_s…⁹
#> <lgl> <chr> <chr> <chr> <chr> <lgl> <chr> <lgl> <int> <chr>
#> 1 NA Not pr… leaf: … Not pr… Cleari… NA Clear… NA 15 Cleari…
#> # … with 7 more variables: env_ta <chr>, env_time_daylight <lgl>,
#> # env_timestep <int>, env_time_zone <chr>, env_vpd <chr>, env_ws <chr>,
#> # si_code <chr>, and abbreviated variable names ¹env_leafarea_seasonal,
#> # ²env_netrad, ³env_plant_watpot, ⁴env_ppfd_in, ⁵env_precip, ⁶env_remarks,
#> # ⁷env_swc_deep_depth, ⁸env_swc_shallow_depth, ⁹env_sw_in
If in doubt about some of the metadata variables (what it means,
units…) a description can be obtained from
describe_md_variable
function:
# what is env_ta?
describe_md_variable('env_ta')
#> Description:
#> Location of air temperature sensor
#>
#> Values:
#> Above canopy | Within canopy | Clearing | Off-site | Not provided |
#>
#> Units:
#> Fixed values
#>
#> Type:
#> Character
# or pl_species?
describe_md_variable('pl_species')
#> Description:
#> Species identity of the measured plant
#>
#> Values:
#> Contributor defined |
#>
#> Units:
#> Scientific name without author abbreviation, as accepted by The Plant List
#>
#> Type:
#> Character
There is also some getters thet can come in handy sometimes.
get_timestamp
and get_solar_timestamp
access
to the original timestamp and the apparent solar time timestamp.
get_si_code
access to the site code. See
vignette('sfn-data-classes', package = 'sapfluxnetr'
for
more info.
sfn_data
objects also have two more slots, accessed with
get_sapf_flags
and get_env_flags
.
<- get_sapf_flags(arg_maz, solar = TRUE)
arg_maz_sapf_flags
arg_maz_sapf_flags#> # A tibble: 288 × 6
#> TIMESTAMP ARG_MAZ_Npu_Jt_1 ARG_MAZ_Npu_Jt_2 ARG_M…¹ ARG_M…² ARG_M…³
#> <dttm> <chr> <chr> <chr> <chr> <chr>
#> 1 2009-11-18 22:24:18 "" "" "" "" ""
#> 2 2009-11-18 23:24:18 "" "" "" "" ""
#> 3 2009-11-19 00:24:02 "" "" "" "" ""
#> 4 2009-11-19 01:24:02 "" "" "" "" ""
#> 5 2009-11-19 02:24:02 "" "" "" "" ""
#> 6 2009-11-19 03:24:02 "" "" "" "" ""
#> 7 2009-11-19 04:24:02 "" "" "" "" ""
#> 8 2009-11-19 05:24:02 "" "" "" "" ""
#> 9 2009-11-19 06:24:02 "" "" "" "" ""
#> 10 2009-11-19 07:24:02 "" "" "" "" ""
#> # … with 278 more rows, and abbreviated variable names ¹ARG_MAZ_Npu_Jt_3,
#> # ²ARG_MAZ_Npu_Jt_4, ³ARG_MAZ_Npu_Jt_5
<- get_env_flags(arg_maz, solar = TRUE)
arg_maz_env_flags
arg_maz_env_flags#> # A tibble: 288 × 10
#> TIMESTAMP ta rh vpd sw_in ws precip swc_sh…¹ ppfd_in
#> <dttm> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2009-11-18 22:24:18 "" "" "" "" "" "" "" CALCUL…
#> 2 2009-11-18 23:24:18 "" "" "" "" "" "" "" CALCUL…
#> 3 2009-11-19 00:24:02 "" "" "" "" "" "" "" CALCUL…
#> 4 2009-11-19 01:24:02 "" "" "" "" "" "OUT_WARN" "" CALCUL…
#> 5 2009-11-19 02:24:02 "" "" "" "" "" "" "" CALCUL…
#> 6 2009-11-19 03:24:02 "" "" "" "" "" "" "" CALCUL…
#> 7 2009-11-19 04:24:02 "" "" "" "" "" "" "" CALCUL…
#> 8 2009-11-19 05:24:02 "" "" "" "" "" "" "" CALCUL…
#> 9 2009-11-19 06:24:02 "" "" "" "" "" "" "" CALCUL…
#> 10 2009-11-19 07:24:02 "" "" "" "" "" "" "" CALCUL…
#> # … with 278 more rows, 1 more variable: ext_rad <chr>, and abbreviated
#> # variable name ¹swc_shallow
This datasets store any flag thet each data point may have (possible
outlier, data removed in the Quality Check of the data…). For a complete
list of flags possible values see
vignette('data-flags', package = 'sapfluxnetr')
. As an
example, let’s see which values are marked as “RANGE_WARN” (a warning
indicating thet the value may be out of normal variable range):
%>%
arg_maz_env_flags filter_all(any_vars(stringr::str_detect(., 'RANGE_WARN')))
#> # A tibble: 9 × 10
#> TIMESTAMP ta rh vpd sw_in ws precip swc_sh…¹ ppfd_in
#> <dttm> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2009-11-25 13:22:41 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> 2 2009-11-26 08:22:30 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> 3 2009-11-26 09:22:30 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> 4 2009-11-26 10:22:31 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> 5 2009-11-26 11:22:31 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> 6 2009-11-26 12:22:32 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> 7 2009-11-26 13:22:32 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> 8 2009-11-28 10:21:09 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> 9 2009-11-28 12:21:09 "" "" "" "" RANGE_WARN "" "" CALCUL…
#> # … with 1 more variable: ext_rad <chr>, and abbreviated variable name
#> # ¹swc_shallow
We see thet the out of range warnings refer to wind variable. We can cross the data to see which values of wind speed are giving the warnings,
%>%
arg_maz_env_flags filter_all(any_vars(stringr::str_detect(., 'RANGE_WARN'))) %>%
semi_join(arg_maz_env, ., by = 'TIMESTAMP') %>%
select(TIMESTAMP, ws)
#> # A tibble: 9 × 2
#> TIMESTAMP ws
#> <dttm> <dbl>
#> 1 2009-11-25 13:22:41 45.1
#> 2 2009-11-26 08:22:30 48.3
#> 3 2009-11-26 09:22:30 51.5
#> 4 2009-11-26 10:22:31 48.3
#> 5 2009-11-26 11:22:31 62.8
#> 6 2009-11-26 12:22:32 57.9
#> 7 2009-11-26 13:22:32 48.3
#> 8 2009-11-28 10:21:09 46.7
#> 9 2009-11-28 12:21:09 48.3
and confirm thet the warnings refer to values above the “usual” wind speed maximum.
We can also plot the different data with the help of
sfn_plot
function. It will return ggplot
objects thet can be modified afterwards:
sfn_plot(arg_maz, type = 'sapf', solar = TRUE) +
facet_wrap(~ Tree) +
theme(legend.position = 'none')
sfn_plot(arg_maz, type = 'env', solar = TRUE) +
facet_wrap(~ Variable, scales = 'free_y') +
theme(legend.position = 'none')
We can also plot environmental variables individually (with the
type
argument), or an environmental variable
versus the sap flow measurements (with the
formula_env
argument). See ?sfn_plot
for a
complete description of the available plots.
# vpd individually
sfn_plot(arg_maz, type = 'vpd', solar = TRUE)
# vpd vs sapf
sfn_plot(arg_maz, formula_env = ~vpd, solar = TRUE) +
theme(legend.position = 'none')
SAPFLUXET data is stored as sub-daily measures with different time
step between sites (ranging from 10 minutes to 2 hours).
sapfluxnetr
offers some simple, yet powerful aggregation
functions returning pre-defined statistics: daily_metrics
,
monthly_metrics
, predawn_metrics
,
midday_metrics
, daylight_metrics
and
nightly_metrics
.
daily_metrics
and monthly_metrics
perform a
daily and monthly aggregation, respectively.
predawn_metrics
, midday_metrics
,
daylight_metrics
and nightly_metrics
perform
daily or monthly aggregations (controlled by the period
argument) only by hour-defined intervals. All the aggregations are
performed both for sap flow and environmental data.
Predefined calculated statistics are:
daily_metrics
, see
?diurnal_centroid
for limitations in the calculation of
this metric)Let’s see some examples:
<- daily_metrics(arg_maz, solar = TRUE)
arg_maz_daily #> [1] "Crunching data for ARG_MAZ. In large datasets this could take a while"
#> [1] "General data for ARG_MAZ"
names(arg_maz_daily)
#> [1] "sapf" "env"
names(arg_maz_daily[['sapf']])
#> [1] "TIMESTAMP" "ARG_MAZ_Npu_Jt_1_mean"
#> [3] "ARG_MAZ_Npu_Jt_2_mean" "ARG_MAZ_Npu_Jt_3_mean"
#> [5] "ARG_MAZ_Npu_Jt_4_mean" "ARG_MAZ_Npu_Jt_5_mean"
#> [7] "ARG_MAZ_Npu_Jt_1_sd" "ARG_MAZ_Npu_Jt_2_sd"
#> [9] "ARG_MAZ_Npu_Jt_3_sd" "ARG_MAZ_Npu_Jt_4_sd"
#> [11] "ARG_MAZ_Npu_Jt_5_sd" "ARG_MAZ_Npu_Jt_1_coverage"
#> [13] "ARG_MAZ_Npu_Jt_2_coverage" "ARG_MAZ_Npu_Jt_3_coverage"
#> [15] "ARG_MAZ_Npu_Jt_4_coverage" "ARG_MAZ_Npu_Jt_5_coverage"
#> [17] "ARG_MAZ_Npu_Jt_1_q_95" "ARG_MAZ_Npu_Jt_2_q_95"
#> [19] "ARG_MAZ_Npu_Jt_3_q_95" "ARG_MAZ_Npu_Jt_4_q_95"
#> [21] "ARG_MAZ_Npu_Jt_5_q_95" "ARG_MAZ_Npu_Jt_1_centroid"
#> [23] "ARG_MAZ_Npu_Jt_2_centroid" "ARG_MAZ_Npu_Jt_3_centroid"
#> [25] "ARG_MAZ_Npu_Jt_4_centroid" "ARG_MAZ_Npu_Jt_5_centroid"
names(arg_maz_daily[['env']])
#> [1] "TIMESTAMP" "ta_mean" "rh_mean"
#> [4] "vpd_mean" "sw_in_mean" "ws_mean"
#> [7] "precip_mean" "swc_shallow_mean" "ppfd_in_mean"
#> [10] "ext_rad_mean" "ta_sd" "rh_sd"
#> [13] "vpd_sd" "sw_in_sd" "ws_sd"
#> [16] "precip_sd" "swc_shallow_sd" "ppfd_in_sd"
#> [19] "ext_rad_sd" "ta_coverage" "rh_coverage"
#> [22] "vpd_coverage" "sw_in_coverage" "ws_coverage"
#> [25] "precip_coverage" "swc_shallow_coverage" "ppfd_in_coverage"
#> [28] "ext_rad_coverage" "ta_q_95" "rh_q_95"
#> [31] "vpd_q_95" "sw_in_q_95" "ws_q_95"
#> [34] "precip_q_95" "swc_shallow_q_95" "ppfd_in_q_95"
#> [37] "ext_rad_q_95" "precip_accumulated"
We can see thet results are divided in sapf
and
env
and inside each of them the metrics are indicated by
the end of the variable names.
This way we can select specific variables, for example the 0.95 quantile
of sap flow measures:
'sapf']] %>%
arg_maz_daily[[select(TIMESTAMP, ends_with('q_95'))
#> # A tibble: 13 × 6
#> TIMESTAMP ARG_MAZ_Npu_Jt_1_q_95 ARG_MAZ_N…¹ ARG_M…² ARG_M…³ ARG_M…⁴
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-18 00:00:00 1949. 754. 736. 2671. 1503.
#> 2 2009-11-19 00:00:00 4603. 2835. 1692. 7276. 3908.
#> 3 2009-11-20 00:00:00 2561. 1289. 801. 3833. 1924.
#> 4 2009-11-21 00:00:00 4001. 2459. 1508. 6740. 3920.
#> 5 2009-11-22 00:00:00 4396. 2761. 1466. 6704. 3852.
#> 6 2009-11-23 00:00:00 5192. 3160. 1842. 8019. 5251.
#> 7 2009-11-24 00:00:00 5135. 3160. 1879. 7696. 5098.
#> 8 2009-11-25 00:00:00 4724. 3077. 1522. 7382. 4253.
#> 9 2009-11-26 00:00:00 5642. 3449. 1845. 8391. 5247.
#> 10 2009-11-27 00:00:00 6560. 4174. 2251. 11112. 6647.
#> 11 2009-11-28 00:00:00 7142. 4288. 2362. 9670. 6471.
#> 12 2009-11-29 00:00:00 5319. 3494. 2000. 9690. 5501.
#> 13 2009-11-30 00:00:00 2408. 1804. 711. 3828. 2313.
#> # … with abbreviated variable names ¹ARG_MAZ_Npu_Jt_2_q_95,
#> # ²ARG_MAZ_Npu_Jt_3_q_95, ³ARG_MAZ_Npu_Jt_4_q_95, ⁴ARG_MAZ_Npu_Jt_5_q_95
The same is applicable to the environmental data, in this case the mean values:
'env']] %>%
arg_maz_daily[[select(TIMESTAMP, ends_with('mean'))
#> # A tibble: 13 × 10
#> TIMESTAMP ta_mean rh_mean vpd_mean sw_in_…¹ ws_mean preci…² swc_s…³
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009-11-18 00:00:00 4.78 57.8 0.363 0 11.3 0 0.359
#> 2 2009-11-19 00:00:00 5.56 45.7 0.521 300. 12.0 0.00833 0.351
#> 3 2009-11-20 00:00:00 1.39 73.5 0.181 134. 12.6 0.1 0.343
#> 4 2009-11-21 00:00:00 2.44 69.6 0.236 207. 14.2 0.00833 0.341
#> 5 2009-11-22 00:00:00 2.21 71.9 0.216 282. 15.8 0.00833 0.335
#> 6 2009-11-23 00:00:00 3.19 56.7 0.364 335. 11.4 0 0.328
#> 7 2009-11-24 00:00:00 6.16 52.1 0.511 312. 13.7 0 0.319
#> 8 2009-11-25 00:00:00 6.11 58.5 0.409 250. 21.7 0 0.309
#> 9 2009-11-26 00:00:00 4.99 72.4 0.269 218. 34.2 0.258 0.327
#> 10 2009-11-27 00:00:00 4.98 61.2 0.363 336. 25.8 0 0.371
#> 11 2009-11-28 00:00:00 5.87 61.6 0.411 212. 25.2 0.0167 0.348
#> 12 2009-11-29 00:00:00 1.45 63.6 0.266 246. 17.7 0.45 0.338
#> 13 2009-11-30 00:00:00 0.299 78.0 0.140 241. 18.0 0.00909 0.345
#> # … with 2 more variables: ppfd_in_mean <dbl>, ext_rad_mean <dbl>, and
#> # abbreviated variable names ¹sw_in_mean, ²precip_mean, ³swc_shallow_mean
If interested in custom metrics or custom aggregations, there is a
generic function, sfn_metrics
thet allows for customization
of the statistics to calculate and the periods to aggregate. See
?sfn_metrics
and
vignette('custom-aggregation'. package = 'sapfluxnetr')
for
more details about it.
It’s worth to mention thet aggregated TIMESTAMPS are fixed to the
beginning of the period aggregated, meaning thet data
from 2018-01-01 00:00:00
to
2018-01-01 23:59:59
are aggregated as
2018-01-01 00:00:00
.
You can change this using the side
parameter (see
?sfn_metrics
)
The default returned object for the aggregation functions is a list
with the sap flow and the environmental data, but given thet usually is
more comfortable to have all data (sap flow and environmental) and
ancillary data (metadata) altogether in a tidy data frame (each row an
observation), all aggregation functions have an argument,
tidy
thet can be set to TRUE to obtain this kind
of data frame. We will cover this in the “Tidy
metrics” section.
Getting the insights about one site is interesting, but getting the
insights of a common group of sites could be even more interesting.
sapfluxnetr
allows filtering sites by metadata values
(biome, country, species…) and work with them as a unique set.
First thing we have to do is creating a metadata database. It is not mandatory, but filtering sites by metadata can be a time/resources consuming step if we have to temporary build the database each time we want filter sites. So, let’s create a cached metadata database. This will take some minutes, so maybe it is a good moment to prepare a hot beverage ;)
<- read_sfn_metadata(folder = 'RData/plant', .write_cache = TRUE) sfn_metadata
#> [1] "processing site ARG_MAZ (1 of 3)"
#> [1] "processing site ARG_TRE (2 of 3)"
#> [1] "processing site AUS_CAN_ST2_MIX (3 of 3)"
The important bit here is .write_cache = TRUE
. This will
write a file called .metadata_cache.RData
containing all
the metadata for all sites present in folder
. This file
will be used any time we will filter the metadata, so there is no need
of accessing all the data again.
If we take a look at sfn_metadata
we can see a list with 5
data frames, one for each metadata class (site, stand, species, plant
and environmental metadata).
# access plant metadata
'plant_md']]
sfn_metadata[[#> # A tibble: 43 × 26
#> pl_age pl_az…¹ pl_ba…² pl_code pl_dbh pl_he…³ pl_le…⁴ pl_name pl_ra…⁵ pl_re…⁶
#> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 180 Correc… 14.7 ARG_MA… 41.1 NA 109. Np-1 Correc… <NA>
#> 2 180 No azi… 13 ARG_MA… 33.2 NA 58.4 Np-2 Correc… <NA>
#> 3 180 No azi… 6 ARG_MA… 23.2 NA 35.0 Np-3 Correc… <NA>
#> 4 180 No azi… 21 ARG_MA… 55.6 NA 174. Np-4 Correc… <NA>
#> 5 180 No azi… 10 ARG_MA… 38 NA 88 Np-5 Correc… <NA>
#> 6 140 Correc… 23.5 ARG_TR… 23.6 NA 29.7 Na-1 Correc… <NA>
#> 7 140 No azi… 22.5 ARG_TR… 20.2 NA 13.0 Na-2 Correc… <NA>
#> 8 140 No azi… 22 ARG_TR… 23.8 NA 29.8 Na-3 Correc… <NA>
#> 9 140 No azi… 23.5 ARG_TR… 19.5 NA 15.1 Na-4 Correc… <NA>
#> 10 14.5 No azi… NA AUS_CA… 13.1 14.3 NA CRR4AE5 Measur… The tr…
#> # … with 33 more rows, 16 more variables: pl_sap_units <chr>,
#> # pl_sapw_area <dbl>, pl_sapw_depth <dbl>, pl_sens_calib <lgl>,
#> # pl_sens_cor_grad <chr>, pl_sens_cor_zero <chr>, pl_sens_hgt <dbl>,
#> # pl_sens_length <int>, pl_sens_man <chr>, pl_sens_meth <chr>,
#> # pl_sens_timestep <int>, pl_social <chr>, pl_species <chr>,
#> # pl_treatment <chr>, si_code <chr>, pl_sap_units_orig <chr>, and abbreviated
#> # variable names ¹pl_azimut_int, ²pl_bark_thick, ³pl_height, ⁴pl_leaf_area, …
Now thet we have our metadata database built, we can inspect the site
codes in a folder with sfn_sites_in_folder
:
<- 'RData/plant/'
folder <- sfn_sites_in_folder(folder)
sites sites
#> [1] "ARG_MAZ" "ARG_TRE" "AUS_CAN_ST2_MIX"
We can filter these sites by any metadata variable, to select those
thet met some criteria. This is done with
filter_sites_by_md
. As a first try, let’s list all sites
belonging to temperate forests (woodland/shrubland included):
<- sfn_sites_in_folder(folder) %>%
temperate filter_sites_by_md(
%in% c('Woodland/Shrubland', 'Temperate forest'),
si_biome metadata = sfn_metadata
)
temperate
#> [1] "ARG_MAZ" "ARG_TRE" "AUS_CAN_ST2_MIX"
You can combine all filters you want:
<- sfn_sites_in_folder(folder) %>%
temperate_hr filter_sites_by_md(
%in% c('Woodland/Shrubland', 'Temperate forest'),
si_biome == 'HR',
pl_sens_meth metadata = sfn_metadata
)
temperate_hr
#> [1] "ARG_MAZ" "ARG_TRE"
Remember thet you can get all the info from a metadata variable with
describe_md_variable
, and also you can get a complete list
of metadata variables for filtering with
sfn_vars_to_filter
:
sfn_vars_to_filter()
#> $site_md
#> [1] "si_name" "si_country"
#> [3] "si_contact_firstname" "si_contact_lastname"
#> [5] "si_contact_email" "si_contact_institution"
#> [7] "si_addcontr_firstname" "si_addcontr_lastname"
#> [9] "si_addcontr_email" "si_addcontr_institution"
#> [11] "si_lat" "si_long"
#> [13] "si_elev" "si_paper"
#> [15] "si_dist_mgmt" "si_igbp"
#> [17] "si_flux_network" "si_dendro_network"
#> [19] "si_remarks" "si_code"
#> [21] "si_mat" "si_map"
#> [23] "si_biome"
#>
#> $stand_md
#> [1] "st_name" "st_growth_condition" "st_treatment"
#> [4] "st_age" "st_height" "st_density"
#> [7] "st_basal_area" "st_lai" "st_aspect"
#> [10] "st_terrain" "st_soil_depth" "st_soil_texture"
#> [13] "st_sand_perc" "st_silt_perc" "st_clay_perc"
#> [16] "st_remarks" "st_USDA_soil_texture"
#>
#> $species_md
#> [1] "sp_name" "sp_ntrees" "sp_leaf_habit"
#> [4] "sp_basal_area_perc"
#>
#> $plant_md
#> [1] "pl_name" "pl_species" "pl_treatment"
#> [4] "pl_dbh" "pl_height" "pl_age"
#> [7] "pl_social" "pl_sapw_area" "pl_sapw_depth"
#> [10] "pl_bark_thick" "pl_leaf_area" "pl_sens_meth"
#> [13] "pl_sens_man" "pl_sens_cor_grad" "pl_sens_cor_zero"
#> [16] "pl_sens_calib" "pl_sap_units" "pl_sap_units_orig"
#> [19] "pl_sens_length" "pl_sens_hgt" "pl_sens_timestep"
#> [22] "pl_radial_int" "pl_azimut_int" "pl_remarks"
#> [25] "pl_code"
#>
#> $env_md
#> [1] "env_time_zone" "env_time_daylight" "env_timestep"
#> [4] "env_ta" "env_rh" "env_vpd"
#> [7] "env_sw_in" "env_ppfd_in" "env_netrad"
#> [10] "env_ws" "env_precip" "env_swc_shallow_depth"
#> [13] "env_swc_deep_depth" "env_plant_watpot" "env_leafarea_seasonal"
#> [16] "env_remarks"
# and see what values we must use for filtering by pl_sens_meth
describe_md_variable('pl_sens_meth')
#> Description:
#> Sap flow measures method
#>
#> Values:
#> CAG | HD | CHP | CHD | HFD | HPTM | HR | SFPLUS | SHB | TSHB | Other/unknown |
#>
#>
#> Units:
#> Fixed values
#>
#> Type:
#> Character
We can load all temperate sites with a simple pipe
<- temperate %>%
temperate_sites read_sfn_data(folder = 'RData/plant')
temperate_sites
#> sfn_data_multi object
#> 3 sites: ARG_MAZ ARG_TRE AUS_CAN_ST2_MIX
#> Approximate time span (UTC) for the combined sites: 2006-06-20 10:54:42 UTC--2009-11-18 22:24:18 UTC
This creates an sfn_data_multi
object, which is just a
list, but adapted to contain sfn_data
objects. Main
functions of sapfluxnetr
work with this type of objects. As
in any list, we can access the sites:
# the following is the same as temperate_sites[[3]] or
# temperate_sites$AUS_CAN_ST2_MIX
'AUS_CAN_ST2_MIX']]
temperate_sites[[#> sfn_data object
#> Data from AUS_CAN_ST2_MIX site
#>
#> Data kindly provided by David Forrester from
#> Swiss Federal Institute for Forest, Snow and Landscape Research WSL
#>
#> Site related literature: 10.1016/j.foreco.2009.07.036
#>
#> Sapflow data: 17808 observations of 34 trees/plants
#> Species present: Eucalyptus globulus, Acacia mearnsii
#>
#> Environmental data: 17808 observations.
#> Variables present:
#> ta vpd sw_in ws precip ppfd_in rh ext_rad
#>
#> Biome: Woodland/Shrubland
#>
#> TIMESTAMP span: 2006-06-20 11:00:00 +10--2007-06-26 10:30:00 +10
#>
#> Solar TIMESTAMP span: 2006-06-20 10:54:42 UTC--2007-06-26 10:23:32 UTC
#>
#> Sapflow data flags:
#> OUT_REMOVED NA_ADDED OUT_WARN NA_PRESENT
#> 40 3400 5460 504952
#>
#> Environmental data flags:
#> OUT_WARN NA_ADDED CALCULATED
#> 10826 15320 53424
Now we can aggregate all sites at once
<- temperate_sites %>%
temperate_aggregated daily_metrics()
#>
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#>
#> set_names
#> [1] "Crunching data for ARG_MAZ. In large datasets this could take a while"
#> [1] "General data for ARG_MAZ"
#> [1] "Crunching data for ARG_TRE. In large datasets this could take a while"
#> [1] "General data for ARG_TRE"
#> [1] "Crunching data for AUS_CAN_ST2_MIX. In large datasets this could take a while"
#> [1] "General data for AUS_CAN_ST2_MIX"
names(temperate_aggregated)
#> [1] "ARG_MAZ" "ARG_TRE" "AUS_CAN_ST2_MIX"
and voilà, all sites aggregated and stored in a list, so we can start working with the data.
Most of the times it comes in handy to have a tidy dataset with all
the sap flow and environmental measurements, along with the metadata for
the sites aggregated, in order to work with the data in an easier way.
This can be done with the metrics_tidyfier
function (see
?metrics_tidyfier
). But daily_metrics
and
related functions also implement the tidy
argument. For the
tidy
argument to work, we must have available the metadata,
see the “metadata section” for more
details about this.
This allows for returning directly a metrics data frame combining all sites aggregated along with their metadata, saving one step in the workflow:
<- temperate_sites %>%
temperate_tidy daily_metrics(tidy = TRUE, metadata = sfn_metadata)
#> [1] "Crunching data for ARG_MAZ. In large datasets this could take a while"
#> [1] "General data for ARG_MAZ"
#> [1] "Crunching data for ARG_TRE. In large datasets this could take a while"
#> [1] "General data for ARG_TRE"
#> [1] "Crunching data for AUS_CAN_ST2_MIX. In large datasets this could take a while"
#> [1] "General data for AUS_CAN_ST2_MIX"
temperate_tidy#> # A tibble: 12,769 × 129
#> TIMESTAMP si_code pl_code sapfl…¹ sapfl…² sapfl…³ sapfl…⁴ sapfl…⁵
#> <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 2 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 3 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 4 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 5 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 6 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 7 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 8 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 9 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> 10 2006-06-20 00:00:00 AUS_CAN_… AUS_CA… NaN NA 0 NA NaN
#> # … with 12,759 more rows, 121 more variables: ta_mean <dbl>, rh_mean <dbl>,
#> # vpd_mean <dbl>, sw_in_mean <dbl>, ws_mean <dbl>, precip_mean <dbl>,
#> # swc_shallow_mean <dbl>, ppfd_in_mean <dbl>, ext_rad_mean <dbl>,
#> # ta_sd <dbl>, rh_sd <dbl>, vpd_sd <dbl>, sw_in_sd <dbl>, ws_sd <dbl>,
#> # precip_sd <dbl>, swc_shallow_sd <dbl>, ppfd_in_sd <dbl>, ext_rad_sd <dbl>,
#> # ta_coverage <dbl>, rh_coverage <dbl>, vpd_coverage <dbl>,
#> # sw_in_coverage <dbl>, ws_coverage <dbl>, precip_coverage <dbl>, …
Now we can start analyzing, modeling, etc.
For example, to look for site effects in the relationship between sap
flow and vpd, using the 0.95 quantile as maximum values of sap flow and
the plant sapwood area as a third variable:
ggplot(
temperate_tidy,aes(x = vpd_mean, y = sapflow_q_95, colour = si_code, size = pl_sapw_area)
+
) geom_point(alpha = 0.2)
#> Warning: Removed 10966 rows containing missing values (`geom_point()`).