tidyrgee brings components of dplyr’s syntax to remote sensing analysis, using the rgee package. rgee is an R API for Google Earth Engine (GEE) that provides R support for the methods/functions available in the JavaScript code editor and Python API. The rgee syntax was written to be very similar to the GEE JavaScript/Python syntax. However, this syntax can feel unnatural and difficult at times, especially to users with less experience in GEE. Simple concepts that are easy to express verbally can be cumbersome even for advanced users (see Syntax Comparison). The tidyverse has provided principles and concepts that help data scientists/R users efficiently write and communicate their code in a clear and concise manner. tidyrgee aims to bring these principles to GEE remote sensing analyses.
tidyrgee provides the convenience of pipe-able dplyr-style methods such as `filter`, `group_by`, `summarise`, `select`, `mutate`, etc., using rlang’s style of non-standard evaluation (NSE).
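Non-standard evaluation is what lets you write a condition such as `year %in% 2010:2011` using bare column names. A minimal base-R sketch of the idea follows. `filter_rows()` is a hypothetical helper, not part of tidyrgee. It uses base R's `substitute()`/`eval()` rather than rlang's `enquo()`/`eval_tidy()`, but the principle is the same: capture the condition unevaluated, then evaluate it with the table's columns in scope.

```r
# Hypothetical helper (not part of tidyrgee): capture the condition unevaluated,
# then evaluate it with the data frame's columns visible as variables
filter_rows <- function(df, cond) {
  keep <- eval(substitute(cond), df, parent.frame())
  df[keep, , drop = FALSE]
}

# Toy table standing in for a tidyee vrt
vrt <- data.frame(year = c(2009, 2010, 2011, 2012), month = c(7, 8, 7, 1))
filter_rows(vrt, year %in% 2010:2011 & month == 7)  # keeps only the 2011 row
```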
Try it out!
You can install the development version of tidyrgee from GitHub with:
# install.packages("devtools")
devtools::install_github("r-tidy-remote-sensing/tidyrgee")
Note that to use tidyrgee you must be signed up for a GEE developer account. Additionally, you must install the rgee package following its installation and setup instructions.
Below is a quick example demonstrating the simplified syntax. Note that the rgee syntax is very similar to the syntax in the JavaScript code editor. In this example I simply want to calculate mean monthly NDVI (per pixel) for every year from 2000-2015. This is a fairly simple analysis to verbalize/conceptualize. Yet, using standard GEE conventions, the process is not so simple. Aside from many peculiarities, such as flattening a list, then calling …, and then rebuilding the imageCollection at the end, I also have to write and think about a double mapping statement over months and years (sort of like a double for-loop). By comparison, the tidyrgee syntax removes this complexity and lets me write the code in a more human-readable/interpretable format.
Syntax Comparison: rgee (similar to JavaScript) vs. tidyrgee
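To make the "double mapping" concrete, here is a plain base-R sketch that does not involve Earth Engine. It uses a small hypothetical table of image dates with made-up NDVI values. The nested loop mirrors the double mapping over years and months. The single `aggregate()` over a (year, month) key is the idea that a grouped summary expresses in one step.

```r
# Hypothetical image dates with made-up per-image NDVI values
imgs <- data.frame(
  date = as.Date(c("2001-01-01", "2001-01-17", "2001-02-02",
                   "2002-01-01", "2002-01-17", "2002-02-02")),
  ndvi = c(0.30, 0.34, 0.40, 0.28, 0.32, 0.44)
)
imgs$year  <- as.integer(format(imgs$date, "%Y"))
imgs$month <- as.integer(format(imgs$date, "%m"))

# "Double mapping": loop over every year, then over every month within it
loop_result <- data.frame()
for (y in sort(unique(imgs$year))) {
  for (m in sort(unique(imgs$month))) {
    sel <- imgs$year == y & imgs$month == m
    if (any(sel)) {
      loop_result <- rbind(loop_result,
                           data.frame(year = y, month = m, ndvi = mean(imgs$ndvi[sel])))
    }
  }
}

# Grouped version: a single call over the (year, month) key
grouped_result <- aggregate(ndvi ~ year + month, data = imgs, FUN = mean)
grouped_result <- grouped_result[order(grouped_result$year, grouped_result$month), ]
```

Both approaches yield one mean per year-month combination. The grouped form simply states the key instead of spelling out the loops.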
Below are a couple of examples showing some of the available functions. To load images/imageCollections, follow the standard rgee approach using `ee$ImageCollection` / `ee$Image`:
library(tidyrgee)
library(rgee)
ee_Initialize(quiet = TRUE)  # start an Earth Engine session via rgee
modis_ic <- ee$ImageCollection("MODIS/006/MOD13Q1")  # MODIS 16-day vegetation indices
Once the above steps are performed, you can convert the `ee$ImageCollection` to a tidyee object with the function `as_tidyee`. The tidyee object stores the original `ee$ImageCollection` as `ee_ob` (for Earth Engine object) and produces a virtual table/data.frame stored as `vrt`. This `vrt` not only facilitates the use of dplyr/tidyverse methods, but also allows the user to better view the data stored in the accompanying imageCollection. The `ee_ob` and `vrt` inside the tidyee object are linked: any function applied to the tidyee object applies to both, so that they remain in sync.
modis_tidy <- as_tidyee(modis_ic)
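The pairing of `ee_ob` and `vrt` can be pictured with a toy stand-in. This sketch is hypothetical and is not tidyrgee's implementation. A plain list with a made-up class `"tidyee_mock"` holds a character vector of image IDs in place of the real `ee$ImageCollection`. The made-up helper `filter_tidyee_mock()` subsets the table first, then keeps only the matching IDs, so the two parts cannot drift apart.

```r
# Hypothetical mock (not tidyrgee code): a vector of image IDs stands in for ee_ob
make_tidyee_mock <- function(ids, dates) {
  vrt <- data.frame(id = ids, date = as.Date(dates), stringsAsFactors = FALSE)
  vrt$year  <- as.integer(format(vrt$date, "%Y"))
  vrt$month <- as.integer(format(vrt$date, "%m"))
  structure(list(ee_ob = ids, vrt = vrt), class = "tidyee_mock")
}

# Hypothetical filter: evaluate the condition on the table, then subset ee_ob by id
filter_tidyee_mock <- function(x, keep) {
  vrt <- x$vrt[keep(x$vrt), , drop = FALSE]
  x$ee_ob <- x$ee_ob[x$ee_ob %in% vrt$id]
  x$vrt <- vrt
  x
}

mock <- make_tidyee_mock(
  ids   = c("MODIS/006/MOD13Q1/2000_02_18", "MODIS/006/MOD13Q1/2000_03_05",
            "MODIS/006/MOD13Q1/2000_03_21"),
  dates = c("2000-02-18", "2000-03-05", "2000-03-21")
)
march <- filter_tidyee_mock(mock, function(v) v$month == 3)
```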
The `vrt` comes with a few built-in columns that you can use off the bat for filtering and grouping. You can also `mutate` additional info for filtering and grouping (e.g., using lubridate to create new temporal groupings).
knitr::kable(modis_tidy$vrt |> head())
id | time_start | system_index | date | month | year | doy | band_names |
---|---|---|---|---|---|---|---|
MODIS/006/MOD13Q1/2000_02_18 | 2000-02-18 | 2000_02_18 | 2000-02-18 | 2 | 2000 | 49 | NDVI , EVI , DetailedQA , sur_refl_b01 , sur_refl_b02 , sur_refl_b03 , sur_refl_b07 , ViewZenith , SolarZenith , RelativeAzimuth, DayOfYear , SummaryQA |
MODIS/006/MOD13Q1/2000_03_05 | 2000-03-05 | 2000_03_05 | 2000-03-05 | 3 | 2000 | 65 | NDVI , EVI , DetailedQA , sur_refl_b01 , sur_refl_b02 , sur_refl_b03 , sur_refl_b07 , ViewZenith , SolarZenith , RelativeAzimuth, DayOfYear , SummaryQA |
MODIS/006/MOD13Q1/2000_03_21 | 2000-03-21 | 2000_03_21 | 2000-03-21 | 3 | 2000 | 81 | NDVI , EVI , DetailedQA , sur_refl_b01 , sur_refl_b02 , sur_refl_b03 , sur_refl_b07 , ViewZenith , SolarZenith , RelativeAzimuth, DayOfYear , SummaryQA |
MODIS/006/MOD13Q1/2000_04_06 | 2000-04-06 | 2000_04_06 | 2000-04-06 | 4 | 2000 | 97 | NDVI , EVI , DetailedQA , sur_refl_b01 , sur_refl_b02 , sur_refl_b03 , sur_refl_b07 , ViewZenith , SolarZenith , RelativeAzimuth, DayOfYear , SummaryQA |
MODIS/006/MOD13Q1/2000_04_22 | 2000-04-22 | 2000_04_22 | 2000-04-22 | 4 | 2000 | 113 | NDVI , EVI , DetailedQA , sur_refl_b01 , sur_refl_b02 , sur_refl_b03 , sur_refl_b07 , ViewZenith , SolarZenith , RelativeAzimuth, DayOfYear , SummaryQA |
MODIS/006/MOD13Q1/2000_05_08 | 2000-05-08 | 2000_05_08 | 2000-05-08 | 5 | 2000 | 129 | NDVI , EVI , DetailedQA , sur_refl_b01 , sur_refl_b02 , sur_refl_b03 , sur_refl_b07 , ViewZenith , SolarZenith , RelativeAzimuth, DayOfYear , SummaryQA |
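The built-in `month`, `year` and `doy` columns follow directly from each image's date. With lubridate, the same values come from `month()`, `year()` and `yday()`. The base-R check below rebuilds them for the six dates in the table above; it is not tidyrgee's code. It then adds a hypothetical `quarter` column as an example of an extra temporal grouping you might `mutate` onto the `vrt`.

```r
# Dates of the first six MOD13Q1 images in the table
date <- as.Date(c("2000-02-18", "2000-03-05", "2000-03-21",
                  "2000-04-06", "2000-04-22", "2000-05-08"))
lt <- as.POSIXlt(date)

month <- lt$mon + 1      # POSIXlt months are 0-based
year  <- lt$year + 1900  # POSIXlt years count from 1900
doy   <- lt$yday + 1     # POSIXlt day-of-year is 0-based

# Hypothetical extra grouping column: calendar quarter (1-4)
quarter <- (month - 1) %/% 3 + 1
```

The `doy` values step by exactly 16 days, which reflects the 16-day compositing period of MOD13Q1.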
Next we demonstrate filtering by date, month, and year. The `vrt` and `ee_ob` are always filtered together.
modis_tidy |>
  filter(date >= "2021-06-01")  # images dated on or after 1 June 2021
#> band names: [ NDVI, EVI, DetailedQA, sur_refl_b01, sur_refl_b02, sur_refl_b03, sur_refl_b07, ViewZenith, SolarZenith, RelativeAzimuth, DayOfYear, SummaryQA ]
#>
#> $ee_ob
#> EarthEngine Object: ImageCollection
#> $vrt
#> # A tibble: 28 x 9
#> id time_start syste~1 date month year doy band_~2
#> <chr> <dttm> <chr> <date> <dbl> <dbl> <dbl> <list>
#> 1 MODIS/006/M~ 2021-06-10 00:00:00 2021_0~ 2021-06-10 6 2021 161 <chr>
#> 2 MODIS/006/M~ 2021-06-26 00:00:00 2021_0~ 2021-06-26 6 2021 177 <chr>
#> 3 MODIS/006/M~ 2021-07-12 00:00:00 2021_0~ 2021-07-12 7 2021 193 <chr>
#> 4 MODIS/006/M~ 2021-07-28 00:00:00 2021_0~ 2021-07-28 7 2021 209 <chr>
#> 5 MODIS/006/M~ 2021-08-13 00:00:00 2021_0~ 2021-08-13 8 2021 225 <chr>
#> 6 MODIS/006/M~ 2021-08-29 00:00:00 2021_0~ 2021-08-29 8 2021 241 <chr>
#> 7 MODIS/006/M~ 2021-09-14 00:00:00 2021_0~ 2021-09-14 9 2021 257 <chr>
#> 8 MODIS/006/M~ 2021-09-30 00:00:00 2021_0~ 2021-09-30 9 2021 273 <chr>
#> 9 MODIS/006/M~ 2021-10-16 00:00:00 2021_1~ 2021-10-16 10 2021 289 <chr>
#> 10 MODIS/006/M~ 2021-11-01 00:00:00 2021_1~ 2021-11-01 11 2021 305 <chr>
#> # ... with 18 more rows, 1 more variable: tidyee_index <chr>, and abbreviated
#> # variable names 1: system_index, 2: band_names
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
#>
#> attr(,"class")
#> [1] "tidyee"
modis_tidy |>
  filter(year %in% 2010:2011)  # all images from 2010 and 2011
#> band names: [ NDVI, EVI, DetailedQA, sur_refl_b01, sur_refl_b02, sur_refl_b03, sur_refl_b07, ViewZenith, SolarZenith, RelativeAzimuth, DayOfYear, SummaryQA ]
#>
#> $ee_ob
#> EarthEngine Object: ImageCollection
#> $vrt
#> # A tibble: 46 x 9
#> id time_start syste~1 date month year doy band_~2
#> <chr> <dttm> <chr> <date> <dbl> <dbl> <dbl> <list>
#> 1 MODIS/006/M~ 2010-01-01 00:00:00 2010_0~ 2010-01-01 1 2010 1 <chr>
#> 2 MODIS/006/M~ 2010-01-17 00:00:00 2010_0~ 2010-01-17 1 2010 17 <chr>
#> 3 MODIS/006/M~ 2010-02-02 00:00:00 2010_0~ 2010-02-02 2 2010 33 <chr>
#> 4 MODIS/006/M~ 2010-02-18 00:00:00 2010_0~ 2010-02-18 2 2010 49 <chr>
#> 5 MODIS/006/M~ 2010-03-06 00:00:00 2010_0~ 2010-03-06 3 2010 65 <chr>
#> 6 MODIS/006/M~ 2010-03-22 00:00:00 2010_0~ 2010-03-22 3 2010 81 <chr>
#> 7 MODIS/006/M~ 2010-04-07 00:00:00 2010_0~ 2010-04-07 4 2010 97 <chr>
#> 8 MODIS/006/M~ 2010-04-23 00:00:00 2010_0~ 2010-04-23 4 2010 113 <chr>
#> 9 MODIS/006/M~ 2010-05-09 00:00:00 2010_0~ 2010-05-09 5 2010 129 <chr>
#> 10 MODIS/006/M~ 2010-05-25 00:00:00 2010_0~ 2010-05-25 5 2010 145 <chr>
#> # ... with 36 more rows, 1 more variable: tidyee_index <chr>, and abbreviated
#> # variable names 1: system_index, 2: band_names
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
#>
#> attr(,"class")
#> [1] "tidyee"
modis_tidy |>
  filter(month %in% c(7, 8))  # July and August images from every year
#> band names: [ NDVI, EVI, DetailedQA, sur_refl_b01, sur_refl_b02, sur_refl_b03, sur_refl_b07, ViewZenith, SolarZenith, RelativeAzimuth, DayOfYear, SummaryQA ]
#>
#> $ee_ob
#> EarthEngine Object: ImageCollection
#> $vrt
#> # A tibble: 91 x 9
#> id time_start syste~1 date month year doy band_~2
#> <chr> <dttm> <chr> <date> <dbl> <dbl> <dbl> <list>
#> 1 MODIS/006/M~ 2000-07-11 00:00:00 2000_0~ 2000-07-11 7 2000 193 <chr>
#> 2 MODIS/006/M~ 2000-07-27 00:00:00 2000_0~ 2000-07-27 7 2000 209 <chr>
#> 3 MODIS/006/M~ 2000-08-12 00:00:00 2000_0~ 2000-08-12 8 2000 225 <chr>
#> 4 MODIS/006/M~ 2000-08-28 00:00:00 2000_0~ 2000-08-28 8 2000 241 <chr>
#> 5 MODIS/006/M~ 2001-07-12 00:00:00 2001_0~ 2001-07-12 7 2001 193 <chr>
#> 6 MODIS/006/M~ 2001-07-28 00:00:00 2001_0~ 2001-07-28 7 2001 209 <chr>
#> 7 MODIS/006/M~ 2001-08-13 00:00:00 2001_0~ 2001-08-13 8 2001 225 <chr>
#> 8 MODIS/006/M~ 2001-08-29 00:00:00 2001_0~ 2001-08-29 8 2001 241 <chr>
#> 9 MODIS/006/M~ 2002-07-12 00:00:00 2002_0~ 2002-07-12 7 2002 193 <chr>
#> 10 MODIS/006/M~ 2002-07-28 00:00:00 2002_0~ 2002-07-28 7 2002 209 <chr>
#> # ... with 81 more rows, 1 more variable: tidyee_index <chr>, and abbreviated
#> # variable names 1: system_index, 2: band_names
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
#>
#> attr(,"class")
#> [1] "tidyee"
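The row counts returned by the three filters (28, 46 and 91) follow from the 16-day cadence of MOD13Q1. Composites start on day-of-year 1, 17, 33, ..., 353, which gives 23 images per year. The base-R sketch below rebuilds that schedule and applies the same three conditions. It makes two assumptions:

- There are no missing composites.
- Judging from the dates printed in this README, the collection spans 2000-02-18 to 2022-08-13. The real end date moves as new images are added.

`modis_16day_dates()` is a hypothetical helper, not a tidyrgee function.

```r
# Hypothetical helper: start dates of 16-day composites (day-of-year 1, 17, ..., 353)
modis_16day_dates <- function(years) {
  starts <- lapply(years, function(y) {
    as.Date(sprintf("%d-01-01", y)) + seq(0, 352, by = 16)
  })
  do.call(c, starts)
}

# Assumed span of the collection when this README was rendered
dates <- modis_16day_dates(2000:2022)
dates <- dates[dates >= as.Date("2000-02-18") & dates <= as.Date("2022-08-13")]
year  <- as.integer(format(dates, "%Y"))
month <- as.integer(format(dates, "%m"))

n_recent  <- sum(dates >= as.Date("2021-06-01"))  # mirrors filter(date >= "2021-06-01")
n_2010_11 <- sum(year %in% 2010:2011)             # mirrors filter(year %in% 2010:2011)
n_jul_aug <- sum(month %in% c(7, 8))              # mirrors filter(month %in% c(7, 8))
```

For example, the 91 July/August images are four per year over 2000-2021 plus three in 2022, because the August 29 composite of 2022 falls after the assumed end date.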
In this next example we pipe together multiple functions (`select`, `filter`, `group_by`, `summarise`) to select the NDVI band from the ImageCollection, restrict it to 2000-2015, group it by month and summarise it with the mean. The result is an ImageCollection with one Image per month (12 images). Each pixel in each image holds the mean NDVI value for that month, calculated from all images in that month across 2000-2015.
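modis_tidy |>
  select("NDVI") |>               # keep only the NDVI band
  filter(year %in% 2000:2015) |>  # restrict to 2000-2015
  group_by(month) |>              # one group per calendar month
  summarise(stat = "mean")        # per-pixel mean within each month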
#> band names: [ NDVI_mean ]
#>
#> $ee_ob
#> EarthEngine Object: ImageCollection
#> $vrt
#> # A tibble: 12 x 7
#> month dates_summ~1 numbe~2 time_start time_end date
#> <dbl> <list> <int> <dttm> <dttm> <date>
#> 1 1 <dttm [30]> 30 2001-01-01 00:00:00 2001-01-01 00:00:00 2001-01-01
#> 2 2 <dttm [31]> 31 2000-02-18 00:00:00 2000-02-18 00:00:00 2000-02-18
#> 3 3 <dttm [32]> 32 2000-03-05 00:00:00 2000-03-05 00:00:00 2000-03-05
#> 4 4 <dttm [32]> 32 2000-04-06 00:00:00 2000-04-06 00:00:00 2000-04-06
#> 5 5 <dttm [32]> 32 2000-05-08 00:00:00 2000-05-08 00:00:00 2000-05-08
#> 6 6 <dttm [32]> 32 2000-06-09 00:00:00 2000-06-09 00:00:00 2000-06-09
#> 7 7 <dttm [32]> 32 2000-07-11 00:00:00 2000-07-11 00:00:00 2000-07-11
#> 8 8 <dttm [32]> 32 2000-08-12 00:00:00 2000-08-12 00:00:00 2000-08-12
#> 9 9 <dttm [32]> 32 2000-09-13 00:00:00 2000-09-13 00:00:00 2000-09-13
#> 10 10 <dttm [20]> 20 2000-10-15 00:00:00 2000-10-15 00:00:00 2000-10-15
#> 11 11 <dttm [28]> 28 2000-11-16 00:00:00 2000-11-16 00:00:00 2000-11-16
#> 12 12 <dttm [32]> 32 2000-12-02 00:00:00 2000-12-02 00:00:00 2000-12-02
#> # ... with 1 more variable: band_names <list>, and abbreviated variable names
#> # 1: dates_summarised, 2: number_images
#> # i Use `colnames()` to see all variable names
#>
#> attr(,"class")
#> [1] "tidyee"
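Conceptually, `summarise(stat = "mean")` after `group_by(month)` reduces each group of images to a single image by averaging every pixel across the group. A toy base-R version illustrates that per-pixel reduction. It uses a made-up 2 x 2 "raster" of four images, two in January and two in February, and says nothing about how Earth Engine actually executes the computation.

```r
# Made-up stack: 2 x 2 pixels x 4 images (values fill each image column by column)
pixels <- array(c(0.2, 0.4, 0.6, 0.8,   # image 1, January
                  0.4, 0.6, 0.8, 1.0,   # image 2, January
                  0.1, 0.1, 0.1, 0.1,   # image 3, February
                  0.3, 0.3, 0.3, 0.3),  # image 4, February
                dim = c(2, 2, 4))
img_month <- c(1, 1, 2, 2)

# One output layer per month: mean of each pixel over that month's images
monthly_mean <- sapply(sort(unique(img_month)), function(m) {
  apply(pixels[, , img_month == m, drop = FALSE], c(1, 2), mean)
}, simplify = "array")
```

The `number_images` column in the output above follows the 16-day schedule:

- January has 30 images rather than 32, because the collection starts on 2000-02-18.
- October has only 20, because the 16-day steps place two composites in October only in leap years.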
You can easily `group_by` more than one property to calculate different summary stats. Below we group by year and month and calculate the median for 2021-2022. Because we are using the MODIS 16-day composite, we are summarising approximately 2 images per month to create a median composite image for each month in the specified years. The `vrt` holds a list-col (`dates_summarised`) containing all the dates summarised per new composite image.
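modis_tidy |>
  select("NDVI") |>               # keep only the NDVI band
  filter(year %in% 2021:2022) |>  # restrict to 2021-2022
  group_by(year, month) |>        # one group per year-month combination
  summarise(stat = "median")      # per-pixel median within each group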
#> band names: [ NDVI_median ]
#>
#> $ee_ob
#> EarthEngine Object: ImageCollection
#> $vrt
#> # A tibble: 20 x 8
#> year month dates_summarised number~1 time_start time_end
#> <dbl> <dbl> <list> <int> <dttm> <dttm>
#> 1 2021 1 <dttm [2]> 2 2021-01-01 00:00:00 2021-01-01 00:00:00
#> 2 2021 2 <dttm [2]> 2 2021-02-02 00:00:00 2021-02-02 00:00:00
#> 3 2021 3 <dttm [2]> 2 2021-03-06 00:00:00 2021-03-06 00:00:00
#> 4 2021 4 <dttm [2]> 2 2021-04-07 00:00:00 2021-04-07 00:00:00
#> 5 2021 5 <dttm [2]> 2 2021-05-09 00:00:00 2021-05-09 00:00:00
#> 6 2021 6 <dttm [2]> 2 2021-06-10 00:00:00 2021-06-10 00:00:00
#> 7 2021 7 <dttm [2]> 2 2021-07-12 00:00:00 2021-07-12 00:00:00
#> 8 2021 8 <dttm [2]> 2 2021-08-13 00:00:00 2021-08-13 00:00:00
#> 9 2021 9 <dttm [2]> 2 2021-09-14 00:00:00 2021-09-14 00:00:00
#> 10 2021 10 <dttm [1]> 1 2021-10-16 00:00:00 2021-10-16 00:00:00
#> 11 2021 11 <dttm [2]> 2 2021-11-01 00:00:00 2021-11-01 00:00:00
#> 12 2021 12 <dttm [2]> 2 2021-12-03 00:00:00 2021-12-03 00:00:00
#> 13 2022 1 <dttm [2]> 2 2022-01-01 00:00:00 2022-01-01 00:00:00
#> 14 2022 2 <dttm [2]> 2 2022-02-02 00:00:00 2022-02-02 00:00:00
#> 15 2022 3 <dttm [2]> 2 2022-03-06 00:00:00 2022-03-06 00:00:00
#> 16 2022 4 <dttm [2]> 2 2022-04-07 00:00:00 2022-04-07 00:00:00
#> 17 2022 5 <dttm [2]> 2 2022-05-09 00:00:00 2022-05-09 00:00:00
#> 18 2022 6 <dttm [2]> 2 2022-06-10 00:00:00 2022-06-10 00:00:00
#> 19 2022 7 <dttm [2]> 2 2022-07-12 00:00:00 2022-07-12 00:00:00
#> 20 2022 8 <dttm [1]> 1 2022-08-13 00:00:00 2022-08-13 00:00:00
#> # ... with 2 more variables: date <date>, band_names <list>, and abbreviated
#> # variable name 1: number_images
#> # i Use `colnames()` to see all variable names
#>
#> attr(,"class")
#> [1] "tidyee"
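The `dates_summarised` list-column and the `number_images` counts can be mimicked in base R by splitting image dates on a year-month key. This is an illustration, not tidyrgee's implementation. It assumes composites start on day-of-year 1, 17, ..., 353 with none missing, and that the latest available image is 2022-08-13, as in the output above. It reproduces the 20 groups shown, including the single-image groups for October 2021 and August 2022.

```r
# Composite start dates for 2021-2022 (day-of-year 1, 17, ..., 353)
dates <- do.call(c, lapply(2021:2022, function(y) {
  as.Date(sprintf("%d-01-01", y)) + seq(0, 352, by = 16)
}))
dates <- dates[dates <= as.Date("2022-08-13")]  # assumed last available image

# Year-month key, then one list element of dates per group (like dates_summarised)
key <- format(dates, "%Y-%m")
dates_summarised <- split(dates, key)
number_images <- lengths(dates_summarised)
```

October 2021 gets a single image because the 16-day steps fall on 30 September, 16 October and 1 November.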
To improve interoperability with rgee, we have included the `as_ee` function. It converts a tidyee object back to rgee classes when necessary.
modis_ic <- modis_tidy |> as_ee()  # convert the tidyee object back to an rgee object