Dealing with Timezones

library(ARUtools)
library(lubridate) # For working with date/times
library(dplyr) # For manipulating data

Timezones can cause a lot of confusion but, unfortunately, they are an important part of describing a moment in time.

To deal with timezones in ARUtools you only need two things:

To explain this in more detail, let’s talk about how ARUtools treats timezones.

In R, a date/time column can only have one timezone specified. However, when working with sites around a timezone boundary, you may occasionally have recordings made in different timezones which you would like to process together.

To facilitate this, we use the convention of ‘local’ times marked with “UTC”. Here ‘local’ means the timezone that the ARU is recording in. This means that the date_time column may contain a time recorded in Eastern Daylight Time, while the ‘official’ timezone according to R is still UTC.
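As a minimal sketch of this convention (using only lubridate, not ARUtools itself, and an illustrative date chosen for this example), the following shows how a time recorded in Eastern Daylight Time looks once it is relabelled as UTC: the clock reading stays the same, but the instant it represents shifts by the UTC offset.

```r
library(lubridate)

# A recording that really started at 05:00 Eastern Daylight Time
real_time <- ymd_hms("2020-05-02 05:00:00", tz = "America/Toronto")

# The 'local' convention: keep the clock reading, relabel the timezone as UTC
local_time <- force_tz(real_time, "UTC")

tz(local_time)
#> [1] "UTC"
format(local_time, "%H:%M") # the clock reading is unchanged
#> [1] "05:00"

# The two values are different instants, separated by the UTC offset
offset_hours <- as.numeric(difftime(local_time, real_time, units = "hours"))
offset_hours
#> [1] -4
```

This is why the ‘real’ timezone has to be supplied separately whenever the actual instant matters.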

If we try to use non-UTC times, we’ll be warned off.

To illustrate this, let’s create a cleaned sites index data frame with the date_times in the America/Toronto timezone.

s <- example_sites_clean

# Force to a non-UTC timezone
s$date_time_start <- force_tz(s$date_time_start, "America/Toronto")
s$date_time_end <- force_tz(s$date_time_end, "America/Toronto")

If we try to add these sites to our cleaned metadata, we can see that the timezones are removed.

m <- clean_metadata(project_files = example_files)
#> Extracting ARU info...
#> Extracting Dates and Times...
m <- add_sites(m, s)
#> `date_time` columns are assumed to be in 'local' time marked with 'UTC'
#> • Removing timezone specification
#> Joining by columns `date_time_start` and `date_time_end`

What timezone are my data in?

If we ask R what timezone these data are in, R will say “UTC”.

tz(m$date_time)
#> [1] "UTC"

But that’s probably not really the case.

There are three possible options for what the timezones might look like:

  1. All times are in the same timezone that was programmed into the ARUs before they were set out. This would likely be either the timezone of the region to which they were being deployed, or the timezone of the lab or home base (it doesn’t really matter which, as long as they’re all the same and you know which one it is).

  2. There are several different timezones among these recordings, which correspond to where they were deployed. This would likely happen if ARUs were set to use GPS to get the timezone and if a study area straddled a timezone boundary.

  3. There are several different timezones among these recordings, but they do not correspond to where they were deployed. This might happen if the timezones were set on the ARUs for different projects and were not corrected before deployment.

In ARUtools, we have options to deal with the first two scenarios. However, if you find yourself in the third scenario, the best approach is to split your files by timezone and run each batch through the workflow separately.
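One way to organise batches by timezone is a per-site lookup table. The sketch below uses made-up data: the `meta` and `site_tz` data frames and their contents are assumptions for illustration, not ARUtools objects. It only shows the splitting step.

```r
library(dplyr)

# Hypothetical mini metadata: one row per recording
meta <- data.frame(
  site_id = c("P01_1", "P01_1", "P06_1", "P09_1"),
  date_time = c(
    "2020-05-02 05:00:00", "2020-05-02 06:00:00",
    "2020-05-02 05:00:00", "2020-05-02 05:00:00"
  )
)

# Hypothetical lookup: the timezone each ARU was programmed with
site_tz <- data.frame(
  site_id = c("P01_1", "P06_1", "P09_1"),
  aru_tz = c("America/Toronto", "America/Winnipeg", "America/Winnipeg")
)

# Attach the timezone to every recording, then split into one batch per timezone
meta_tz <- left_join(meta, site_tz, by = "site_id")
batches <- split(meta_tz, meta_tz$aru_tz)

names(batches)
#> [1] "America/Toronto"  "America/Winnipeg"
```

Each element of `batches` could then be processed on its own with the timezone given by its name, and the results recombined with `bind_rows()`.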

Calculating time to sunrise/sunset

For simplicity, we don’t need to worry about the ‘real’ timezone except for when we calculate the time to sunrise/sunset.

This is where it’s important to know what timezone patterns you have in your data.

In our first scenario, we know that our recordings all have the same timezone and we know what that timezone is.

Here we can specify that timezone directly:

m_est <- calc_sun(m, aru_tz = "America/Toronto")

Alternatively, in our second scenario, we know that the timezones may be different, but importantly, that they correspond to the location where the unit was deployed. Here we can use aru_tz = "local" and calc_sun() will use the recording coordinates to figure out what the timezone was.

m_local <- calc_sun(m, aru_tz = "local")

Finally, in our third scenario, we know what the timezones are, but they are not all the same and they do not correspond to the location where the unit was deployed.

In this case we’ll split the data and use the specific timezones. Let’s assume that we know the timezones and that sites P06_1 and P09_1 are in Central, and the rest in Eastern.

# Split by timezone
m1 <- filter(m, site_id %in% c("P06_1", "P09_1")) # Get P06_1 and P09_1
m2 <- filter(m, !site_id %in% c("P06_1", "P09_1")) # Get all except the above

# Calculate time to sunrise/sunset individually
m1_cst <- calc_sun(m1, aru_tz = "America/Winnipeg")
m2_est <- calc_sun(m2, aru_tz = "America/Toronto")

# Join them back in
m_joint <- bind_rows(m1_cst, m2_est)

Because in this example we assigned each site the timezone it is actually located in, if you compare m_joint to m_local you’ll see that, with the exception of what the timezone is called, they have the same results (“America/Detroit” is the same timezone as “America/Toronto”).

You’ll also note that the results for sites in the Eastern timezone (America/Toronto or America/Detroit) all match those in m_est.
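As a quick check of the claim that the two timezone names are interchangeable here, the sketch below (lubridate only, with an illustrative date) interprets the same clock reading under both names and confirms they refer to the same instant. It only checks this one date, not the full history of either zone.

```r
library(lubridate)

clock <- "2020-05-02 05:00:00"

# Interpret the same clock reading under each timezone name
toronto <- ymd_hms(clock, tz = "America/Toronto")
detroit <- ymd_hms(clock, tz = "America/Detroit")

# Same instant, so the calculated times to sunrise/sunset will agree
toronto == detroit
#> [1] TRUE
```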

Important things to note

An example

Let’s assume we have two sites, one in the Eastern timezone and one in the Central timezone. However, they are both programmed to record at 5am, 6am and 7am Eastern.

We’ll first use some of our example data to create this mini metadata set.

m_mini <- filter(m, site_id %in% c("P01_1", "P06_1")) |>
  select(aru_id, site_id, longitude, latitude) |>
  distinct() |>
  cross_join(data.frame(date_time = c(
    "2020-05-02 05:00:00",
    "2020-05-02 06:00:00",
    "2020-05-02 07:00:00"
  ))) |>
  mutate(
    date = as_date(date_time),
    path = paste0(aru_id, "_", site_id, "_", hour(date_time), ".csv")
  )

If we now calculate the time to sunrise/sunset (t2sr and t2ss), we find that the difference between these sites is about 14 min. This reflects the fact that site P06_1 is farther west than P01_1: the 6am recording at P06_1 occurs 28.8 min before sunrise, whereas P01_1’s 6am recording occurs only 14.9 min before sunrise.

calc_sun(m_mini, aru_tz = "America/Toronto") |>
  arrange(date_time)
#> # A tibble: 6 × 10
#>   aru_id   site_id longitude latitude date_time           date       path  tz   
#>   <chr>    <chr>       <dbl>    <dbl> <dttm>              <date>     <chr> <chr>
#> 1 BARLT10… P01_1       -85.0     50.0 2020-05-02 05:00:00 2020-05-02 BARL… Amer…
#> 2 BARLT10… P06_1       -90.1     52   2020-05-02 05:00:00 2020-05-02 BARL… Amer…
#> 3 BARLT10… P01_1       -85.0     50.0 2020-05-02 06:00:00 2020-05-02 BARL… Amer…
#> 4 BARLT10… P06_1       -90.1     52   2020-05-02 06:00:00 2020-05-02 BARL… Amer…
#> # ℹ 2 more rows
#> # ℹ 2 more variables: t2sr <dbl>, t2ss <dbl>

However, if we were to incorrectly assume that the ARU located in the Central timezone was recording in that timezone, we would get very different results.

calc_sun(m_mini, aru_tz = "local") |>
  arrange(date_time)
#> # A tibble: 6 × 10
#>   aru_id   site_id longitude latitude date_time           date       path  tz   
#>   <chr>    <chr>       <dbl>    <dbl> <dttm>              <date>     <chr> <chr>
#> 1 BARLT10… P01_1       -85.0     50.0 2020-05-02 05:00:00 2020-05-02 BARL… Amer…
#> 2 BARLT10… P06_1       -90.1     52   2020-05-02 05:00:00 2020-05-02 BARL… Amer…
#> 3 BARLT10… P01_1       -85.0     50.0 2020-05-02 06:00:00 2020-05-02 BARL… Amer…
#> 4 BARLT10… P06_1       -90.1     52   2020-05-02 06:00:00 2020-05-02 BARL… Amer…
#> # ℹ 2 more rows
#> # ℹ 2 more variables: t2sr <dbl>, t2ss <dbl>

Here all the times to sunrise/sunset for site P06_1 are offset by an hour, because we’re assuming the wrong timezone (which is an hour different from the correct one).
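The size of this offset can be seen without calculating sunrise at all. As a minimal sketch (lubridate only, not the ARUtools calculation itself), interpreting the same recorded clock time as Eastern versus Central moves the instant by exactly one hour, so any time-to-sunrise computed from it shifts by the same amount.

```r
library(lubridate)

clock <- "2020-05-02 06:00:00"

# The same recorded clock time under the correct and the wrongly assumed timezone
as_eastern <- ymd_hms(clock, tz = "America/Toronto")
as_central <- ymd_hms(clock, tz = "America/Winnipeg")

# 06:00 Central is one hour later than 06:00 Eastern
shift_mins <- as.numeric(difftime(as_central, as_eastern, units = "mins"))
shift_mins
#> [1] 60
```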

Therefore the take home is that you only need two things: