Often with time series you want to aggregate your dataset to a less
granular period. An example of this might be moving from a daily series
to a monthly series to look at broader trends in your data.
as_period()
allows you to do exactly this.
The period
argument in as_period()
for
specifying the transformation you want is a character with a general
format of "frequency period"
where frequency is a number
like 1 or 2, and period is an interval like weekly
or
yearly
. There must be a space between the two.
library(tibbletime)
library(dplyr)
# Facebook stock prices.
data(FB)
# Convert FB to tbl_time
<- as_tbl_time(FB, index = date)
FB
# FANG stock prices
data(FANG)
# Convert FANG to tbl_time and group
<- as_tbl_time(FANG, index = date) %>%
FANG group_by(symbol)
To see this in action, transform the daily FB
data set
to monthly data.
as_period(FB, '1 month')
## # A time tibble: 48 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-02-01 31.0 31.0 29.6 29.7 85856700 29.7
## 3 FB 2013-03-01 27.0 28.1 26.8 27.8 54064800 27.8
## 4 FB 2013-04-01 25.6 25.9 25.3 25.5 22249300 25.5
## 5 FB 2013-05-01 27.8 27.9 27.3 27.4 64567600 27.4
## 6 FB 2013-06-03 24.3 24.3 23.7 23.8 35733800 23.8
## 7 FB 2013-07-01 25.0 25.1 24.6 24.8 20582200 24.8
## 8 FB 2013-08-01 37.3 38.3 36.9 37.5 106066500 37.5
## 9 FB 2013-09-03 41.8 42.2 41.5 41.9 48774900 41.9
## 10 FB 2013-10-01 50.0 51.0 49.5 50.4 98114000 50.4
## # … with 38 more rows
# Additionally, the following are equivalent
# as_period(FB, 'month')
# as_period(FB, 'm')
# as_period(FB, '1 m')
You aren’t restricted to only 1 month periods. Maybe you wanted every 2 months?
as_period(FB, '2 m')
## # A time tibble: 24 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-03-01 27.0 28.1 26.8 27.8 54064800 27.8
## 3 FB 2013-05-01 27.8 27.9 27.3 27.4 64567600 27.4
## 4 FB 2013-07-01 25.0 25.1 24.6 24.8 20582200 24.8
## 5 FB 2013-09-03 41.8 42.2 41.5 41.9 48774900 41.9
## 6 FB 2013-11-01 50.8 52.1 49.7 49.8 95033000 49.8
## 7 FB 2014-01-02 54.8 55.2 54.2 54.7 43195500 54.7
## 8 FB 2014-03-03 67.0 68.1 66.5 67.4 56824100 67.4
## 9 FB 2014-05-01 60.4 62.3 60.2 61.2 82429000 61.2
## 10 FB 2014-07-01 67.6 68.4 67.4 68.1 33243000 68.1
## # … with 14 more rows
Or maybe every 25 days? Note that the dates do not line up exactly
with a difference of 25 days. This is due to the data set not being
completely regular (there are gaps due to weekends and holidays).
as_period()
chooses the first date it can find in the
period specified.
as_period(FB, '25 d')
## # A time tibble: 59 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-28 31.9 32.5 31.8 32.5 59682500 32.5
## 3 FB 2013-02-20 28.9 29.0 28.3 28.5 42098200 28.5
## 4 FB 2013-03-18 26.4 26.8 25.8 26.5 26653700 26.5
## 5 FB 2013-04-11 27.5 28.1 27.2 28.0 33368500 28.0
## 6 FB 2013-05-06 28.3 28.5 27.5 27.6 43939400 27.6
## 7 FB 2013-05-31 24.6 25.0 24.3 24.4 35925000 24.4
## 8 FB 2013-06-25 24.1 24.4 24.0 24.2 24713200 24.2
## 9 FB 2013-07-22 26.0 26.1 25.7 26.0 27526300 26.0
## 10 FB 2013-08-14 36.8 37.5 36.6 36.7 48423900 36.7
## # … with 49 more rows
start_date
argumentBy default, the date that starts the first group is calculated as:
Find the minimum date in your dataset.
Floor that date to the period that you specified.
In the 1 month example above, 2013-01-02
is the first
date in the series, and because “month” was chosen, the first group is
defined as (2013-01-01 to 2013-01-31).
Occasionally this is not what you want. Consider what would happen if
you changed the period to “every 2 days”. The first date is
2013-01-02
, but because “day” is chosen, this isn’t floored
to 2013-01-01
so the groups are (2013-01-02, 2013-01-03),
(2013-01-04, 2013-01-05) and so on. If you wanted the first group to be
(2013-01-01, 2013-01-02), you can use the start_date
argument.
# Without start_date
as_period(FB, '2 d')
## # A time tibble: 619 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 4 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 5 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 6 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 7 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## 8 FB 2013-01-17 30.1 30.4 30.0 30.1 40256700 30.1
## 9 FB 2013-01-22 29.8 30.9 29.7 30.7 55243300 30.7
## 10 FB 2013-01-23 31.1 31.5 30.8 30.8 48899800 30.8
## # … with 609 more rows
# With start_date
as_period(FB, '2 d', start_date = "2013-01-01")
## # A time tibble: 619 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 4 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 5 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 6 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 7 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## 8 FB 2013-01-17 30.1 30.4 30.0 30.1 40256700 30.1
## 9 FB 2013-01-22 29.8 30.9 29.7 30.7 55243300 30.7
## 10 FB 2013-01-23 31.1 31.5 30.8 30.8 48899800 30.8
## # … with 609 more rows
side
argumentBy default, the first date per period is returned. If you want the
end of each period instead, specify the side = "end"
argument.
as_period(FB, 'y', side = "end")
## # A time tibble: 4 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-12-31 54.1 54.9 53.9 54.7 43076200 54.7
## 2 FB 2014-12-31 79.5 79.8 77.9 78.0 19935400 78.0
## 3 FB 2015-12-31 106 106. 105. 105. 18298700 105.
## 4 FB 2016-12-30 117. 117. 115. 115. 18600100 115.
One of the neat things about working in the tidyverse
is
that these functions can also work with grouped datasets. Here we
transform the daily series of the 4 FANG stocks to a periodicity of
every 2 years.
%>%
FANG as_period('2 y')
## # A time tibble: 12 × 8
## # Index: date
## # Groups: symbol [4]
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2014-01-02 54.8 55.2 54.2 54.7 43195500 54.7
## 3 FB 2016-01-04 102. 102. 99.8 102. 37912400 102.
## 4 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.
## 5 AMZN 2014-01-02 399. 399. 394. 398. 2137800 398.
## 6 AMZN 2016-01-04 656. 658. 628. 637. 9314500 637.
## 7 NFLX 2013-01-02 95.2 95.8 90.7 92.0 19431300 13.1
## 8 NFLX 2014-01-02 367. 368. 361. 363. 12325600 51.8
## 9 NFLX 2016-01-04 109 110 105. 110. 20794800 110.
## 10 GOOG 2013-01-02 719. 727. 717. 723. 5101500 361.
## 11 GOOG 2014-01-02 1115. 1118. 1108. 1113. 3656400 556.
## 12 GOOG 2016-01-04 743 744. 731. 742. 3272800 742.