An implementation of common higher-order functions with syntactic sugar for anonymous functions. It also provides a link to 'dplyr' and 'data.table' for common transformations on data frames, working around non-standard evaluation by default.
# Development version from GitHub:
remotes::install_github("wahani/dat")
# Released version from CRAN:
install.packages("dat")
- R CMD check. And you don't like that.
- dplyr does not respect the class of the object it operates on; the class attribute changes on the fly.
- Neither dplyr nor data.table plays nicely with S4, but you really, really want an S4 data.table or tbl_df.
- rlist and purrr.
The examples are from the introductory vignette of dplyr. You still work with data frames, so you can simply mix in dplyr features whenever you need them.
library("nycflights13")
library("dat")
## To use dplyr as backend set 'options(dat.use.dplyr = TRUE)'.
##
## Attaching package: 'dat'
## The following object is masked from 'package:base':
##
## replace
We can use mutar to select rows. When you reference a variable in the data frame, you can indicate this by using a one-sided formula.
mutar(flights, ~ month == 1 & day == 1)
mutar(flights, ~ 1:10)
And for sorting:
mutar(flights, ~ order(year, month, day))
## # A tibble: 336,776 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## 7 2013 1 1 555 600 -5 913 854
## 8 2013 1 1 557 600 -3 709 723
## 9 2013 1 1 557 600 -3 838 846
## 10 2013 1 1 558 600 -2 753 745
## # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
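To make the one-sided formula idea concrete, here is a minimal base R sketch of how such a formula can be evaluated against a data frame. The helper `select_rows` is hypothetical and is not part of dat; it only illustrates the mechanism under the assumption that the right-hand side yields a logical or integer row index.

```r
# Hypothetical helper, NOT part of 'dat': select rows with a one-sided formula.
select_rows <- function(data, f) {
  stopifnot(inherits(f, "formula"), length(f) == 2L)
  # f[[2]] is the expression to the right of '~'. It is evaluated with the
  # columns of 'data' in scope; other names are looked up in the environment
  # in which the formula was created.
  idx <- eval(f[[2]], envir = data, enclos = environment(f))
  data[idx, , drop = FALSE]
}

df <- data.frame(month = c(1, 1, 2), day = c(1, 2, 1), delay = c(5, -3, 10))

jan_first <- select_rows(df, ~ month == 1 & day == 1)  # logical index
first_two <- select_rows(df, ~ 1:2)                    # integer index
sorted    <- select_rows(df, ~ order(delay))           # ordering is row selection too
```

Because sorting is just row selection with a permutation, the same helper covers filtering, slicing and ordering.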
You can use characters, logicals, regular expressions and functions to select columns. Regular expressions are indicated by a leading "^".
flights %>%
  extract(c("year", "month", "day")) %>%
  extract("^day$") %>%
  extract(is.numeric)
The main difference between dplyr::mutate and mutar is that you use a ~ instead of =.
mutar(
  flights,
  gain ~ arr_delay - dep_delay,
  speed ~ distance / air_time * 60
)
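As a rough illustration of why a two-sided formula can stand in for `name = expression`, the following base R sketch reads the new column name from the left-hand side and evaluates the right-hand side inside the data frame. The helper `add_columns` is hypothetical and does not reflect how dat is implemented.

```r
# Hypothetical helper, NOT part of 'dat': add columns from two-sided formulas.
add_columns <- function(data, ...) {
  for (f in list(...)) {
    stopifnot(inherits(f, "formula"), length(f) == 3L)
    name <- as.character(f[[2]])  # left-hand side: name of the new column
    # right-hand side: evaluated with the columns of 'data' in scope
    data[[name]] <- eval(f[[3]], envir = data, enclos = environment(f))
  }
  data
}

df <- data.frame(distance = c(100, 300), air_time = c(60, 120))
out <- add_columns(df, speed ~ distance / air_time * 60)
```

Since a formula is an ordinary R object, it can be built and passed around in package code without relying on non-standard evaluation of function arguments.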
Grouping data is handled within mutar:
mutar(flights, n ~ .N, by = "month")
mutar(flights, delay ~ mean(dep_delay, na.rm = TRUE), by = "month")
You can also provide additional arguments to a formula by appending them after a |. This is especially helpful when you want to pass arguments from a function to such expressions. The augmentation can be anything that you can use to select columns (a character, a regular expression or a function) or a named list in which each element is a character.
mutar(
  flights,
  .n ~ mean(.n, na.rm = TRUE) | "^.*delay$",
  .x ~ mean(.x, na.rm = TRUE) | list(.x = "arr_time"),
  by = "month"
)
## # A tibble: 336,776 x 19
## year month day dep_time sched_dep_time sched_arr_time carrier flight
## <int> <int> <int> <int> <int> <int> <chr> <int>
## 1 2013 1 1 517 515 819 UA 1545
## 2 2013 1 1 533 529 830 UA 1714
## 3 2013 1 1 542 540 850 AA 1141
## 4 2013 1 1 544 545 1022 B6 725
## 5 2013 1 1 554 600 837 DL 461
## 6 2013 1 1 554 558 728 UA 1696
## 7 2013 1 1 555 600 854 B6 507
## 8 2013 1 1 557 600 723 EV 5708
## 9 2013 1 1 557 600 846 B6 79
## 10 2013 1 1 558 600 745 AA 301
## # … with 336,766 more rows, and 11 more variables: tailnum <chr>, origin <chr>,
## # dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
## # time_hour <dttm>, dep_delay <dbl>, arr_delay <dbl>, arr_time <dbl>
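One way to think about the augmented formula is as a template that is expanded once per matching column. The base R sketch below handles only the regular expression case and uses a hypothetical helper, `expand_formula`, that is not part of dat; the actual expansion in the package may differ.

```r
# Hypothetical helper, NOT part of 'dat': expand '.n ~ expr | "regex"' into
# one expression per column whose name matches the regex.
expand_formula <- function(f, data) {
  rhs <- f[[3]]
  # '~' binds more loosely than '|', so the right-hand side is a call to '|'
  stopifnot(is.call(rhs), identical(rhs[[1]], as.name("|")))
  placeholder <- as.character(f[[2]])          # e.g. ".n"
  template <- rhs[[2]]                         # expression left of '|'
  pattern <- eval(rhs[[3]], environment(f))    # regex right of '|'
  cols <- grep(pattern, names(data), value = TRUE)
  exprs <- lapply(cols, function(col) {
    # replace the placeholder symbol with the column name
    binding <- setNames(list(as.name(col)), placeholder)
    do.call(substitute, list(template, binding))
  })
  setNames(exprs, cols)
}

df <- data.frame(dep_delay = c(2, NA, 4), arr_delay = c(10, 20, NA), day = 1:3)
exprs <- expand_formula(.n ~ mean(.n, na.rm = TRUE) | "^.*delay$", df)

# Evaluate each expanded expression against the data frame.
means <- vapply(exprs, eval, numeric(1), envir = df)
```

Here `exprs` holds one call per delay column, each with `.n` replaced by the column name, which is why the result keeps the original column names.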
Using this package you can create S4 classes to contain a data frame (or a data.table) and use the interface to dplyr. Neither dplyr nor data.table supports integration with S4. The main function here is mutar, which is generic enough to link to subsetting of rows and columns as well as mutate and summarise. In the background, dplyr's ability to work on a data.table is being used.
library("data.table")
setClass("DataTable", "data.table")
DataTable <- function(...) {
  new("DataTable", data.table::data.table(...))
}
setMethod("[", "DataTable", mutar)
dtflights <- do.call(DataTable, nycflights13::flights)
dtflights[1:10, c("year", "month", "day")]
dtflights[n ~ .N, by = "month"]
dtflights[n ~ .N, sby = "month"]
dtflights %>%
  filtar(~month > 6) %>%
  mutar(n ~ .N, by = "month") %>%
  sumar(n ~ data.table::first(n), by = "month")
Inspired by rlist and purrr, some low-level operations on vectors are supported. The aim here is to integrate syntactic sugar for anonymous functions. Furthermore, the functions should support the use of pipes.
- map and flatmap as replacements for the apply functions
- extract for subsetting
- replace for replacing elements in a vector
What we can do with map:
map(1:3, ~ .^2)
flatmap(1:3, ~ .^2)
map(1:3 ~ 11:13, c) # zip
dat <- data.frame(x = 1, y = "")
map(dat, x ~ x + 1, is.numeric)
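The syntactic sugar for anonymous functions can be pictured as a conversion from a one-sided formula to a function of one argument named `.`. The following base R sketch uses a hypothetical helper, `as_function`, which is not part of dat, and pairs it with `lapply` to mimic roughly what the map and flatmap calls above express.

```r
# Hypothetical helper, NOT part of 'dat': turn '~ expr' into function(.) expr.
as_function <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2L)
  fun <- function(.) NULL
  body(fun) <- f[[2]]                 # right-hand side becomes the body
  environment(fun) <- environment(f)  # keep the formula's lexical scope
  fun
}

square <- as_function(~ .^2)

squares <- lapply(1:3, square)           # a list, roughly like map
flat    <- unlist(lapply(1:3, square))   # a flat vector, roughly like flatmap
```

Keeping the formula's environment means variables defined next to the formula remain visible inside the generated function.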
What we can do with extract:
extract(1:10, ~ . %% 2 == 0) %>% sum
extract(1:15, ~ 15 %% . == 0)
l <- list(aList = list(x = 1), aAtomic = "hi")
extract(l, "^aL")
extract(l, is.atomic)
What we can do with replace:
replace(c(1, 2, NA), is.na, 0)
replace(c(1, 2, NA), rep(TRUE, 3), 0)
replace(c(1, 2, NA), 3, 0)
replace(list(x = 1, y = 2), "x", 0)
replace(list(x = 1, y = 2), "^x$", 0)
replace(list(x = 1, y = "a"), is.character, NULL)
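The extract and replace examples accept the same kinds of selectors: functions, regular expressions, names, logicals and positions. A minimal base R sketch of such a dispatch is shown below. The helper `resolve_selector` is hypothetical and not part of dat, and it assumes that predicate functions are applied element by element, which may differ from the package's behaviour.

```r
# Hypothetical helper, NOT part of 'dat': resolve a selector to a logical index.
resolve_selector <- function(x, sel) {
  if (is.function(sel)) {
    # predicate applied to each element (an assumption of this sketch)
    vapply(x, sel, logical(1), USE.NAMES = FALSE)
  } else if (is.character(sel) && length(sel) == 1L && startsWith(sel, "^")) {
    grepl(sel, names(x))       # leading "^": treat as a regular expression
  } else if (is.character(sel)) {
    names(x) %in% sel          # plain names
  } else if (is.logical(sel)) {
    sel                        # already a logical index
  } else {
    seq_along(x) %in% sel      # numeric positions
  }
}

l <- list(aList = list(x = 1), aAtomic = "hi")
by_regex <- l[resolve_selector(l, "^aL")]
by_fun   <- l[resolve_selector(l, is.atomic)]

x <- c(1, 2, NA)
x[resolve_selector(x, is.na)] <- 0   # replace missing values

y <- c(1, 2, NA)
y[resolve_selector(y, 3)] <- 0       # replace by position
```

Reducing every selector to a logical index keeps subsetting and replacement on a single code path.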