fastymd

Overview

fastymd is a package for working with Year-Month-Day (YMD) style date objects. It provides extremely fast passing of character strings and numeric values to date objects as well as fast decomposition of these in to their year, month and day components. The underlying algorithms follow the approach of Howard Hinnant for calculating days from the UNIX Epoch of Gregorian Calendar dates and vice versa.

The API won’t give any surprises:

library(fastymd)
cdate <- c("2025-04-16", "2025-04-17")
(res <- fymd(cdate))
#> [1] "2025-04-16" "2025-04-17"
res == as.Date(cdate)
#> [1] TRUE TRUE
get_ymd(res)
#>   year month day
#> 1 2025     4  16
#> 2 2025     4  17
fymd(2025, 4, 16) == res[1L]
#> [1] TRUE

Invalid dates will return NA and a warning:

fymd(2021, 02, 29) # not a leap year
#> NAs introduced due to invalid month and/or day combinations.
#> [1] NA

More interesting is the handling of output after a valid date. Consider the following timestamp:

timelt <- as.POSIXlt(Sys.time(), tz = "UTC")
(timestamp <- strftime(timelt, "%Y-%m-%dT%H:%M:%S%z"))
#> [1] "2025-10-03T08:50:17+0000"

By default the time element is ignored:

(res <- fymd(timestamp))
#> [1] "2025-10-03"
res == as.Date(timestamp, tz = "UTC")
#> [1] TRUE

This ignoring of the timestamp is both good and bad. For timestamps it makes perfect sense, but perhaps you have simple dates and a concern that some are corrupted. For these we can use the strict argument:

cdate <- "2025-04-16nonsense "
fymd(cdate)
#> [1] "2025-04-16"
fymd(cdate, strict = TRUE)
#> NAs introduced due to invalid date strings.
#> [1] NA

Benchmarks

The character method of fymd() parses input strings in a fixed, year, month and day order. These values must be digits but can be separated by any non-digit character. This is similar in spirit to the fastDate() function in Simon Urbanek’s fasttime package, using pure text parsing and no system calls for maximum speed.

For extremely fast passing of POSIX style timestamps you will struggle to beat the performance of fasttime. This works fantastically for timestamps that do not need validation and are within the date range supported by the package (currently 1970-01-01 through to the year 2199).

fymd() fills the, admittedly small, niche where you want fast parsing of YMD strings along with date validation and support for a wider range of dates from the Proleptic Gregorian calendar (currently we support years in the range [-9999, 9999]). This additional capability does come with a small performance penalty but, hopefully, this has been kept to a minimum and the implementation remains competitive.

library(microbenchmark)

# 1970-01-01 (UNIX epoch) to "2199-01-01"
dates <- seq.Date(from = .Date(0), to = fymd("2199-01-01"), by = "day")

# comparison timings for fymd (character method)
cdates  <- format(dates)
(res_c <- microbenchmark(
    fasttime  = fasttime::fastDate(cdates),
    fastymd   = fymd(cdates),
    ymd       = ymd::ymd(cdates),
    lubridate = lubridate::ymd(cdates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min       lq      mean    median        uq      max neval
#>   fasttime  533.670  538.575  577.8399  540.6035  544.7715 2105.967   100
#>    fastymd  817.212  822.011  891.7562  824.8310  828.7085 5767.230   100
#>        ymd 4236.551 4281.931 4395.9070 4313.1145 4336.5280 5953.068   100
#>  lubridate 5486.083 5599.050 6219.0921 5668.7810 7022.4130 8608.015   100
# comparison timings for fymd (numeric method)
ymd  <- get_ymd(dates)
(res_n <- microbenchmark(
    fastymd   = fymd(ymd[[1]], ymd[[2]], ymd[[3]]),
    lubridate = lubridate::make_date(ymd[[1]], ymd[[2]], ymd[[3]]),
    check     = "equal"
))
#> Unit: microseconds
#>       expr     min       lq     mean  median       uq      max neval
#>    fastymd 325.480 326.7925 356.3634 328.165 332.5630 1956.998   100
#>  lubridate 676.659 720.7870 839.9302 723.551 726.2465 2585.256   100
# comparison timings for year getter
(res_get_year <- microbenchmark(
    fastymd   = get_year(dates),
    ymd       = ymd::year(dates),
    lubridate = lubridate::year(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq      mean    median       uq       max neval
#>    fastymd  358.942  359.8645  399.9320  360.9220  364.338  3179.119   100
#>        ymd  381.245  394.3240  545.4191  403.5015  478.542  3748.817   100
#>  lubridate 7564.449 7583.4395 8110.1502 7602.2950 8880.130 10426.904   100
# comparison timings for month getter
(res_get_month <- microbenchmark(
    fastymd   = get_month(dates),
    ymd       = ymd::month(dates),
    lubridate = lubridate::month(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq      mean    median        uq       max neval
#>    fastymd  325.219  326.5470  328.9860  327.5240  330.8250   352.291   100
#>        ymd  417.141  425.0265  436.5834  433.5175  442.9205   712.876   100
#>  lubridate 8234.756 8273.7135 9062.5125 8289.4180 9659.8770 38708.926   100
# comparison timings for mday getter
(res_get_mday <- microbenchmark(
    fastymd   = get_mday(dates),
    ymd       = ymd::mday(dates),
    lubridate = lubridate::day(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq      mean    median        uq       max neval
#>    fastymd  361.417  362.0785  378.6033  364.0725  366.4865  1695.318   100
#>        ymd  421.540  426.5440  434.1789  429.9460  433.9390   790.311   100
#>  lubridate 7541.686 7561.8090 7842.8506 7578.0845 7617.9145 10427.656   100