First we need to load the package:
library(HistDat)
Now, let’s say that we have observed the number one 1000 billion (1e12) times, the number two twice as often, the number three as often as two, and the number four as often as one.
If we turned this into a single vector with one element per observation, that vector would have 6,000,000,000,000 (6e12) elements! That is far more than fits in the RAM of an ordinary machine, and even if it did fit, any calculation over it would be extremely slow.
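As a rough check on that claim, the arithmetic can be sketched in base R. The 4 bytes per element is an assumption (it is the size of an R integer; doubles take 8 bytes), and any per-object overhead is ignored.

```r
# Counts from the example above
counts <- c(1e12, 2e12, 2e12, 1e12)

# Total number of observations if we expanded them into one vector
n_obs <- sum(counts)

# Assumption: 4 bytes per element, as for an R integer vector
bytes_needed <- n_obs * 4

# Convert to terabytes (1 TB = 1e12 bytes)
tb_needed <- bytes_needed / 1e12
tb_needed
#> [1] 24
```

So even the most compact representation would need roughly 24 TB of memory.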
Instead, we can store each distinct value only once, together with its count, in a HistDat object:
h = HistDat(
  vals = 1:4,
  counts = c(1e12, 2e12, 2e12, 1e12)
)
Now let’s calculate some summary statistics, without using RAM we don’t need!
mean(h)
#> [1] 2.5
min(h)
#> [1] 1
length(h)
#> [1] 6e+12
median(h)
#> [1] 2.5
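To illustrate why these statistics need no expansion, here is a minimal base-R sketch that computes them directly from the distinct values and their counts. It shows the general idea only and is not necessarily how HistDat implements them; the names vals, counts and the helper value_at are our own.

```r
vals   <- 1:4
counts <- c(1e12, 2e12, 2e12, 1e12)

# Total number of observations (the "length" of the data)
n <- sum(counts)

# Mean: each value weighted by how often it was seen
weighted_mean <- sum(vals * counts) / n

# Minimum: the smallest value that was actually observed (count > 0)
observed_min <- min(vals[counts > 0])

# Hypothetical helper: the value at a 1-based position in the sorted data,
# found from the cumulative counts instead of expanding the data
value_at <- function(pos, vals, counts) {
  ord <- order(vals)                 # make sure values are sorted
  cum <- cumsum(counts[ord])         # last position occupied by each value
  vals[ord][which(cum >= pos)[1]]    # first value whose block reaches pos
}

# Median: for an even n, average the two middle positions (as median() does)
if (n %% 2 == 0) {
  weighted_median <- (value_at(n / 2, vals, counts) +
                      value_at(n / 2 + 1, vals, counts)) / 2
} else {
  weighted_median <- value_at((n + 1) / 2, vals, counts)
}

c(length = n, mean = weighted_mean, min = observed_min, median = weighted_median)
```

Every step works on four numbers rather than six trillion, which is why the results come back instantly.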
We can also convert a hist_dat object into an ordinary 1-D vector, which is reasonable when the total number of observations is small:
h = HistDat(
  vals = 1:4,
  counts = c(1, 2, 2, 1)
)
as.vector(h)
#> [1] 1 2 2 3 3 4
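For intuition, the expanded vector above matches what base R's rep() produces when each value is repeated by its count. This sketch uses only base R (not HistDat) and checks that ordinary summary functions on the expanded vector give the same mean and median as the large example, since the counts here are in the same 1:2:2:1 proportions.

```r
vals   <- 1:4
counts <- c(1, 2, 2, 1)

# Repeat each value as many times as it was observed
expanded <- rep(vals, times = counts)
expanded
#> [1] 1 2 2 3 3 4

# With the data expanded, base R summaries work directly
mean(expanded)
#> [1] 2.5
median(expanded)
#> [1] 2.5
length(expanded)
#> [1] 6
```

This is only practical because the total count is tiny; with the original counts, rep() would try to allocate the multi-terabyte vector we set out to avoid.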