Since version 0.2 cranly includes functions for
constructing and working with package dependence tree objects.
Specifically, the packages that are requirements for a specified package (i.e. appear in Depends, Imports or LinkingTo) are found, then the requirements for those packages are found, and so on. In essence, a package’s dependence tree shows what else needs to be installed along with the package in an empty package library, and hence it can be used to
+ remove unnecessary dependencies that “drag” with them all sorts of other packages
+ identify packages that are heavy for the CRAN mirrors
+ produce some neat visuals for the package
cranly_dependence_tree objects

Constructing cranly_dependence_tree objects is straightforward once a package directives network has been derived.
Let’s attach cranly
library("cranly")
and use an instance of the package directives network
package_network <- readRDS(url("https://raw.githubusercontent.com/ikosmidis/cranly/develop/inst/extdata/package_network.rds"))
from CRAN’s state on 2022-08-26 14:43:43 BST.
Alternatively, today’s package directives network can be constructed by doing
cran_db <- clean_CRAN_db()
package_network <- build_network(cran_db)
We can compute dependence trees for any package in CRAN using the function compute_dependence_tree on the package directives network. For example, the dependence tree of brglm2 is
compute_dependence_tree(package_network, "brglm2")
#>       package generation
#> 1      brglm2          0
#> 2        MASS         -1
#> 3       stats         -1
#> 4      Matrix         -1
#> 5    graphics         -1
#> 6        nnet         -1
#> 7  enrichwith         -1
#> 8    numDeriv         -1
#> 9     methods         -2
#> 10   graphics         -2
#> 11       grid         -2
#> 12      stats         -2
#> 13      utils         -2
#> 14    lattice         -2
#> 15  grDevices         -2
and of tibble is
compute_dependence_tree(package_network, "tibble")
#>      package generation
#> 1     tibble          0
#> 2      fansi         -1
#> 3  lifecycle         -1
#> 4   magrittr         -1
#> 5    methods         -1
#> 6     pillar         -1
#> 7  pkgconfig         -1
#> 8      rlang         -1
#> 9      utils         -1
#> 10     vctrs         -1
#> 11 grDevices         -2
#> 12     utils         -2
#> 13      glue         -2
#> 14     rlang         -2
#> 15       cli         -2
#> 16     fansi         -2
#> 17 lifecycle         -2
#> 18      utf8         -2
#> 19     vctrs         -2
#> 20      glue         -3
#> 21     utils         -3
#> 22 grDevices         -3
#> 23   methods         -3
#> 24     rlang         -3
#> 25       cli         -3
The resulting data frame includes package names and a generation index. The generation of the named package is 0 by default, and as we move back through the required packages and the requirements of those, the generation index decreases by 1 at each step. I had loads of fun implementing compute_dependence_tree, because the tree construction can be neatly and cleanly written as a recursion (see the source code of compute_dependence_tree), leveraging the advantages of functional programming (that’s a different and long discussion, though).
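To make the recursive idea concrete, here is a minimal base R sketch on a toy requirement map. It is not cranly's implementation: the packages A to D, the requirements list and the helper toy_dependence_tree are all hypothetical, and the sketch assumes the requirement graph has no cycles.

```r
# Hypothetical requirement map: each name lists the packages it requires
requirements <- list(
  A = c("B", "C"),
  B = "D",
  C = character(0),
  D = character(0)
)

# Recursively collect (package, generation) rows, starting at generation 0
# and subtracting 1 each time we step back to a package's requirements
toy_dependence_tree <- function(package, generation = 0) {
  current <- data.frame(package = package, generation = generation,
                        stringsAsFactors = FALSE)
  children <- requirements[[package]]
  if (length(children) == 0) {
    return(current)  # base case: nothing further is required
  }
  deeper <- lapply(children, toy_dependence_tree, generation = generation - 1)
  do.call(rbind, c(list(current), deeper))
}

toy_tree <- toy_dependence_tree("A")
# Order by generation (0 first), breaking ties alphabetically
toy_tree <- toy_tree[order(-toy_tree$generation, toy_tree$package), ]
rownames(toy_tree) <- NULL
toy_tree
```

As in the output above, a package that is reached through several routes would appear once per route in this sketch, since no de-duplication is attempted.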
The method build_dependence_tree uses compute_dependence_tree to construct an edge list for the dependence tree, which we can then visualize. For example, for tibble
tibble_tree <- build_dependence_tree(package_network, "tibble")
plot(tibble_tree)
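For intuition about what an edge list for a dependence tree might look like, the following base R sketch builds one from a toy requirement map. The packages, the requirements list, the helper toy_edges and the column names from and to are all hypothetical, and the edge direction (requirement pointing to the package that needs it) is an assumption made for illustration, not a description of how cranly stores its edges.

```r
# Hypothetical requirement map: each name lists the packages it requires
requirements <- list(
  A = c("B", "C"),
  B = "D",
  C = character(0),
  D = character(0)
)

# Recursively collect one row per (requirement -> dependent package) edge
toy_edges <- function(package) {
  children <- requirements[[package]]
  if (length(children) == 0) {
    # base case: no requirements, so no edges
    return(data.frame(from = character(0), to = character(0),
                      stringsAsFactors = FALSE))
  }
  own <- data.frame(from = children, to = package, stringsAsFactors = FALSE)
  deeper <- lapply(children, toy_edges)
  do.call(rbind, c(list(own), deeper))
}

toy_edge_list <- toy_edges("A")
rownames(toy_edge_list) <- NULL
toy_edge_list
```

An edge list of this shape can be handed to most graph plotting tools, which is the role it plays before visualization.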
The package dependence index is a rough measure of how much “baggage” an R package carries. It is minus the weighted average of the generation indices of the packages in the dependence tree, with weights that are inversely proportional to the popularity of each package, measured by how many other packages depend on, link to or import it. Mathematically, the package dependence index is defined as \[ -\frac{\sum_{i \in C_p; i \ne p} \frac{1}{N_i} g_i}{\sum_{i \in C_p; i \ne p} \frac{1}{N_i}} \] where \(C_p\) is the dependence tree for the package(s) \(p\), \(N_i\) is the total number of packages that depend on, link to or import package \(i\), and \(g_i\) is the generation in which package \(i\) appears in the dependence tree of package(s) \(p\). The generation takes values on the non-positive integers, with the package(s) \(p\) being placed at generation \(0\), the packages that \(p\) links to, depends on or imports at generation \(-1\), and so on.
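As a small worked illustration of the formula, the base R sketch below evaluates it literally on a toy tree. The packages p, a, b and c and their counts \(N_i\) (column n_dependents) are made up, and the values reported by cranly's own summary method may be computed with details that are not shown here.

```r
# Hypothetical dependence tree for package "p", with made-up popularity counts
toy <- data.frame(
  package      = c("p", "a", "b", "c"),
  generation   = c(0, -1, -1, -2),    # g_i
  n_dependents = c(NA, 100, 2, 4),    # N_i (assumed values; unused for p)
  stringsAsFactors = FALSE
)

# Exclude p itself, as in the sums over i in C_p with i != p
deps <- toy[toy$package != "p", ]

# Weights are inversely proportional to popularity
w <- 1 / deps$n_dependents

# Minus the weighted average of the generation indices
dependence_index <- -sum(w * deps$generation) / sum(w)
dependence_index
```

Here the widely used package a (weight 0.01) barely moves the index, while the rarely used b and c dominate it, giving 1.01 / 0.76, roughly 1.33. This is the sense in which unpopular requirements count as heavier baggage.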
For example, the package dependence index for all packages in the dependence tree of betareg can be computed as
betareg_tree <- build_dependence_tree(package_network, "betareg")
betareg_dep_index <- sapply(betareg_tree$nodes$package, function(package) {
    tree <- build_dependence_tree(package_network, package = package)
    s <- summary(tree)
    s$dependence_index
})
sort(betareg_dep_index)
#>    flexmix    Formula    lattice modeltools       nnet        zoo    betareg
#> 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.06768005
#>     lmtest   sandwich
#> 0.50726552 0.50726552