Here is a collection of tips and tricks to go further with desctable.
You can define labels for variables using the .labels argument in desc_table:

labels <- c(mpg = "Miles/(US) gallon",
            cyl = "Number of cylinders",
            disp = "Displacement (cu.in.)",
            hp = "Gross horsepower",
            drat = "Rear axle ratio",
            wt = "Weight (1000 lbs)",
            qsec = "1/4 mile time",
            vs = "Engine",
            am = "Transmission",
            gear = "Number of forward gears",
            CARBURATOR = "Number of carburetors")

mtcars %>%
  desc_table(.labels = labels) %>%
  desc_output("DT")

As you can see with CARBURATOR instead of carb, not all variables need to have a label, and unused labels are discarded.
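
Because unused labels are dropped silently, it can help to check a label vector against the column names before building the table. The following is a minimal base R sketch, independent of desctable; the variable names used, unused and unlabelled are illustrative only.

```r
# A shortened label vector, with the same deliberate mismatch as above
labels <- c(mpg = "Miles/(US) gallon",
            cyl = "Number of cylinders",
            CARBURATOR = "Number of carburetors")

# Labels whose names match a column of mtcars
used <- intersect(names(labels), names(mtcars))

# Labels that match no column, often a sign of a typo
unused <- setdiff(names(labels), names(mtcars))

# Columns that will be shown without a label
unlabelled <- setdiff(names(mtcars), names(labels))

print(used)
print(unused)
print(unlabelled)
```

Here CARBURATOR ends up in unused and carb in unlabelled, which points at the mismatch.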

desc_table chooses its own statistics this way:
- N = length and "%" = percent if there is at least one factor
- min, max, Q1, Q3, median, mean, sd and IQR if there is at least one numeric variable

You can define your own automatic statistic function using the .auto argument in desc_table.
This function should accept one argument, the table to choose statistics
for (in the case of a grouped dataframe the subtables will be passed to
the function). It should return a list of statistics.
Here is the code of stats_auto, the default value of .auto:

stats_auto <- function(data) {
  data %>%
    lapply(is.numeric) %>%
    unlist() %>%
    any -> numeric

  data %>%
    lapply(is.factor) %>%
    unlist() %>%
    any() -> fact

  stats <- list("Min" = min,
                "Q1" = ~quantile(., .25),
                "Med" = stats::median,
                "Mean" = mean,
                "Q3" = ~quantile(., .75),
                "Max" = max,
                "sd" = stats::sd,
                "IQR" = IQR)

  if (fact & numeric)
    c(list("N" = length,
           "%" = percent),
      stats)
  else if (fact & !numeric)
    list("N" = length,
         "%" = percent)
  else if (!fact & numeric)
    stats
}

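
As a rough illustration of the contract described above (one argument, the table; return value, a list of statistics), here is a minimal sketch of a custom chooser written in base R only. The name my_stats_auto and the choice of statistics are assumptions made for the example, not part of desctable.

```r
# Hypothetical chooser: N when the table has a factor,
# median and IQR when it has a numeric variable.
my_stats_auto <- function(data) {
  has_numeric <- any(vapply(data, is.numeric, logical(1)))
  has_factor  <- any(vapply(data, is.factor, logical(1)))

  chosen <- list()
  if (has_factor)
    chosen <- c(chosen, list("N" = length))
  if (has_numeric)
    chosen <- c(chosen, list("Med" = stats::median, "IQR" = stats::IQR))
  chosen
}

# Inspect what would be selected for two built-in tables
names(my_stats_auto(mtcars))  # numeric columns only
names(my_stats_auto(iris))    # numeric columns and one factor
```

Such a function would then be given to desc_table through its .auto argument.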

If you often reuse the same statistics for multiple tables and you don’t want to repeat yourself, you can splice a list into desc_table using the !!! operator from rlang:

stats = list(N = length,
             Mean = mean,
             SD = sd)

mtcars %>%
  desc_table(!!!stats) %>%
  desc_output("DT")

When splicing, all stats need to be explicitly named, which is not the case for mean and sd in the following list:

stats2 = list(N = length,
              mean,
              sd)

mtcars %>%
  desc_table(!!!stats2) %>%
  desc_output("DT")

You can also define a “dumb” automatic function:

default_stats <- function(data)
{
  list(N = length,
       mean,
       sd)
}

desc_tests chooses its own statistical tests this way:
- if the variable is a factor, use fisher.test
- if fisher.test fails, fall back on chisq.test
- if the variable is numeric, use wilcox.test if there are two groups
- if the variable is numeric, use kruskal.test if there are more than two groups

You can define your own automatic test function using the .auto argument in desc_tests.
This function should accept two arguments, the variable to compare and the grouping variable, and return a statistical test that accepts a formula argument and returns an object with a p.value element.
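
To make that contract concrete, here is a minimal base R sketch. The first part checks that two standard tests can be called with a formula and return a p.value element. The second part defines a hypothetical chooser, my_tests_auto, which assumes every compared variable is numeric; it is an illustration, not desctable code.

```r
# Both tests accept a formula and return an object with a p.value element
res_two  <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
res_many <- kruskal.test(mpg ~ cyl, data = mtcars)
is.numeric(res_two$p.value)
is.numeric(res_many$p.value)

# Hypothetical chooser, assuming numeric variables only:
# t.test for two groups, oneway.test for any other number of groups.
my_tests_auto <- function(var, grp) {
  grp <- factor(grp)

  if (nlevels(grp) == 2)
    ~t.test
  else
    ~oneway.test
}

my_tests_auto(mtcars$mpg, mtcars$am)
my_tests_auto(mtcars$mpg, mtcars$cyl)
```

The chooser returns the test as a one-sided formula, in the same way as the default function does.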

Here is the code of tests_auto, the default value of .auto:

tests_auto <- function(var, grp) {
  grp <- factor(grp)

  if (nlevels(grp) < 2)
    ~no.test
  else if (is.factor(var)) {
    if (tryCatch(is.numeric(fisher.test(var ~ grp)$p.value), error = function(e) F))
      ~fisher.test
    else
      ~chisq.test
  } else if (nlevels(grp) == 2)
    ~wilcox.test
  else
    ~kruskal.test
}

You can also provide a default statistical test using the .default argument:

mtcars %>%
  group_by(am) %>%
  desc_table(mean, sd) %>%
  desc_tests(.default = ~t.test) %>%
  desc_output("DT")

Note that as with named tests, it is necessary to prepend the test name with a tilde (~).
You can still choose individual tests when you define either a .auto or a .default test:

mtcars %>%
  group_by(am) %>%
  desc_table(mean, sd, median, IQR) %>%
  desc_tests(.default = ~t.test, carb = ~wilcox.test) %>%
  desc_output("DT")

Note that if a .default test is provided, .auto is ignored.
You can set the number of significant digits to display with the digits argument. The p values are truncated at 1E-digits.

iris %>%
  group_by(Species) %>%
  desc_table(mean, sd) %>%
  desc_tests() %>%
  desc_output("DT", digits = 10)

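
The truncation rule can be pictured with base R's format.pval, which reports any value below a threshold as a bound. This is only a sketch of the idea, assuming that "truncated at 1E-digits" means p values below 10^-digits are displayed as such a bound; desctable's own rendering may differ.

```r
digits <- 3
p_values <- c(0.2, 0.0123, 1e-5)

# Values below 10^-digits are shown as "<" followed by the threshold
shown <- format.pval(p_values, digits = digits, eps = 10^-digits)
print(shown)
```

With a larger digits value, the threshold gets smaller and fewer p values are shown as a bound.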
Any additional argument given to desc_output will be passed on to the output function:

iris %>%
  group_by(Species) %>%
  desc_table(mean, sd) %>%
  desc_output("DT", filter = "top")