Weighted descriptive statistics are required in some contexts, for
instance, in the analysis of survey data. This can be accomplished by
using the provided weighted wrapper class. Internally, this
will cause the functions wtd.mean, wtd.var,
wtd.quantile and wtd.table from the
Hmisc package (which is optional in general but required to
use this functionality) to be used in place of their standard
non-weighted counterparts.
We take an example from the survey package (note that
this is for illustration purposes only, it is not meant to be a real
application):
## 
## Attaching package: 'survival'## The following object is masked from 'package:boot':
## 
##     aml## 
## Attaching package: 'survey'## The following object is masked from 'package:graphics':
## 
##     dotchartdata(myco)
myco$Leprosy <- factor(myco$leprosy, levels=1:0, labels=c("Leprosy Cases", "Controls"))
myco$AgeCat <- factor(myco$Age,
    levels=c(7.5,      12.5,       17.5,       22.5,       27.5,       32.5      ),
    labels=c("5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34")
)
myco$ScarL <- as.logical(myco$Scar)
label(myco$Age) <- "Age"
units(myco$Age) <- "years"
label(myco$AgeCat) <- "Age Group"
label(myco$ScarL) <- "BCG vaccination scar"
table1(~ ScarL + Age + AgeCat | Leprosy, data=weighted(myco, wt), big.mark=",")| Leprosy Cases (N=258) | Controls (N=61,310) | Overall (N=61,568) | |
|---|---|---|---|
| BCG vaccination scar | |||
| Yes | 100 (38.8%) | 34,847 (56.8%) | 34,947 (56.8%) | 
| No | 158 (61.2%) | 26,463 (43.2%) | 26,621 (43.2%) | 
| Age (years) | |||
| Mean (SD) | 21.2 (8.32) | 16.8 (8.36) | 16.8 (8.37) | 
| Median [Min, Max] | 22.5 [7.50, 32.5] | 17.5 [7.50, 32.5] | 17.5 [7.50, 32.5] | 
| Age Group | |||
| 5 to 9 | 25 (9.7%) | 17,327 (28.3%) | 17,352 (28.2%) | 
| 10 to 14 | 50 (19.4%) | 13,172 (21.5%) | 13,222 (21.5%) | 
| 15 to 19 | 44 (17.1%) | 10,325 (16.8%) | 10,369 (16.8%) | 
| 20 to 24 | 39 (15.1%) | 8,026 (13.1%) | 8,065 (13.1%) | 
| 25 to 29 | 47 (18.2%) | 5,981 (9.8%) | 6,028 (9.8%) | 
| 30 to 34 | 53 (20.5%) | 6,479 (10.6%) | 6,532 (10.6%) | 
It also works in “transpose” mode:
| Age (years) | BCG vaccination scar | |
|---|---|---|
| Leprosy Cases (N=258) | Mean (SD): 21.2 (8.32) Median [Min, Max]: 22.5 [7.50, 32.5] | Yes: 100 (38.8%) No: 158 (61.2%) | 
| Controls (N=61,310) | Mean (SD): 16.8 (8.36) Median [Min, Max]: 17.5 [7.50, 32.5] | Yes: 34,847 (56.8%) No: 26,463 (43.2%) | 
| Overall (N=61,568) | Mean (SD): 16.8 (8.37) Median [Min, Max]: 17.5 [7.50, 32.5] | Yes: 34,947 (56.8%) No: 26,621 (43.2%) | 
For more flexibility, we may not want the weighting to be applied
globally, but only to some of the variables. We can do this as well, by
using weighted on individual variables:
| Leprosy Cases (N=258) | Controls (N=258) | Overall (N=516) | |
|---|---|---|---|
| BCG vaccination scar | |||
| Yes | 100 (38.8%) | 34,847 (56.8%) | 34,947 (56.8%) | 
| No | 158 (61.2%) | 26,463 (43.2%) | 26,621 (43.2%) | 
| Age (years) | |||
| Mean (SD) | 21.2 (8.32) | 21.2 (8.32) | 21.2 (8.31) | 
| Median [Min, Max] | 22.5 [7.50, 32.5] | 22.5 [7.50, 32.5] | 22.5 [7.50, 32.5] | 
| Age Group | |||
| 5 to 9 | 25 (9.7%) | 25 (9.7%) | 50 (9.7%) | 
| 10 to 14 | 50 (19.4%) | 50 (19.4%) | 100 (19.4%) | 
| 15 to 19 | 44 (17.1%) | 44 (17.1%) | 88 (17.1%) | 
| 20 to 24 | 39 (15.1%) | 39 (15.1%) | 78 (15.1%) | 
| 25 to 29 | 47 (18.2%) | 47 (18.2%) | 94 (18.2%) | 
| 30 to 34 | 53 (20.5%) | 53 (20.5%) | 106 (20.5%) | 
This implementation allows for simple weighted statistics, but does
not currently support more complex designs from the survey
package like stratified sampling or cluster sampling.
weighted and indexed classesThe weighted class is just a wrapper around a vector or
data.frame that adds a vector of weights as an attribute.
These weights are carried along or subsetted appropriately during
operations like slicing or subsetting. See ?weighted for
some examples.
The indexed class is similar, but it simply maintains
the indices of a vector (row indices for a data.frame) when
a subset or slide is taken. This leads to some interesting possibilities
when we want to do more complex things.
The following example also comes from the survey
package:
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
svyby(~api99+api00, ~stype, dclus1, svymean)##   stype    api99    api00 se.api99 se.api00
## E     E 607.7917 648.8681 22.81660 22.36241
## H     H 595.7143 618.5714 41.76400 38.02025
## M     M 608.6000 631.4400 32.56064 31.60947##         stype
## sch.wide         E         H         M
##      No   406.1640  101.5410  270.7760
##      Yes 4467.8035  372.3170  575.3989Using table1, the same results can be presented more
beautifully:
myrender <- function(x, name, ...) {
    if (is.numeric(x)) {
        r <- svymean(as.formula(paste0("~", name)), subset(dclus1, (1:nrow(dclus1)) %in% indices(x)))
        r <- c(Mean=as.numeric(r), SE=sqrt(attr(r, "var", exact=T)))
        r <- unlist(stats.apply.rounding(as.list(r), big.mark=","))
    } else {
        r <- svytable(as.formula(paste0("~", name)), subset(dclus1, (1:nrow(dclus1)) %in% indices(x)))
        r <- unlist(stats.apply.rounding(as.list(r), big.mark=",", digits=1, rounding.fn=round_pad))
    }
    c("", r)
}
apiclus1$stype2 <- factor(apiclus1$stype, levels=c("E", "M", "H"),
    labels=c("Elementary", "Middle School", "High School"))
label(apiclus1$api99)    <- "API in 1999"
label(apiclus1$api00)    <- "API in 2000"
label(apiclus1$sch.wide) <- "Met school-wide growth target?"
table1(~ api99 + api00 + sch.wide | stype2, indexed(apiclus1), render=myrender,
    render.strat=names)| Elementary | Middle School | High School | Overall | |
|---|---|---|---|---|
| API in 1999 | ||||
| Mean | 608 | 609 | 596 | 607 | 
| SE | 22.8 | 32.6 | 41.8 | 24.2 | 
| API in 2000 | ||||
| Mean | 649 | 631 | 619 | 644 | 
| SE | 22.4 | 31.6 | 38.0 | 23.5 | 
| Met school-wide growth target? | ||||
| No | 406.2 | 270.8 | 101.5 | 778.5 | 
| Yes | 4,467.8 | 575.4 | 372.3 | 5,415.5 |