Quick and Flexible Survey Weighting
Ben Mainwaring
R package for quickly and flexibly calculating rake weights (also known as rim weights, or iterative proportional fitting). This allows post-stratification/non-response weighting on multiple variables, even when the interlocked (joint) distribution of those variables is not known. It works with Thomas Lumley’s survey package, adding functionality, more adaptable syntax, and error-checking to the weighting tools in survey.
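To make the idea of raking concrete, here is a minimal base-R sketch of iterative proportional fitting on two variables whose joint distribution is unknown. It is an illustration only, not svyweight's implementation: the function name rake_sketch, the toy data, and the target proportions are all hypothetical.

```r
# Hypothetical helper: a bare-bones raking loop in base R (not svyweight's code).
# `targets` is a named list of named proportion vectors, one per variable.
rake_sketch <- function(data, targets, max_iter = 100, tol = 1e-8) {
  w <- rep(1, nrow(data))
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (v in names(targets)) {
      groups <- as.character(data[[v]])
      # Current weighted share of each category of variable v
      current <- vapply(split(w, groups), sum, numeric(1)) / sum(w)
      # Adjustment factor that makes this margin match its target
      adj <- targets[[v]][names(current)] / current
      max_change <- max(max_change, abs(adj - 1))
      w <- w * adj[groups]
    }
    # Stop once a full pass over all variables barely changes the weights
    if (max_change < tol) break
  }
  # Rescale so the weights sum to the sample size (mean weight of 1)
  unname(w * length(w) / sum(w))
}

# Toy data: only the marginal targets are known, not the gender-by-age table
survey_data <- data.frame(
  gender = c("F", "F", "F", "M", "M", "F", "M", "F"),
  age    = c("young", "old", "old", "young", "old", "young", "old", "old")
)
targets <- list(
  gender = c(F = 0.5, M = 0.5),
  age    = c(young = 0.4, old = 0.6)
)

w <- rake_sketch(survey_data, targets)

# Weighted margins after raking; these should sit very close to the targets
tapply(w, survey_data$gender, sum) / sum(w)
tapply(w, survey_data$age, sum) / sum(w)
```

Each pass adjusts the weights to one margin at a time, which is why only the marginal distributions are needed. The sketch assumes every target category appears in the data and does none of the checking that a real weighting function would need.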
The core function in svyweight is rakesvy (and the related rakew8), which calculates post-stratification weights for a dataset or svydesign object, given targets. The command is designed to make weighting as simple as possible, with the following features:

- Imputing unknown (NA) targets based on observed distributions
- Accepting targets of 0 (equivalent to dropping cases from analysis)
- Assessing weight quality using Kish’s effective sample size
- Weighting to either counts or percentage targets
- Allowing specification of targets as vectors, matrices, or data frames
- Allowing targets to be quickly rebased to a specified sample size
- Flexibly matching targets to the correct variables in a dataset
- Dynamically specifying weight targets based on recodes of variables in observed data
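Two of the features above reduce to short formulas, sketched here in base R. The helper names kish_neff and rebase_targets are hypothetical and are not part of svyweight's API; they only illustrate Kish's effective sample size and the rebasing of targets to a chosen sample size.

```r
# Hypothetical helper: Kish's approximate effective sample size,
# (sum of weights)^2 / (sum of squared weights)
kish_neff <- function(w) sum(w)^2 / sum(w^2)

# Hypothetical helper: rescale targets (counts or proportions) so they sum to n
rebase_targets <- function(target, n) target / sum(target) * n

# Equal weights lose nothing: the effective sample size equals the sample size
equal_w <- rep(1, 100)
kish_neff(equal_w)    # 100

# Unequal weights reduce the effective sample size
unequal_w <- c(rep(0.5, 50), rep(1.5, 50))
kish_neff(unequal_w)  # 80

# Proportion targets rebased to counts for a sample of 1000
rebase_targets(c(Male = 0.495, Female = 0.505), n = 1000)
```

The further the weights spread from 1, the smaller the effective sample size, which makes it a quick summary of how much precision the weighting costs.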
More details about the package are available in the R help files (see package?svyweight in R).
The package is under development, and additional features are planned for future releases. These include:

- Additional metrics of weight quality
- Techniques for weighting numeric and ordinal data based on histograms/binning
Contributions to the package, or suggestions for additional features, are gratefully accepted via email or GitHub.