Within light there is darkness, but do not try to understand that darkness. Within darkness there is light, but do not look for that light. Light and darkness are a pair like the foot before and the foot behind in walking.
– From the Zen teaching poem Sandokai.
The goal of halfmoon is to cultivate balance in propensity score models.
You can install the most recent version of halfmoon from CRAN with:
install.packages("halfmoon")
You can also install the development version of halfmoon from GitHub with:
# install.packages("devtools")
::install_github("malcolmbarrett/halfmoon") devtools
halfmoon includes several techniques for assessing the balance created by propensity score weights.
library(halfmoon)
library(ggplot2)
# weighted mirrored histograms
ggplot(nhefs_weights, aes(.fitted)) +
geom_mirror_histogram(
aes(group = qsmk),
bins = 50
+
) geom_mirror_histogram(
aes(fill = qsmk, weight = w_ate),
bins = 50,
alpha = 0.5
+ scale_y_continuous(labels = abs) )
# weighted ecdf
ggplot(
nhefs_weights,aes(x = smokeyrs, color = qsmk)
+
) geom_ecdf(aes(weights = w_ato)) +
xlab("Smoking Years") +
ylab("Proportion <= x")
# weighted SMDs
<- tidy_smd(
plot_df
nhefs_weights,:active,
race.group = qsmk,
.wts = starts_with("w_")
)
ggplot(
plot_df,aes(
x = abs(smd),
y = variable,
group = method,
color = method
)+
) geom_love()
halfmoon also has support for working with matched datasets. Consider these two objects from the MatchIt documentation:
library(MatchIt)
# Default: 1:1 NN PS matching w/o replacement
<- matchit(treat ~ age + educ + race + nodegree +
m.out1 + re74 + re75, data = lalonde)
married
# 1:1 NN Mahalanobis distance matching w/ replacement and
# exact matching on married and race
<- matchit(treat ~ age + educ + race + nodegree +
m.out2 + re74 + re75, data = lalonde,
married distance = "mahalanobis", replace = TRUE,
exact = ~ married + race)
One option is to just look at the matched dataset with halfmoon:
<- get_matches(m.out1)
matched_data
<- tidy_smd(
match_smd
matched_data,c(age, educ, race, nodegree, married, re74, re75),
.group = treat
)
love_plot(match_smd)
The downside here is that you can’t compare multiple matching
strategies to the observed dataset; the label on the plot is also wrong.
halfmoon comes with a helper function, bind_matches()
, that
creates a dataset more appropriate for this task:
<- bind_matches(lalonde, m.out1, m.out2)
matches head(matches)
#> treat age educ race married nodegree re74 re75 re78 m.out1 m.out2
#> NSW1 1 37 11 black 1 1 0 0 9930.0460 1 1
#> NSW2 1 22 9 hispan 0 1 0 0 3595.8940 1 1
#> NSW3 1 30 12 black 0 0 0 0 24909.4500 1 1
#> NSW4 1 27 11 black 0 1 0 0 7506.1460 1 1
#> NSW5 1 33 8 black 0 1 0 0 289.7899 1 1
#> NSW6 1 22 9 black 0 1 0 0 4056.4940 1 1
matches
includes an binary variable for each
matchit
object which indicates if the row was included in
the match or not. Since downweighting to 0 is equivalent to filtering
the datasets to the matches, we can more easily compare multiple matched
datasets with .wts
:
<- tidy_smd(
many_matched_smds
matches,c(age, educ, race, nodegree, married, re74, re75),
.group = treat,
.wts = c(m.out1, m.out2)
)
love_plot(many_matched_smds)
We can also extend the idea that matching indicators are weights to weighted mirrored histograms, giving us a good idea of the range of propensity scores that are being removed from the dataset.
# use the distance as the propensity score
$ps <- m.out1$distance
matches
ggplot(matches, aes(ps)) +
geom_mirror_histogram(
aes(group = factor(treat)),
bins = 50
+
) geom_mirror_histogram(
aes(fill = factor(treat), weight = m.out1),
bins = 50,
alpha = 0.5
+ scale_y_continuous(labels = abs) )