CongressData
is a package designed to allow a user with
only basic knowledge of R interact with CongressData, a
dataset with nearly 800 variables that compiles information about all US
congressional districts across 1789-2023, and its codebook. The dataset
tracks district characteristics, members of congress, and the behavior
of those members in policymaking. Users can find variables related to
demographics, politics, and policy; subset the data across multiple
dimensions; create custom aggregations of the dataset; and access
citations in both plain text and BibTeX for every variable.
CongressData
is a functional package that interacts with
the CongressData dataset via the internet. Install the package from
GitHub like so:
# use the devtools library to download the package from GitHub
library(devtools)
# if there are issues or you only want to download CongressData
install_github("ippsr/CongressData")
get_var_info
: Retrieve information regarding variables
in CongressData and identify variables of interest with
get_var_info
. The function allows you to search to codebook
to find the years each variable is observed in the data; a short and
long description of each variable; and the source and citation/s for
each variable. Citations are available in both bibtex and plain text.
Use the function to search for broad terms like ‘tax’ with the
related_to
argument and/or partial-match variable names
with var_names
.
suppressMessages(library(dplyr))
library(CongressData)
#> Please cite:
#> Grossmann, M., Lucas, C., McCrain, J, & Ostrander, I. (2022). CongressData.
#> East Lansing, MI: Institute for Public Policy and Social Research (IPPSR).
#>
#> Run `CongressData::get_congress_version()` to print the version of CongressData the package is using.
# variables related to health insurance
<- get_var_info(related_to = "health insurance")
h_ins_cong
cat("There are",nrow(h_ins_cong),"variables related to health insurance in CongressData")
#> There are 41 variables related to health insurance in CongressData
head(h_ins_cong$variable)
#> [1] "percent_under18_healthins" "percent_private_under18"
#> [3] "percent_public_under18" "percent_privpub_under18"
#> [5] "percent_pop18_34" "percent_private_18_34"
# variables with 'under18' in their name
<- get_var_info(var_names = "under18")
under18_cong
head(under18_cong$variable)
#> [1] "percent_under18" "percent_under18_healthins"
#> [3] "percent_private_under18" "percent_public_under18"
#> [5] "percent_privpub_under18" "under18"
get_var_info
returns the following information to
simplify using CongressData:
get_cong_data
: Access all or a part of CongressData with
get_cong_data
. Subset by state names with
state
and years with years
(either a single
year or a two-year vector that represents the min/max of what you want).
You can also use the related_to
argument to search across
variable names, short/long descriptions from the codebook, and citations
for non-exact matches of a supplied term. For example, searching ‘tax’
will return variables with words like ‘taxes’ and ‘taxable’ in any of
those columns.
# load the entire dataset
<- get_cong_data()
all_the_dat
# subset by state, topic, and years
<- get_cong_data(states = c("Indiana","Kentucky","Michigan")
cong_subset related_to = "tax"
,years = c(1960,1980)) ,
Run get_congress_version
to see what version of the
dataset is available in CongressData
.
::get_congress_version()
CongressData#> You are using CongressData version: 1.1
get_var_info
: Each variable in CongressData was
collected from external sources, please use get_var_info
to
obtain their citations (plain text and BibTeX). We’ve made it easy to
cite the source of each variable you use with the
get_var_info
function described above. Supply a vector of
variable names to the function with the var_names
function
and collect the citations provided in the plain text or BibTeX columns.
NOTE: Some variables have multiple citations, so do check you have them
all.
# bibtex is also available
get_var_info(var_names = "com_benghazi_299") %>%
pull(plaintext_cite)
#> [1] "Charles Stewart III and Jonathan Woon. Congressional Committee Assignments, 103rd to 114th Congresses, 1993--2017: House of Representatives, 2017.\n"
# bibtex is also available
get_var_info(var_names = "percent_bus") %>%
pull(plaintext_cite)
#> [1] "U.S. Census Bureau. (2022). 2009-2019 American Community Survey 1-year Estimates. Retrieved from the Census Bureau Data API."
In addition to citing each variable’s source, we ask that you cite CongressData if use this package or the dataset. A recommended citation is below.
Grossmann, M., Lucas, C., McCrain, J, & Ostrander, I. (2022). CongressData. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR)
For questions about the CongressData dataset, contact Ben Yoel (yoelbenj@msu.edu).