The IPEDS package contains data on postsecondary institution statistics
for 2020. Some datasets have been filtered to exclude imputation
variables, while other datasets are included in full. Details are given
below.
We wanted to create a package that can be used with just a basic
understanding of R by prospective students wanting to attend
undergraduate or graduate colleges and universities. The IPEDS package
allows easy access to a wide variety of information regarding
postsecondary institutions: their current students, faculty, and
demographics, as well as financial aid, educational and recreational
offerings, and completions. College search websites are sometimes vague
in their statistics for an institution; this package aims to give
prospective students a closer idea of what their institution of interest
is really like.
All the datasets are taken from [IPEDS](https://nces.ed.gov/ipeds/use-the-data):
- adm2020: dataset of Admissions and Test Scores for Fall 2020
- admin2020: dataset of Administration for 2020
- complete2020: dataset of Completions in 2020
- conference: dataset of Conferences for sports (from offerings2020)
- dir_info2020: dataset of Directory Information for 2020
- fall_enroll2020: dataset of Fall Enrollment for 2020
- fin_aid1920: dataset of Financial Aid Statistics for 2019-2020
- offerings2020: dataset of Institutional offerings for 2020
- relig_aff: dataset of Religious Affiliations (from offerings2020)
- staff_cat: dataset of Staff Categories based on admin2020$staff_cat
This package can be used by students, college counselors, or involved parents interested in pursuing higher education, considering their options, and securing admission into their school of choice. Additionally, anyone interested in educational statistics can use this data for their research.
Here are the first 6 rows of the complete2020 dataset:
head(complete2020)
#> INSTITUTION_ID AWARD_LVL TOTAL TOTAL_M TOTAL_W TOTAL_NATIVE TOTAL_ASIAN TOTAL_BLACK TOTAL_HISP TOTAL_NHPI TOTAL_WHITE TOTAL_MULT TOTAL_UNKNOWN TOTAL_NRA UND18 AGE18_24 AGE25_39 AGE40PLUS AGE_UNKNOWN
#> 1 100654 5 585 210 375 0 3 524 6 2 12 5 25 8 0 473 106 6 0
#> 2 100654 7 300 78 222 0 1 226 3 0 27 2 31 10 0 49 218 33 0
#> 3 100654 9 11 6 5 0 1 7 0 0 2 0 0 1 0 0 4 7 0
#> 4 100663 2 74 31 43 1 4 4 2 0 60 0 0 3 0 36 28 10 0
#> 5 100663 5 2639 958 1681 9 189 582 127 0 1564 106 12 50 0 1837 653 149 0
#> 6 100663 7 2314 780 1534 1 92 371 60 3 1461 42 55 229 0 370 1565 379 0
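As a small illustration of the kind of wrangling this dataset supports, the sketch below totals completions per institution and computes the share awarded to women. To stay self-contained it rebuilds only the six rows and five columns printed above in a data frame named completions (a name chosen here for illustration); with the package loaded, the same steps should apply to complete2020 itself.

```r
# Six rows copied from the head(complete2020) output above
# (only a subset of the columns is reproduced).
completions <- data.frame(
  INSTITUTION_ID = c(100654, 100654, 100654, 100663, 100663, 100663),
  AWARD_LVL      = c(5, 7, 9, 2, 5, 7),
  TOTAL          = c(585, 300, 11, 74, 2639, 2314),
  TOTAL_M        = c(210, 78, 6, 31, 958, 780),
  TOTAL_W        = c(375, 222, 5, 43, 1681, 1534)
)

# Sum completions across award levels for each institution
by_inst <- aggregate(cbind(TOTAL, TOTAL_M, TOTAL_W) ~ INSTITUTION_ID,
                     data = completions, FUN = sum)

# Share of completions awarded to women, as a percentage
by_inst$PCT_W <- round(100 * by_inst$TOTAL_W / by_inst$TOTAL, 1)
by_inst
```

Because only the rows shown above are included, the totals cover just the award levels that appear in those rows, not every completion the institutions reported.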
We can use this package to address many questions, such as the ones worked through below. To answer them, we can make use of the functionality the package provides, as well as data wrangling and data visualization techniques.
Which institutions have the qualities I’d like in an institution?
Let’s say Sophia, a high school senior, is interested in a relatively small private college in the New England area that will accept the AP credits she’s earned, is reasonably diverse, and helps its students afford college.
Using the school_preferences function, Sophia can find schools that fit
her preferences.
school_preferences(size = 2, region = "New England", alt_credits = "Yes", diversity_students = 36, financial_aid = 70, affiliation = 3)
#> Institution Institution ID % of Students Recieved Aid Institution Size Student Diversity Staff Diversity % of Students Disabled Region Type of Institution Religious Affiliation Calendar System Open Admissions Policy Years Required For Entering Vet Programs Alternative Credit Alternative Tuition Payment Distance Education Counseling Services Employment Services Daycare Services Live On-Campus Room Price Board Price Undergraduate Application Fee Graduate Application Fee
#> 1 University of Bridgeport 128744 78 2 67 20 1 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 . . 0 0
#> 2 Goodwin University 129154 88 2 50 24 1 New England 3 -2 1 1 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 4500 1700 50 50
#> 3 American International College 164447 97 2 53 14 1 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 7420 7240 0 50
#> 4 Bay Path University 164632 81 2 41 9 1 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 . . 25 0
#> 5 Clark University 165334 91 2 42 31 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 5800 4000 0 75
#> 6 Lesley University 166452 85 2 37 19 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 10060 6570 0 50
#> 7 Mount Holyoke College 166939 76 2 55 37 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 7860 8160 60 50
#> 8 Smith College 167835 71 2 51 27 1 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers no distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 9400 9360 0 60
#> 9 Wentworth Institute of Technology 168227 84 2 36 26 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 11766 3200 50 50
#> 10 New England Institute of Technology 217305 79 2 63 2 1 New England 3 -2 2 2 2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 8730 5610 25 25
The output is a data frame that includes the institution name, its ID, the percentage of students who receive aid, the size of the institution, the percentages of non-white students and staff, the percentage of disabled students, the region and type of the institution, and other relevant information about it.
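In the printed output, some room and board prices appear as "." rather than a number. The sketch below shows one way such placeholders might be handled before doing arithmetic. It assumes the price columns come back as character strings with "." marking a missing value, which the printout suggests but does not confirm; dot_to_na and the short column names are hypothetical, and the values are copied from the ten rows above.

```r
# Room and board prices copied from the ten rows printed above.
# Assumption: the columns are character strings, with "." for missing.
prices <- data.frame(
  Institution = c("University of Bridgeport", "Goodwin University",
                  "American International College", "Bay Path University",
                  "Clark University", "Lesley University",
                  "Mount Holyoke College", "Smith College",
                  "Wentworth Institute of Technology",
                  "New England Institute of Technology"),
  room  = c(".", "4500", "7420", ".", "5800", "10060",
            "7860", "9400", "11766", "8730"),
  board = c(".", "1700", "7240", ".", "4000", "6570",
            "8160", "9360", "3200", "5610"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace "." with NA, then convert to numeric
dot_to_na <- function(x) {
  x[x == "."] <- NA
  as.numeric(x)
}

prices$room  <- dot_to_na(prices$room)
prices$board <- dot_to_na(prices$board)

# Combined cost; schools with a missing price stay NA
prices$room_and_board <- prices$room + prices$board

# Cheapest first; order() places NA rows last by default
prices[order(prices$room_and_board), c("Institution", "room_and_board")]
```

Replacing "." explicitly, rather than relying on as.numeric() to coerce it, avoids a coercion warning and makes the treatment of missing values visible.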
We can select the columns Sophia is most interested in:
library(dplyr)  # provides %>% and select()

school_preferences(size = 2, region = "New England", alt_credits = "Yes", diversity_students = 36, financial_aid = 70, affiliation = 3) %>%
  select(`Institution`, `Institution Size`, `Region`, `Alternative Credit`, `Student Diversity`, `% of Students Recieved Aid`, `Type of Institution`)
#> Institution Institution Size Region Alternative Credit Student Diversity % of Students Recieved Aid Type of Institution
#> 1 University of Bridgeport 2 New England Takes alternate credit 67 78 3
#> 2 Goodwin University 2 New England Takes alternate credit 50 88 3
#> 3 American International College 2 New England Takes alternate credit 53 97 3
#> 4 Bay Path University 2 New England Takes alternate credit 41 81 3
#> 5 Clark University 2 New England Takes alternate credit 42 91 3
#> 6 Lesley University 2 New England Takes alternate credit 37 85 3
#> 7 Mount Holyoke College 2 New England Takes alternate credit 55 76 3
#> 8 Smith College 2 New England Takes alternate credit 51 71 3
#> 9 Wentworth Institute of Technology 2 New England Takes alternate credit 36 84 3
#> 10 New England Institute of Technology 2 New England Takes alternate credit 63 79 3
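If financial aid matters most to Sophia, she could go one step further and rank the matches by the share of students receiving aid. The sketch below does this in base R on values copied from the output above; the short column names (student_diversity, pct_aid) are stand-ins chosen here, not the names the package returns.

```r
# Values copied from the school_preferences() output above.
# Column names are simplified stand-ins for illustration.
schools <- data.frame(
  Institution = c("University of Bridgeport", "Goodwin University",
                  "American International College", "Bay Path University",
                  "Clark University", "Lesley University",
                  "Mount Holyoke College", "Smith College",
                  "Wentworth Institute of Technology",
                  "New England Institute of Technology"),
  student_diversity = c(67, 50, 53, 41, 42, 37, 55, 51, 36, 63),
  pct_aid           = c(78, 88, 97, 81, 91, 85, 76, 71, 84, 79),
  stringsAsFactors = FALSE
)

# Sort from the highest to the lowest share of students receiving aid
ranked <- schools[order(-schools$pct_aid), ]

# The three schools where the most students receive aid
head(ranked, 3)
```

With dplyr loaded, arrange(desc(...)) on the real column would express the same sort.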
What are the admission requirements for my preferred institution?
If Sophia is interested in what it takes to apply to one of her
preferred schools, she can use the admission_reqs function, which
returns a list of the application requirements for the institution with
the given ID (here 167835, Smith College).
admission_reqs(167835)
#> # A tibble: 9 × 2
#> Requirements Priority
#> <chr> <chr>
#> 1 High School Record Required
#> 2 Completion of College-Prepatory Program Required
#> 3 Recommendations Required
#> 4 High School GPA Recommended
#> 5 High School Rank Recommended
#> 6 Test of English as a Foreign Language Recommended
#> 7 Formal Demonstration of Competencies Neither_required_nor_recommended
#> 8 Admission Test Scores Neither_required_nor_recommended
#> 9 Other Tests Neither_required_nor_recommended
Now Sophia knows which application materials are required and recommended, and which ones are not necessary at all.
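To turn that table into a checklist, the requirements can be grouped by priority. The sketch below rebuilds the nine rows printed above as a plain data frame named reqs (an illustrative name) and splits them with base R; it assumes the real result has the same two character columns, Requirements and Priority, as the printout shows.

```r
# The nine rows copied from the admission_reqs(167835) output above,
# with the spelling of each requirement kept exactly as printed.
reqs <- data.frame(
  Requirements = c("High School Record",
                   "Completion of College-Prepatory Program",
                   "Recommendations",
                   "High School GPA",
                   "High School Rank",
                   "Test of English as a Foreign Language",
                   "Formal Demonstration of Competencies",
                   "Admission Test Scores",
                   "Other Tests"),
  Priority = rep(c("Required", "Recommended",
                   "Neither_required_nor_recommended"), each = 3),
  stringsAsFactors = FALSE
)

# One character vector of requirements per priority level
checklist <- split(reqs$Requirements, reqs$Priority)

# The items Sophia must submit
checklist$Required
```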
What’s the relationship between the diversity of students and the diversity of staff?
In another scenario, an educational statistician is interested in the
potential relationship between how diverse a student body is and how
diverse the institution’s staff is. We’ll plot the two diversity
percentages from the data frame returned by the school_preferences
function.
library(ggplot2)

# With no arguments, school_preferences() applies no filters
data <- school_preferences()

ggplot(data, aes(x = `Staff Diversity`, y = `Student Diversity`)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Student Diversity vs. Staff Diversity",
       y = "Student Diversity (%)",
       x = "Staff Diversity (%)")
#> `geom_smooth()` using formula 'y ~ x'
Because school_preferences accepts filters, the statistician could also limit the analysis to schools located in the New England area:
data <- school_preferences(region = "New England")

ggplot(data, aes(x = `Staff Diversity`, y = `Student Diversity`)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Student Diversity vs. Staff Diversity in New England Institutions",
       y = "Student Diversity (%)",
       x = "Staff Diversity (%)")
#> `geom_smooth()` using formula 'y ~ x'
In both cases, we can see a moderate to strong positive relationship between student and staff diversity. After noting this relationship, the statistician could go further and examine how the size of an institution might influence it.
# Drop the negative size codes (-1 and -2) before plotting
data <- school_preferences(region = "New England") %>%
  filter(`Institution Size` != -1 & `Institution Size` != -2)
data$`Institution Size` <- as.factor(data$`Institution Size`)

ggplot(data, aes(x = `Staff Diversity`, y = `Student Diversity`, color = `Institution Size`)) +
  geom_point() +
  # Size is a factor mapped to color, so use the discrete color scale
  scale_color_viridis_d(option = "magma") +
  geom_smooth(method = "lm", aes(color = `Institution Size`), se = FALSE) +
  labs(title = "Student Diversity vs. Staff Diversity in New England Institutions by Size",
       y = "Student Diversity (%)",
       x = "Staff Diversity (%)")
#> `geom_smooth()` using formula 'y ~ x'
They can conclude that there doesn’t seem to be much of a difference by institution size among New England institutions.
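The strength of the relationship read off the plots can also be put into a single number. The sketch below defines a hypothetical helper, diversity_cor, that computes the Pearson correlation between the two diversity columns, assuming both are numeric and named as in the printed output. It is demonstrated on ten rows copied from Sophia's filtered results earlier, purely so the example runs on its own; that filtered subset is not representative, so its correlation says nothing about the overall trend. For the analysis described here, the data frame returned by school_preferences() would be passed in instead.

```r
# Hypothetical helper: Pearson correlation between staff and student
# diversity. `df` is assumed to have numeric columns named
# "Staff Diversity" and "Student Diversity".
diversity_cor <- function(df) {
  cor(df[["Staff Diversity"]], df[["Student Diversity"]],
      use = "complete.obs")  # ignore rows with missing values
}

# Ten rows copied from the filtered school_preferences() output shown
# earlier; a small, non-representative subset used only as sample input.
sample_schools <- data.frame(
  `Staff Diversity`   = c(20, 24, 14, 9, 31, 19, 37, 27, 26, 2),
  `Student Diversity` = c(67, 50, 53, 41, 42, 37, 55, 51, 36, 63),
  check.names = FALSE  # keep the spaces in the column names
)

r <- diversity_cor(sample_schools)
r
```

A value near 1 would correspond to the strong positive trend that geom_smooth(method = "lm") draws, while a value near 0 would indicate little linear relationship.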
What are the main similarities and differences between my two top college choices?
Amanda, a high school senior, has to decide where she will attend college soon, but is still debating between her top two choices.
Using the compare_int function, Amanda can take the two schools she is
interested in and compare them side by side in a table that lists some
of the major qualities of each institution.
compare_int(100654, 100663)
#> Alabama A & M University University of Alabama at Birmingham
#> Size 3 5
#> Full Time Students 1622 2102
#> Part Time Students 42 52
#> Average Aid Awarded 9872 9344
#> Average Award Size 9679 10435
#> City Normal Birmingham
#> State AL AL
#> Region Southeast Southeast
#> Urbanization 12 12
#> Calendar System 1 1
#> Admission Test Scores Required Required
#> Room & Board Cost . .
#> Degrees Offered Yes Yes
#> AP Credit Accepted Yes Yes
#> Dual Enrollment Credit Accepted Yes Yes
#> Study Abroad Programs Yes Yes
#> Freshman Required to Live on Campus No No
#> Meals per Week 19 .