library(freqtables)
library(dplyr)
data(mtcars)
This vignette is about testing differences in frequencies/proportions, which is a common task in my data analysis workflow. Additionally, it’s important to me that these tools work well in a dplyr pipeline.
Because we want different behavior for different classes of inputs, freq_test is an S3 generic function with multiple methods.
The name freq_test is easy to reason about, sounds similar to freq_table, which must be the input to freq_test, and is general enough to contain chi-square, fisher, etc.
The flowchart below illustrates the scenarios, and related hypothesis tests, discussed in this vignette.
In some cases, we collect data from a single sample of people and measure the proportion of those people who can be classified (or categorized) according to some characteristic of interest. For example, suppose we collected data from a group of students and wanted to know what proportion of those students could be classified as males and females.
In the following code, we generate a small sample of students (n = 20). The student’s gender is drawn at random from a binomial distribution where being male or female are equally likely. Said another way, there is an equal number of male and female students on our hypothetical campus, and we have sampled a random subset of 20 of them without bias. In this context, “without bias” means that every student on our campus was equally likely to turn up in our sample.
set.seed(123)
<- tibble(
students male = rbinom(20, 1, 0.5)
)
Next, we can calculate the frequency with which each gender appears in our sample of 20 students.
%>%
students freq_table(male)
#> # A tibble: 2 × 9
#> var cat n n_total percent se t_crit lcl ucl
#> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 male 0 9 20 45 11.4 2.09 23.8 68.2
#> 2 male 1 11 20 55 11.4 2.09 31.8 76.2
The frequencies returned to us by freq_table
tell us that our sample includes 9 females (which is 45% of all students in the sample) and 11 males (which is 55% of all students sampled). But, what about the precision of those estimates?
If we are only interested in the proportion of students in our sample that are male and female, then we’re done. We have our answer. Typically, though, our interest extends beyond our little sample. What we really want to know is if the proportion of males and females differ among all students on our hypothetical campus (our superpopulation).
I like the following quote from Daniel Kaplan that discusses sampling variability and its relationship to precision of our estimate:
Less obvious to those starting out in statistics is the idea that a quantity such as the mean or median itself has limited precision. This isn’t a matter of the calculation itself. If the calculation has been done correctly, the resulting quantity is exactly right. Instead the limited precision arises from the sampling process that was used to collect the data. The exact mean or median relates to the particular sample you have collected. Had you collected a different sample, it’s likely that you would have gotten a somewhat different value for the quantities you calculate, the so-called sampling variability. One purpose of statistical inference is to characterize sampling variability, that is, to provide a quantitative meaning for “somewhat different value.”
Kaplan, Daniel. Statistical Modeling: A Fresh Approach (Project MOSAIC Books) (Kindle Locations 844-850). Project MOSAIC Books. Kindle Edition.
Further:
Even if all the individuals in a population were included in a study, the study subjects are viewed as a sample of the potential biologic experience of an even broader conceptual population.
Rothman, Kenneth J. Modern Epidemiology (p. 149). Lippincot (Wolters Kluwer Health). Kindle Edition.
In the case of our estimate above, we are characterizing the precision of our estimates in the form of 95% confidence intervals. Here, “precision” is a rough estimate of our uncertainty around the parameter we are interested in (i.e., the population proportion) based in the information we have available to us (Rothman, 2008). Perhaps it should be called an uncertainty interval or precision interval, rather than a confidence interval. The width of this measure of uncertainty is made up of a combination of sampling variability, the arbitrarily selected value for alpha (i.e., 0.05), the sample size, bias, and the underlying correctness of the model being applied to the data (Rothman, 2008; Kaplan, 2017).
So, for the above sample, we simply state that the percentage of female students was 45.00% (95% confidence interval 23.76% to 68.23%) and the percentage of male students was 55.00% (31.77% to 76.24%).
In the analysis above, we found that 45% of our sample was female and 55% of our sample was male. But again, we want to make reasonable estimates of whether or not the proportion of males and females differ among all students on our hypothetical campus. One common way to do so is with a null value hypothesis test. Specifically, in this example, we want to test the null hypothesis that the percentage of males and females among all students on our hypothetical campus are equal.
We can do so using methods based on the sample proportion of cases. They assume the normal approximation to the binomial distribution is valid. This assumption is reasonable when np0q0 ≥ 5 (Rosner, 2015, page 249).
%>%
students freq_table(male) %>%
# Test for equal proportions
summarise(
p_bar = n[1] / n_total[1],
p0 = 0.5,
n = n_total[1],
z = abs((p_bar - p0)) / sqrt(p0 * (1 - p0) / n),
p = 2 * pnorm(z, lower.tail=FALSE)
)#> # A tibble: 1 × 5
#> p_bar p0 n z p
#> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 0.45 0.5 20 0.447 0.655
We can also get this value from R’s built-in prop.test
and R’s built-in chisq.test
.
prop.test(9, 20, p = 0.5, correct = FALSE)
#>
#> 1-sample proportions test without continuity correction
#>
#> data: 9 out of 20, null probability 0.5
#> X-squared = 0.2, df = 1, p-value = 0.6547
#> alternative hypothesis: true p is not equal to 0.5
#> 95 percent confidence interval:
#> 0.2581979 0.6579147
#> sample estimates:
#> p
#> 0.45
chisq.test(c(9, 11), p = c(0.5, 0.5), correct = FALSE)
#>
#> Chi-squared test for given probabilities
#>
#> data: c(9, 11)
#> X-squared = 0.2, df = 1, p-value = 0.6547
Finally, I’ve built this test into freqtables::freq_test
, which is designed to work in a dplyr
pipeline.
%>%
students freq_table(male) %>%
freq_test()
#> # A tibble: 2 × 14
#> var cat n n_total percent se t_crit lcl ucl n_expected
#> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 male 0 9 20 45 11.4 2.09 23.8 68.2 10
#> 2 male 1 11 20 55 11.4 2.09 31.8 76.2 10
#> # … with 4 more variables: chi2_contrib <dbl>, chi2_pearson <dbl>, df <dbl>,
#> # p_chi2_pearson <dbl>
We can interpret this result in the following way: The probability of getting data that conflict with the tested hypothesis as much as or more than the observed data conflict with the tested hypothesis is 0.65, if the tested hypothesis is correct and all other assumptions used in computing the p-value are correct. These data are compatible with the null hypothesis of equal proportions of males and females on our hypothetical campus.
The study subjects may be treated as a simple random sample from a population of similar subjects (Daniel, 2013, page 258).
The sampling distribution of p_hat is approximately normally distributed in accordance with the central limit theorem (Daniel, 2013, page 258).
Here, and below, will will use the mtcars data for ease of comparison.
%>%
mtcars freq_table(am) %>%
freq_test() %>%
select(1:6, p_chi2_pearson)
#> var cat n n_total percent se p_chi2_pearson
#> 1 am 0 19 32 59.375 8.820997 0.2888444
#> 2 am 1 13 32 40.625 8.820997 0.2888444
These results match the results obtained from Stata and SAS (see below).
References:
Daniel, W. W., & Cross, C. L. (2013). Biostatistics: A foundation for analysis in the health sciences (Tenth). Hoboken, NJ: Wiley.
Kaplan, D. T. (2017). Statistical modeling: A fresh approach (Second). Project MOSAIC Books.
Rosner, B. (2015). Fundamentals of biostatistics (Eighth). MA: Cengage Learning.
Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern epidemiology (Third). Philadelphia, PA: Lippincott Williams & Wilkins.
In this example we are still testing for equality of proportions. However, now we are testing for equality across more than two categories.
Notice that the syntax doesn’t change. In fact, there isn’t even an additional method needed for this scenario.
%>%
mtcars freq_table(cyl) %>%
freq_test()
#> var cat n n_total percent se t_crit lcl ucl n_expected
#> 1 cyl 4 11 32 34.375 8.530513 2.039513 19.49961 53.11130 10.66667
#> 2 cyl 6 7 32 21.875 7.424859 2.039513 10.34883 40.44691 10.66667
#> 3 cyl 8 14 32 43.750 8.909831 2.039513 27.09672 61.94211 10.66667
#> chi2_contrib chi2_pearson df p_chi2_pearson
#> 1 0.01041667 2.3125 2 0.314664
#> 2 1.26041667 2.3125 2 0.314664
#> 3 1.04166667 2.3125 2 0.314664
This is the same result we get from SAS. I was unable to find an easy way to do this test in Stata.
Now, suppose we are interested in the relationship between two categorical variables. More precisely, we can estimate the degree of compatibility between our observed values, and the values we would expect to observe if the two characteristics we are analyzing were independent (i.e., knowing the value of one characteristic tells you nothing about the value of the other characteristic).
For example, earlier we found that our data were compatible with the hypothesis that there are equal proportions of females and males on our hypothetical campus – our superpopulation. Now, we suspect that females and males in our superpopulation differ with respect to their binge drinking behavior.
To investigate this hypothesis further, we will collect data from a simple random sample of students, measure their gender, and measure whether they binge drank in the last 30 days or not (yes/no). Then we will use the chi-square test of independence to draw conclusions about the degree of compatibility between our data, and the data we would expect to observe if binge drinking and gender were independent of one another.
set.seed(456)
<- tibble(
students male = rbinom(20, 1, 0.5),
binge = if_else(male == 0,
rbinom(20, 1, 0.10), # 10% of females binge drink
rbinom(20, 1, 0.30)) # 30% of males binge drink
)
%>%
students freq_table(male, binge) %>%
freq_test()
#> One or more expected cell counts are <= 5. Therefore, Fisher's Exact Test was used.
#> # A tibble: 4 × 26
#> row_var row_cat col_var col_cat n n_row n_total percent_total se_total
#> <chr> <chr> <chr> <chr> <int> <int> <int> <dbl> <dbl>
#> 1 male 0 binge 0 8 11 20 40 11.2
#> 2 male 0 binge 1 3 11 20 15 8.19
#> 3 male 1 binge 0 7 9 20 35 10.9
#> 4 male 1 binge 1 2 9 20 10 6.88
#> # … with 17 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> # ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> # lcl_row <dbl>, ucl_row <dbl>, n_col <int>, n_expected <dbl>,
#> # chi2_contrib <dbl>, chi2_pearson <dbl>, r <int>, c <int>, df <dbl>,
#> # p_chi2_pearson <dbl>, p_fisher <dbl>
There are a couple things worth mentioning here. First, 27% of the females in our sample engaged in binge drinking, compared to 22% of males – even though we know that the underlying probability in our superpopulation was 10% for females and 30% for males. This discrepancy highlights the potential for sampling variability, and small sample estimates, to be misleading.
Additionally, because some expected cell counts were less than 5, we should estimate our p-value using Fisher’s exact method.
%>%
students freq_table(male, binge) %>%
freq_test()
#> One or more expected cell counts are <= 5. Therefore, Fisher's Exact Test was used.
#> # A tibble: 4 × 26
#> row_var row_cat col_var col_cat n n_row n_total percent_total se_total
#> <chr> <chr> <chr> <chr> <int> <int> <int> <dbl> <dbl>
#> 1 male 0 binge 0 8 11 20 40 11.2
#> 2 male 0 binge 1 3 11 20 15 8.19
#> 3 male 1 binge 0 7 9 20 35 10.9
#> 4 male 1 binge 1 2 9 20 10 6.88
#> # … with 17 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> # ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> # lcl_row <dbl>, ucl_row <dbl>, n_col <int>, n_expected <dbl>,
#> # chi2_contrib <dbl>, chi2_pearson <dbl>, r <int>, c <int>, df <dbl>,
#> # p_chi2_pearson <dbl>, p_fisher <dbl>
The large p-value (p = 1.0) calculated using Fisher’s exact method indicates that the data are highly consistent with the null hypothesis of statistical independence between gender and binge drinking at our hypothetical campus, assuming that the null hypothesis is true and the study is free of bias. However, wide range of the 95% confidence intervals makes the precision of the point estimates doubtful. Another study using larger sample size is needed to get a more precise estimate of the relationship between gender and binge drinking.
Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 818). SAGE Publications. Kindle Edition.
Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 818). SAGE Publications. Kindle Edition.
%>%
mtcars freq_table(am, vs) %>%
freq_test() %>%
select(1:8, p_chi2_pearson)
#> # A tibble: 4 × 9
#> row_var row_cat col_var col_cat n n_row n_total percent_total
#> <chr> <chr> <chr> <chr> <int> <int> <int> <dbl>
#> 1 am 0 vs 0 12 19 32 37.5
#> 2 am 0 vs 1 7 19 32 21.9
#> 3 am 1 vs 0 6 13 32 18.8
#> 4 am 1 vs 1 7 13 32 21.9
#> # … with 1 more variable: p_chi2_pearson <dbl>
%>%
mtcars freq_table(am, cyl) %>%
freq_test()
#> One or more expected cell counts are <= 5. Therefore, Fisher's Exact Test was used.
#> # A tibble: 6 × 26
#> row_var row_cat col_var col_cat n n_row n_total percent_total se_total
#> <chr> <chr> <chr> <chr> <int> <int> <int> <dbl> <dbl>
#> 1 am 0 cyl 4 3 19 32 9.38 5.24
#> 2 am 0 cyl 6 4 19 32 12.5 5.94
#> 3 am 0 cyl 8 12 19 32 37.5 8.70
#> 4 am 1 cyl 4 8 13 32 25 7.78
#> 5 am 1 cyl 6 3 13 32 9.38 5.24
#> 6 am 1 cyl 8 2 13 32 6.25 4.35
#> # … with 17 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> # ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> # lcl_row <dbl>, ucl_row <dbl>, n_col <int>, n_expected <dbl>,
#> # chi2_contrib <dbl>, chi2_pearson <dbl>, r <int>, c <int>, df <dbl>,
#> # p_chi2_pearson <dbl>, p_fisher <dbl>
There is one problem with the chi-square test, which is that the sampling distribution of the test statistic has an approximate chi-square distribution. The larger the sample is, the better this approximation becomes, and in large samples the approximation is good enough to not worry about the fact that it is an approximation. However, in small samples the approximation is not good enough, making significance tests of the chi-square distribution inaccurate.
Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 816). SAGE Publications. Kindle Edition.
%>%
mtcars freq_table(am, cyl) %>%
freq_test()
#> One or more expected cell counts are <= 5. Therefore, Fisher's Exact Test was used.
#> # A tibble: 6 × 26
#> row_var row_cat col_var col_cat n n_row n_total percent_total se_total
#> <chr> <chr> <chr> <chr> <int> <int> <int> <dbl> <dbl>
#> 1 am 0 cyl 4 3 19 32 9.38 5.24
#> 2 am 0 cyl 6 4 19 32 12.5 5.94
#> 3 am 0 cyl 8 12 19 32 37.5 8.70
#> 4 am 1 cyl 4 8 13 32 25 7.78
#> 5 am 1 cyl 6 3 13 32 9.38 5.24
#> 6 am 1 cyl 8 2 13 32 6.25 4.35
#> # … with 17 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> # ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> # lcl_row <dbl>, ucl_row <dbl>, n_col <int>, n_expected <dbl>,
#> # chi2_contrib <dbl>, chi2_pearson <dbl>, r <int>, c <int>, df <dbl>,
#> # p_chi2_pearson <dbl>, p_fisher <dbl>
Interpret
Compare to Stata and SAS
Pearson’s chi-square is given by:
\[\chi^2=\sum \frac{(observed_{ij} - expected_{ij})^2}{expected_{ij}}\]
where,
\[expected_{ij} = \frac{row \ total * column \ total}{total}\]
For example:
%>%
mtcars freq_table(am, vs) %>%
freq_test() %>%
select(1:4, n_col, n_total, n_expected:chi2_pearson)
#> # A tibble: 4 × 9
#> row_var row_cat col_var col_cat n_col n_total n_expected chi2_contrib
#> <chr> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
#> 1 am 0 vs 0 18 32 10.7 0.161
#> 2 am 0 vs 1 14 32 8.31 0.207
#> 3 am 1 vs 0 18 32 7.31 0.236
#> 4 am 1 vs 1 14 32 5.69 0.303
#> # … with 1 more variable: chi2_pearson <dbl>
After obtaining the Pearson chi-squared value (0.9068826 above), we compare it to the critical value for the chi-square distribution with degrees of freedom = (rows - 1)(columns - 1) = 1. If the observed value is bigger than the critical value, we conclude that there is a statistically significant relationship.
For example:
%>%
mtcars freq_table(am, vs) %>%
freq_test() %>%
select(1:3, chi2_pearson:p_chi2_pearson)
#> # A tibble: 4 × 8
#> row_var row_cat col_var chi2_pearson r c df p_chi2_pearson
#> <chr> <chr> <chr> <dbl> <int> <int> <dbl> <dbl>
#> 1 am 0 vs 0.907 2 2 1 0.341
#> 2 am 0 vs 0.907 2 2 1 0.341
#> 3 am 1 vs 0.907 2 2 1 0.341
#> 4 am 1 vs 0.907 2 2 1 0.341
And this is the same result we get from R’s built-in chi-square test.
chisq.test(mtcars$am, mtcars$vs, correct = FALSE)
#>
#> Pearson's Chi-squared test
#>
#> data: mtcars$am and mtcars$vs
#> X-squared = 0.90688, df = 1, p-value = 0.3409
However, our result has the following advantages (in my opinion):
It let’s us see how the result was obtained - giving us a better intuition about what our result really means, and how it may differ if we had collected a different sample.
If fits easily into a dplyr pipeline.
It results in tibble, rather than a list.
When you have a 2 × 2 contingency table (i.e., two categorical variables each with two categories) then Pearson’s chi-square tends to produce significance values that are too small (in other words, it tends to make a Type I error). Therefore, Yates suggested a correction to the Pearson formula (usually referred to as Yates’s continuity correction). The basic idea is that when you calculate the deviation from the model (the observedij – modelij in equation (18.1)) you subtract 0.5 from the absolute value of this deviation before you square it. In plain English, this means you calculate the deviation, ignore whether it is positive or negative, subtract 0.5 from the value and then square it.
The key thing to note is that it lowers the value of the chi-square statistic and, therefore, makes it less significant. Although this seems like a nice solution to the problem there is a fair bit of evidence that this overcorrects and produces chi-square values that are too small! Howell (2006) provides an excellent discussion of the problem with Yates’s correction for continuity, if you’re interested; all I will say is that, although it’s worth knowing about, it’s probably best ignored.
Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 817). SAGE Publications. Kindle Edition.
There is one problem with the chi-square test, which is that the sampling distribution of the test statistic has an approximate chi-square distribution. The larger the sample is, the better this approximation becomes, and in large samples the approximation is good enough to not worry about the fact that it is an approximation. However, in small samples the approximation is not good enough, making significance tests of the chi-square distribution inaccurate. This is why you often read that to use the chi-square test the expected frequencies in each cell must be greater than 5.
Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 816). SAGE Publications. Kindle Edition.
Independence of observations. Cannot use chi-square test on repeated or nested measures.
The expected frequencies should be greater than 5. Expected frequencies below 5 results in a loss of statistical power, and may fail to detect genuine effects (Type II error).
Fisher’s exact test (Fisher, 1922) is not so much a test as a way of computing the exact probability of a statistic. It was designed originally to overcome the problem that with small samples the sampling distribution of the chi-square statistic deviates substantially from a chi-square distribution. It should be used with small samples.
Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 918). SAGE Publications. Kindle Edition.
Fisher’s exact test is computationally intensive. In order to get a feel for how it works, I’ve replicated a simple example from Wolfram MathWorld below.
In this example, X is a journal, say either Mathematics Magazine or Science, and Y is the number of articles on the topics of mathematics and biology appearing in a given issue of one of these journals. If Mathematics Magazine has five articles on math and one on biology, and Science has none on math and four on biology, then the relevant matrix would be:
<- matrix(c(5, 0, 1, 4), nrow = 2, byrow = TRUE)
m_observed
m_observed#> [,1] [,2]
#> [1,] 5 0
#> [2,] 1 4
Then calculate the conditional probability of getting the actual matrix given the particular row and column sums using the multivariate generalization of the hypergeometric probability function, which can be calculated as:
<- function(m) {
p
# Calculate the marginal totals
<- sum(m[1, ])
r1 <- sum(m[2, ])
r2 <- sum(m[, 1])
c1 <- sum(m[, 2])
c2 <- sum(m)
tot
# Calculate the conditional probability of getting the actual matrix given
# the particular row and column sums
<- factorial(r1)
r1_fac <- factorial(r2)
r2_fac <- factorial(c1)
c1_fac <- factorial(c2)
c2_fac <- factorial(tot)
tot_fac
<- factorial(m) # factorial of each cell of the matrix (i.e., 5!, 0!, 1!, 4!)
m_fac <- prod(m_fac) # Multiply all of those values together
prod_m_fac
<- r1_fac * r2_fac * c1_fac * c2_fac
numerator <- tot_fac * (prod_m_fac)
denominator <- numerator / denominator
p
# Return p
p }
So, in the example above, the p value for our observed data (p_cutoff) equals:
<- p(m_observed)
p_cutoff
p_cutoff#> [1] 0.02380952
Next, we need to calculate the p value for all other combinations of values that could have existed given the marginal totals in our observed matrix “m”.
For example,
4 1
2 3
would give us the same marginal totals (i.e., 5, 5, 6, 4)
Below I calculate the p for each possible combination, and save them into a vector for use in the next step. In this case, the possible combinations are given on the Wolfram Mathworks website. I’m still searching for an efficient algorithm that will find these combinations.
# Create a empty vector to hold the p values created in the loop below
<- vector(mode = "numeric")
p_values
# Create a list of all possible combinations, given our margins
<- list(
combinations c(4, 1 ,2, 3),
c(3, 2, 3, 2),
c(2, 3, 4, 1),
c(1, 4, 5, 0)
)
# Calculate the p value for each combination and save it
for (i in seq_along(combinations)) {
# Turn into a matrix
<- matrix(combinations[[i]], nrow = 2, byrow = TRUE)
m
# Caclulate p
<- p(m)
p_val
# Save p to vector
<- c(p_values, p_val)
p_values }
These p-values should all sum to 1
sum(p_values, p_cutoff)
#> [1] 1
Next, the P-value of the test can be simply computed by the sum of all P-values which are <= p_cutoff (including p_cutoff itself).
<- tibble(
df p_value = p_values,
p_cutoff = p_cutoff
%>%
) mutate(
less_or_equal = p_value <= p_cutoff
%>%
) print()
#> # A tibble: 4 × 3
#> p_value p_cutoff less_or_equal
#> <dbl> <dbl> <lgl>
#> 1 0.238 0.0238 FALSE
#> 2 0.476 0.0238 FALSE
#> 3 0.238 0.0238 FALSE
#> 4 0.0238 0.0238 TRUE
%>%
df filter(less_or_equal) %>%
summarise(
`p-value` = sum(p_value, p_cutoff)
)#> # A tibble: 1 × 1
#> `p-value`
#> <dbl>
#> 1 0.0476
And for comparison, let’s check our results against R’s built-in function for Fisher’s Exact Test.
fisher.test(m_observed)
#>
#> Fisher's Exact Test for Count Data
#>
#> data: m_observed
#> p-value = 0.04762
#> alternative hypothesis: true odds ratio is not equal to 1
#> 95 percent confidence interval:
#> 1.024822 Inf
#> sample estimates:
#> odds ratio
#> Inf