Deady (2004) studied naive mock-jurors’ responses to two types of vertic setups. One has two options only (guilt vs. acquittal) and the other has an additional option “not proven”. The researcher also investigated the influence of conflicting testimonial evidence on judgments. The study had 2 (two vs. three-option verdict) by 2 (conflict vs. no-conflict conditions) factorial design. The dependent variable was the juror’s degree of confidence in her or his verdict, expressed as a percentage rating (0–100). The data included the responses from 104 first-year psychology students at The Australian National University.
The data included three variables.
crc99: The ratings of confidence levels with rescaling
into the (0, 1) interval to avoid 1 and 0 values.vert: was the dummy variable for coding the conditions
of verdict types, whereas
verdict = -1 indicates responses from the two-option
verdict condition andverdict = 1 indicates responses from the three-option
verdict condition.confl: was the dummy variable for coding the conflict
conditions, whereas
confl = -1 indicates the no-conflict condition andconfl = 1 indicates the conflict condition.library(cdfquantreg)
data(cdfqrExampleData)
#Quick overview of the variables
rbind(head(JurorData,4),tail(JurorData,4))##     vert confl crc99
## 1     -1    -1 0.500
## 2     -1    -1 0.698
## 3     -1    -1 0.797
## 4     -1    -1 0.698
## 101    1     1 0.946
## 102    1     1 0.698
## 103    1     1 0.995
## 104    1     1 0.847The hypothesis was that the conflicting evidence would lower juror confidence and that the three-option condition would increase it.
That is, it was expected that there would be significant interaction
effect between vert and confl when predicting
crc99. To test the hypothesis, we fit a a model in
cdfquantreg by using the T2-T2 distribution as a demonstration.
# We use T2-T2 distribution
fd <- "t2" # The parent distribution
sd <- "t2" # The child distribution
# Fit the null model
fit_null <- cdfquantreg(crc99 ~ 1 | 1, fd, sd, data = JurorData)
# Fit a main effect model
fit1 <- cdfquantreg(crc99 ~ vert + confl | 1, fd, sd, data = JurorData)
# Fit the full model
fit2 <- cdfquantreg(crc99 ~ vert*confl | 1, fd, sd, data = JurorData)
anova(fit1,fit2)## Likelihood ratio tests 
## 
##   Resid. Df -2Loglik Df LR stat Pr(>Chi)
## 1       100  -53.625                    
## 2        99  -54.527  1 0.90193   0.3423# Obtain the statistics for the null model
summary(fit2)## Family:  t2 t2 
## Call:  cdfquantreg(formula = crc99 ~ vert * confl | 1, data = JurorData,  
##     fd = fd, sd = sd) 
## 
## Mu coefficients (Location submodel)
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.76694    0.09919   7.732 1.07e-14 ***
## vert         0.09942    0.09778   1.017    0.309    
## confl        0.07937    0.09778   0.812    0.417    
## vert:confl   0.09335    0.09778   0.955    0.340    
## 
## Sigma coefficients (Dispersion submodel)
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -0.2258     0.1209  -1.868   0.0618 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Converge:  successful completion
## Log-Likelihood:  27.2635 
## 
## Gradient:  3e-04 0 -3e-04 0 -3e-04The results were similar to the ones outlined in Smithson &
Verkuilen (2006). There was no significant interaction between
vert and confl in the location submodel.
Next, we replicate the steps in Smithson & Verkuilen by fitting
models with the dispersion submodel, expanding the formula to
crc99 ~ vert*confl |vert + confl and
crc99 ~ vert*confl |vert*confl.
# Fit a main effect model
fit3 <- cdfquantreg(crc99 ~ vert*confl |vert + confl, fd, sd, data = JurorData)
# Fit the full model
fit4 <- cdfquantreg(crc99 ~ vert*confl |vert*confl, fd, sd, data = JurorData)
anova(fit2, fit3, fit4)## Likelihood ratio tests 
## 
##   Resid. Df -2Loglik Df LR stat Pr(>Chi)  
## 1        99  -54.527                      
## 2        97  -60.958  2  6.4308  0.04014 *
## 3        96  -61.746  1  0.7878  0.37475  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1# Obtain the statistics for the null model
summary(fit4)## Family:  t2 t2 
## Call:  
## cdfquantreg(formula = crc99 ~ vert * confl | vert * confl, data = JurorData,  
##     fd = fd, sd = sd) 
## 
## Mu coefficients (Location submodel)
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.77441    0.10998   7.041 1.91e-12 ***
## vert         0.14261    0.10998   1.297    0.195    
## confl        0.07258    0.10998   0.660    0.509    
## vert:confl   0.08055    0.10998   0.732    0.464    
## 
## Sigma coefficients (Dispersion submodel)
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -0.20126    0.12122  -1.660   0.0969 .
## vert         0.30779    0.12122   2.539   0.0111 *
## confl       -0.03512    0.12122  -0.290   0.7720  
## vert:confl  -0.10766    0.12122  -0.888   0.3745  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Converge:  successful completion
## Log-Likelihood:  30.8728 
## 
## Gradient:  4e-04 -5e-04 -1e-04 0 -1e-04 -1e-04 -1e-04 -1e-04As shown in the results, the number of options had significant effect on the dispersion of the distribution. While whether the evidence is conflict or not does not influence the dispersion. There was no significant interaction between the two predictors.
The following figures shows how well the model fit the actual response distributions.
# Compare the empirical distribution and the fitted values distribution
breaks <- seq(0,1,length.out =11)
plot(fit4,xlim = c(0.1,1),ylim = c(0,3), breaks = breaks)We can check the fitted values as well as model residuals. Currently three types of residuals are available: raw residuals, Pearson’s residuals and deviance residuals.
par(mfrow=c(2,2),mar = c(2,3,2,2))
# Plot the fitted values
plot(fitted(fit4, "full"), main = "Fitted Values")
# Check Residuals
plot(residuals(fit4, "raw"), main = "Raw Residuals")
plot(residuals(fit4, "pearson"), main = "Pearson Residuals")
plot(residuals(fit4, "deviance"), main = "Deviance Residuals")A second example is a data from a study that investigates the relationship between stress and anxiety. The data includes 166 nonclinical women in Townsville, Queensland, Australia. The variables are linearly transformed scales from the Depression Anxiety Stress Scales, which normally range from 0 to 42.
The scores of the two continuous variables Anxiety and
Stress were firstly linearly transformed to the (0, 1)
interval. The figure below shows kernel density estimates for the two
variables. As one would expect in a nonclinical population, there is an
active floor for each variable, with this being most pronounced for
anxiety. It should be clear that anxiety is strongly skewed.
head(AnxStrData, 8)##   Anxiety Stress
## 1    0.01   0.01
## 2    0.17   0.29
## 3    0.01   0.17
## 4    0.05   0.41
## 5    0.09   0.21
## 6    0.41   0.45
## 7    0.05   0.21
## 8    0.01   0.01plot(density(AnxStrData$Anxiety), main = "Anxiety and Stress")
lines(density(AnxStrData$Stress), lty = 2)Heteroscedasticity is to be expected, as theoretically it is likely that someone experiencing anxiety will also experience a great deal of stress, but the converse is not true, because many people might experience stress without being anxious. That is, stress could be thought of as a necessary but not sufficient condition for anxiety. If so, the relationship between the variables is not one-to-one plus noise, as predicted by ordinary homoscedastic regression, but instead is many-to-one plus noise, which involves heteroscedasticity.
We can fit the cdf model the same way as we did for Example data set 1.
# Fit the null model
fit_null <- cdfquantreg(Anxiety ~ 1 | 1, fd, sd, data = AnxStrData)
# Fit the location model
fit1 <- cdfquantreg(Anxiety ~ Stress | 1, fd, sd, data = AnxStrData)
# Fit the full model
fit2 <- cdfquantreg(Anxiety ~ Stress | Stress, fd, sd, data = AnxStrData)
anova(fit_null,fit1, fit2)## Likelihood ratio tests 
## 
##   Resid. Df -2Loglik Df LR stat Pr(>Chi)    
## 1       164  -496.90                        
## 2       163  -615.08  1 118.177   <2e-16 ***
## 3       162  -617.03  1   1.949   0.1627    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1summary(fit2)## Family:  t2 t2 
## Call:  cdfquantreg(formula = Anxiety ~ Stress | Stress, data = AnxStrData,  
##     fd = fd, sd = sd) 
## 
## Mu coefficients (Location submodel)
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -7.3397     0.2145  -34.22   <2e-16 ***
## Stress       10.7807     0.7845   13.74   <2e-16 ***
## 
## Sigma coefficients (Dispersion submodel)
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.2673     0.1742   1.535    0.125
## Stress        0.7715     0.5517   1.398    0.162
## 
## Converge:  successful completion
## Log-Likelihood:  308.5143 
## 
## Gradient:  0 0 1e-04 0It is clear that Stress influenced both the mean and the
dispersion of scores on anxiety.
# Compare the empirical distribution and the fitted values distribution
plot(fit2)# Plot the fitted values
plot(fitted(fit2, "full"))
# Check Residuals
plot(residuals(fit2, "raw"))