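Here we focus on the specification of the GSPCR model. Three aspects of a cv_gspcr() call should be specified carefully, and the rest of this vignette covers them in turn: the threshold type and number of threshold values (thrs, nthrs), the cross-validation fit measure (fit_measure), and the number of PCs (npcs_range).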
In this vignette we consider a simple scenario with a continuous dependent variable and a set of continuous predictors. First, we load the required packages and store the example dataset GSPCRexdata (see the help file for details, ?GSPCRexdata) in two separate objects:
# Load R packages
library(gspcr) # this package!
library(superpc) # alternative comparison package
library(patchwork) # combining ggplots
# Store the continuous predictors and the continuous dependent variable
X <- GSPCRexdata$X$cont
y <- GSPCRexdata$y$cont
As described in the introduction, gspcr allows for the specification of different bivariate association measures. We can run gspcr using one of three threshold types: "LLS", "normalized", or "PR2" (see ?cv_gspcr for their definitions). The "normalized" option relates to the measure used by the superpc R package.
Another important aspect is the number of threshold values to evaluate, which is set with the nthrs argument. Using the following code we can compare the solution paths obtained with the different association measures and threshold values for a given number of PCs.
# Define a vector of threshold types
threshold_types <- c("LLS", "normalized", "PR2")
# Train the GSPCR model once per threshold type
out_trhs <- lapply(
X = threshold_types,
FUN = function(i) {
cv_gspcr(
dv = y,
ivs = X,
thrs = i, # threshold type
nthrs = 20, # number of threshold values
npcs_range = 1,
K = 10
)
}
)
# Plot them
plots <- lapply(out_trhs, function(i) {
plot(
x = i,
y = "F",
labels = FALSE, # We are using a single nPC, do not need the label
discretize = FALSE, # Makes X-axis more readable
print = FALSE
)
})
# Patchwork ggplots
plots[[1]] + plots[[2]] + plots[[3]]
As you can see, the solution paths are similar, although LLS tended to favor lower threshold values.
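To make the roles of a threshold type and of nthrs more concrete, the following base R sketch builds a grid of threshold values from a bivariate association measure and shows which predictors survive each threshold. It is only an illustration under stated assumptions: the data are simulated, the absolute correlation stands in for the association measure, and the equally spaced grid is a hypothetical choice. It is not the internal gspcr implementation.

```r
# Illustrative sketch only: NOT the internal gspcr implementation.
set.seed(1234)

# Simulated data (assumption): 100 observations, 10 predictors,
# of which only the first 3 are related to the dependent variable
n <- 100
p <- 10
X_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)
y_sim <- drop(X_sim[, 1:3] %*% c(1, 0.8, 0.6)) + rnorm(n)

# Bivariate association between each predictor and y
# (absolute correlation used here as a simple stand-in measure)
assoc <- abs(drop(cor(X_sim, y_sim)))

# Grid of nthrs threshold values spanning the observed associations
nthrs <- 5
thr_grid <- seq(from = min(assoc), to = max(assoc), length.out = nthrs)

# Predictors retained at each threshold: association >= threshold
active <- lapply(thr_grid, function(thr) which(assoc >= thr))

# The active set can only shrink as the threshold increases
n_active <- lengths(active)
print(data.frame(threshold = round(thr_grid, 3), n_active = n_active))
```

A larger nthrs gives a finer grid and therefore a more detailed solution path, at the cost of fitting more models during cross-validation.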
We can use different cross-validation fit measures. See the help file for the list of options (?cv_gspcr).
# Measures
fit_measure_vec <- c("LRT", "PR2", "MSE", "F", "AIC", "BIC")
# Train the GSPCR model once per fit measure
out_fit_meas <- lapply(fit_measure_vec, function(i) {
cv_gspcr(
dv = y,
ivs = X,
fit_measure = i,
thrs = "normalized",
nthrs = 20,
npcs_range = 1,
K = 10
)
})
# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
# Reverse the y-axis for measures where lower values are better
y_rev <- grepl("MSE|AIC|BIC", fit_measure_vec[[i]])
# Make plots
plot(
x = out_fit_meas[[i]],
y = fit_measure_vec[[i]],
labels = FALSE,
y_reverse = y_rev,
errorBars = FALSE,
discretize = FALSE,
print = FALSE
)
})
# Patchwork ggplots
(plots[[1]] + plots[[2]] + plots[[3]]) / (plots[[4]] + plots[[5]] + plots[[6]])
As you can see, the different fit measures return equivalent solution paths. The same pattern holds with a different number of PCs, for example npcs_range = 5:
# Train the GSPCR model once per fit measure, now with 5 PCs
out_fit_meas <- lapply(fit_measure_vec, function(i) {
cv_gspcr(
dv = y,
ivs = X,
fit_measure = i,
thrs = "normalized",
nthrs = 20,
npcs_range = 5,
K = 10
)
})
# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
# Reverse the y-axis for measures where lower values are better
y_rev <- grepl("MSE|AIC|BIC", fit_measure_vec[[i]])
# Make plots
plot(
x = out_fit_meas[[i]],
y = fit_measure_vec[[i]],
labels = FALSE,
y_reverse = y_rev,
errorBars = FALSE,
discretize = FALSE,
print = FALSE
)
})
# Patchwork ggplots
(plots[[1]] + plots[[2]] + plots[[3]]) / (plots[[4]] + plots[[5]] + plots[[6]])
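The plotting code above reverses the y-axis for MSE, AIC, and BIC because lower values indicate better fit for these measures, while higher values are better for LRT, PR2, and F. The base R sketch below illustrates that difference in direction on simulated data. It is an assumption-laden illustration: the measures are computed in-sample with standard stats functions, the PR2 fit measure is not reproduced, and none of this is taken from the gspcr source code.

```r
# Illustrative sketch only: base R, not gspcr internals.
set.seed(2024)

# Simulated data (assumption): one predictor with a real effect on y
n <- 200
x1 <- rnorm(n)
y_sim <- 1 + 2 * x1 + rnorm(n)

# Two candidate models: intercept-only versus one real predictor
fit_null <- lm(y_sim ~ 1)
fit_good <- lm(y_sim ~ x1)

# Measures where LOWER is better
lower_is_better <- function(fit) {
  c(MSE = mean(residuals(fit)^2), AIC = AIC(fit), BIC = BIC(fit))
}

# Measures where HIGHER is better (computed against the null model)
f_stat <- anova(fit_null, fit_good)$F[2]
lrt <- as.numeric(2 * (logLik(fit_good) - logLik(fit_null)))

print(rbind(null = lower_is_better(fit_null), good = lower_is_better(fit_good)))
print(c(F = f_stat, LRT = lrt))
```

Reversing the axis for the first group lets all panels be read the same way: the best solution sits at the top.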
Cross-validation can also be used to select the number of PCs. The npcs_range argument specifies the candidate numbers of PCs to consider.
# Train the model
out_npcs <- cv_gspcr(
dv = y,
ivs = X,
npcs_range = c(2, 5, 10)
)
# Plot solution paths
plot(out_npcs)
Given the choice of 2, 5, or 10 PCs, we would use 2 PCs with the second threshold value.
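As a rough illustration of the idea behind choosing the number of PCs by cross-validation, the sketch below runs a K-fold cross-validation of ordinary principal component regression over the same candidate values. It assumes simulated data and uses only base R (prcomp and lm). It omits the supervised thresholding step entirely, so it is a simplified stand-in and not the gspcr algorithm.

```r
# Illustrative sketch only: plain PCR with K-fold CV, not the gspcr algorithm.
set.seed(42)

# Simulated data (assumption): 150 observations, 12 predictors
n <- 150
p <- 12
X_sim <- matrix(rnorm(n * p), nrow = n)
colnames(X_sim) <- paste0("x", seq_len(p))
y_sim <- drop(X_sim[, 1:2] %*% c(1.5, -1)) + rnorm(n)

npcs_range <- c(2, 5, 10) # candidate numbers of PCs
K <- 5                    # number of folds
folds <- sample(rep(seq_len(K), length.out = n))

# Cross-validated MSE for each candidate number of PCs
cv_mse <- sapply(npcs_range, function(npcs) {
  fold_mse <- sapply(seq_len(K), function(k) {
    train <- folds != k
    # Fit the PCA on the training fold only
    pca <- prcomp(X_sim[train, ], center = TRUE, scale. = TRUE)
    pcs_train <- pca$x[, seq_len(npcs), drop = FALSE]
    pcs_valid <- predict(pca, X_sim[!train, ])[, seq_len(npcs), drop = FALSE]
    # Regress y on the retained PCs and score the held-out fold
    beta <- coef(lm(y_sim[train] ~ pcs_train))
    y_hat <- drop(cbind(1, pcs_valid) %*% beta)
    mean((y_sim[!train] - y_hat)^2)
  })
  mean(fold_mse)
})
names(cv_mse) <- npcs_range

# Candidate with the lowest cross-validated MSE
best_npcs <- npcs_range[which.min(cv_mse)]
print(cv_mse)
print(best_npcs)
```

In cv_gspcr() the same kind of comparison is made jointly over the threshold values and the numbers of PCs, which is why the plot shows one solution path per value in npcs_range.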