Here we focus on the specification of the GSPCR model. Three
arguments of the cv_gspcr() function should be specified
carefully.
In this vignette, we consider a simple scenario with a continuous
dependent variable and a set of continuous predictors. First, we load
the required packages and store the example dataset
GSPCRexdata (see the help file ?GSPCRexdata for details)
in two separate objects:
# Load R packages
library(gspcr) # this package!
library(superpc) # alternative comparison package
library(patchwork) # combining ggplots
# Store the continuous predictors and the continuous dependent variable
X <- GSPCRexdata$X$cont
y <- GSPCRexdata$y$cont
As described in the introduction, gspcr allows for the
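specification of different bivariate association measures. These
measures decide which predictors are retained at a given threshold. The
threshold type is set with the thrs argument and can be one of:
- "LLS": the log-likelihood of each bivariate model;
- "PR2": a pseudo R-squared of each bivariate model;
- "normalized": the normalized correlation used by the superpc
R package.
Another important aspect to consider is the number of threshold values
to explore. This is specified with the nthrs argument. The following
code compares the solution paths obtained with the different association
measures across 20 threshold values, keeping the number of PCs fixed at 1.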
# Define a vector of threshold types
threshold_types <- c("LLS", "normalized", "PR2")
# Train the GSPCR model with each threshold type
out_trhs <- lapply(
    threshold_types,
    FUN = function(i) {
        cv_gspcr(
            dv = y,
            ivs = X,
            thrs = i,       # threshold type
            nthrs = 20,     # number of threshold values
            npcs_range = 1, # use a single PC
            K = 10          # number of cross-validation folds
        )
    }
)
# Plot them
plots <- lapply(out_trhs, function(i) {
    plot(
        x = i,
        y = "F",
        labels = FALSE,     # single number of PCs, so no labels are needed
        discretize = FALSE, # Makes X-axis more readable
        print = FALSE
    )
})
# Combine the plots with patchwork
plots[[1]] + plots[[2]] + plots[[3]]
Figure 1: Solution paths for different association measures.
As Figure 1 shows, the solution paths are similar, although LLS tends to favor lower threshold values.
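To make the threshold machinery more concrete, here is a minimal base-R sketch on simulated data. It does not reproduce how gspcr computes its scores. The three scoring functions are stand-ins chosen for illustration: the log-likelihood of each bivariate lm(), the Cox and Snell pseudo R-squared, and the absolute correlation. The equally spaced grid of nthrs values between the lowest and highest score is also an assumption. All object names (X_sim, score_lls, make_grid, ...) are hypothetical.

```r
# Minimal sketch, not gspcr internals: score each predictor with three
# stand-in association measures, then build a grid of nthrs thresholds.
set.seed(20240101)
n <- 100
p <- 30
X_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)
# Only the first 5 predictors are related to the outcome
y_sim <- drop(X_sim[, 1:5] %*% rep(1, 5)) + rnorm(n)

# Stand-in for "LLS": log-likelihood of each bivariate linear model
score_lls <- apply(X_sim, 2, function(x) as.numeric(logLik(lm(y_sim ~ x))))
# Stand-in for "PR2": Cox and Snell pseudo R-squared vs. the intercept-only model
ll_null <- as.numeric(logLik(lm(y_sim ~ 1)))
score_pr2 <- 1 - exp(-2 * (score_lls - ll_null) / n)
# Stand-in for "normalized": absolute correlation with the outcome
score_norm <- abs(cor(X_sim, y_sim))[, 1]

# Hypothetical grid: nthrs equally spaced values from the lowest to the highest score
make_grid <- function(score, nthrs) {
  seq(min(score), max(score), length.out = nthrs)
}
thr_grid <- make_grid(score_norm, nthrs = 10)

# Number of predictors retained (score >= threshold) at each threshold value
n_kept <- sapply(thr_grid, function(t) sum(score_norm >= t))
print(n_kept)
```

Under these stand-in definitions, the three scores rank the predictors identically for a continuous outcome. For Gaussian linear models, the Cox and Snell pseudo R-squared reduces to the ordinary R-squared, which equals the squared correlation in a bivariate regression. Any difference between them in this sketch would therefore come from how the threshold values are spaced on each score's scale, not from the order in which predictors are dropped.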
We can also use different cross-validation fit measures through the fit_measure argument.
See the help file (?cv_gspcr) for the list of available options.
# Fit measures to compare
fit_measure_vec <- c("LRT", "PR2", "MSE", "F", "AIC", "BIC")
# Train the GSPCR model with each fit measure
out_fit_meas <- lapply(fit_measure_vec, function(i) {
    cv_gspcr(
        dv = y,
        ivs = X,
        fit_measure = i,
        thrs = "normalized",
        nthrs = 20,
        npcs_range = 1,
        K = 10
    )
})
# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
    # Reverse the y-axis for measures where lower values mean better fit
    rev <- grepl("MSE|AIC|BIC", fit_measure_vec[i])
    # Make plots
    plot(
        x = out_fit_meas[[i]],
        y = fit_measure_vec[[i]],
        labels = FALSE,
        y_reverse = rev,
        errorBars = FALSE,
        discretize = FALSE,
        print = FALSE
    )
})
# Combine the plots with patchwork
(plots[[1]] + plots[[2]] + plots[[3]]) / (plots[[4]] + plots[[5]] + plots[[6]])
Figure 2: Solution paths for different fit measures.
As Figure 2 shows, the different fit measures return equivalent solution paths. This also holds with more PCs, for example 5:
# Train the GSPCR model with each fit measure, now using 5 PCs
out_fit_meas <- lapply(fit_measure_vec, function(i) {
    cv_gspcr(
        dv = y,
        ivs = X,
        fit_measure = i,
        thrs = "normalized",
        nthrs = 20,
        npcs_range = 5,
        K = 10
    )
})
# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
    # Reverse the y-axis for measures where lower values mean better fit
    rev <- grepl("MSE|AIC|BIC", fit_measure_vec[i])
    # Make plots
    plot(
        x = out_fit_meas[[i]],
        y = fit_measure_vec[[i]],
        labels = FALSE,
        y_reverse = rev,
        errorBars = FALSE,
        discretize = FALSE,
        print = FALSE
    )
})
# Combine the plots with patchwork
(plots[[1]] + plots[[2]] + plots[[3]]) / (plots[[4]] + plots[[5]] + plots[[6]])
Figure 3: Solution paths for different fit measures when using 5 PCs.
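One way to see why the fit measures agree is to hold the number of PCs fixed along a solution path. Every candidate outcome model then has the same number of parameters. For Gaussian linear models with a fixed sample size and a fixed number of parameters, MSE, AIC, BIC, the F statistic and R-squared are all monotone functions of the residual sum of squares. They therefore rank the candidates the same way. The base-R sketch below checks this on simulated data using in-sample lm() fits. It illustrates the idea under that assumption and does not describe how cv_gspcr() computes its out-of-fold measures. The object names are hypothetical.

```r
# Sketch: with n and the number of parameters held fixed, common fit
# measures are monotone in the residual sum of squares, so they agree
# on the ranking of candidate models.
set.seed(123)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 0.8 * dat$x1 + 0.3 * dat$x2 + rnorm(n)

# Three candidate models with the same number of parameters (one predictor each)
fits <- list(
  m1 = lm(y ~ x1, data = dat),
  m2 = lm(y ~ x2, data = dat),
  m3 = lm(y ~ x3, data = dat)
)

# One row per model, one column per fit measure
measures <- t(sapply(fits, function(f) {
  s <- summary(f)
  c(
    MSE = mean(residuals(f)^2),
    AIC = AIC(f),
    BIC = BIC(f),
    F = unname(s$fstatistic["value"]),
    R2 = s$r.squared
  )
}))
print(measures)

# Lower is better for MSE, AIC and BIC; higher is better for F and R2
rank_low <- apply(measures[, c("MSE", "AIC", "BIC")], 2, rank)
rank_high <- apply(-measures[, c("F", "R2")], 2, rank)
print(cbind(rank_low, rank_high))
```

When the number of PCs varies, AIC and BIC penalize the larger models differently. In that case, this argument alone no longer guarantees identical rankings.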
Cross-validation can also be used to select the number of
PCs. The npcs_range argument specifies the candidate
numbers of PCs to consider.
# Train the model
out_npcs <- cv_gspcr(
    dv = y,
    ivs = X,
    npcs_range = c(2, 5, 10)
)
# Plot solution paths
plot(out_npcs)
Figure 4: Solution paths for different fit measures when cross-validating the number of PCs.
Given the choice of 2, 5, or 10 PCs, we would use 2 PCs with the second threshold value.
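The joint search over threshold values and numbers of PCs can be sketched in base R as follows. This minimal illustration rests on several assumptions:

- the absolute correlation is the association measure;
- the threshold grid is equally spaced;
- principal components are computed with prcomp() on the retained, standardized predictors;
- the out-of-fold MSE is the fit measure.

It is not the gspcr implementation, and names such as cv_mse, thr_grid and best_npcs are hypothetical. Scores are recomputed within each training fold, so the held-out rows do not influence which predictors are kept.

```r
# Minimal sketch of cross-validating the threshold and the number of PCs
# jointly; hypothetical names, not the gspcr implementation.
set.seed(2024)
n <- 150
p <- 40
X_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)
# Only the first 6 predictors are related to the outcome
y_sim <- drop(X_sim[, 1:6] %*% rep(0.7, 6)) + rnorm(n)

K <- 5                              # number of folds
folds <- sample(rep(seq_len(K), length.out = n))
npcs_range <- c(1, 2, 3)            # candidate numbers of PCs
nthrs <- 8                          # number of threshold values

# Threshold grid on the scale of the absolute correlation (stand-in score)
score_full <- abs(cor(X_sim, y_sim))[, 1]
thr_grid <- seq(min(score_full), max(score_full), length.out = nthrs)

# Out-of-fold MSE for every (threshold, number of PCs) pair
cv_mse <- matrix(NA_real_, nrow = nthrs, ncol = length(npcs_range),
                 dimnames = list(thr = seq_len(nthrs), npcs = npcs_range))

for (i in seq_len(nthrs)) {
  for (j in seq_along(npcs_range)) {
    q <- npcs_range[j]
    sse <- 0
    feasible <- TRUE
    for (k in seq_len(K)) {
      train <- folds != k
      # Score predictors on the training fold only
      s_train <- abs(cor(X_sim[train, ], y_sim[train]))[, 1]
      keep <- s_train >= thr_grid[i]
      # Too few predictors survive the threshold to extract q PCs
      if (sum(keep) < q) {
        feasible <- FALSE
        break
      }
      pca <- prcomp(X_sim[train, keep, drop = FALSE], center = TRUE, scale. = TRUE)
      z_train <- pca$x[, seq_len(q), drop = FALSE]
      fit <- lm(y_sim[train] ~ z_train)
      # Project the held-out rows onto the training-fold PCs
      z_test <- predict(pca, newdata = X_sim[!train, keep, drop = FALSE])[, seq_len(q), drop = FALSE]
      y_hat <- drop(cbind(1, z_test) %*% coef(fit))
      sse <- sse + sum((y_sim[!train] - y_hat)^2)
    }
    if (feasible) cv_mse[i, j] <- sse / n
  }
}

# Pick the (threshold, number of PCs) pair with the lowest out-of-fold MSE
best <- which(cv_mse == min(cv_mse, na.rm = TRUE), arr.ind = TRUE)[1, ]
best_thr_index <- unname(best[1])
best_npcs <- npcs_range[unname(best[2])]
```

Printing cv_mse shows the grid of out-of-fold errors. An entry is NA where a threshold keeps fewer predictors than the requested number of PCs. best_thr_index and best_npcs hold the selected pair.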