where \(\sum_{i=1}^{k} \pi_{i} = 1\). The \(\pi_{i}\) are mixing parameters which determine the weight of each component distribution \(f_{Y_{i}}(y)\) in the overall probability distribution. As long as each component has a valid PDF, the overall distribution \(h_{Y}(y)\) has a valid PDF. The main assumption is statistical independence between the process of randomly selecting the component distribution and the distributions themselves. Assume there is a random selection process that first generates the numbers \(1,\ ...,\ k\) with probabilities \(\pi_{1},\ ...,\ \pi_{k}\). After selecting number \(i\), where \(1 \leq i \leq k\), a random variable \(y\) is drawn from component distribution \(f_{Y_{i}}(y)\) (Davenport, Bezder, and Hathaway 1988; Schork, Allison, and Thiel 1996; Everitt 1996; Pearson 2011).
Continuous mixture variables are generated at the component level in SimCorrMix. The target correlation matrix rho
in the simulation functions corrvar
and corrvar2
is specified in terms of the correlations with components of continuous mixture variables. This allows the user to set the correlation between components of the same mixture variable to any desired value. If this correlation is set to \(0\), the intermediate correlation matrix Sigma
may need to be converted to the nearest positive-definite matrix. This is done within the simulation functions by specifying use.nearPD = TRUE
. Higham (2002)’s algorithm is executed with the Matrix::nearPD
function (Bates and Maechler 2018). Otherwise, negative eigenvalues are replaced with \(0\).
Then the first six cumulants are given by (Kendall and Stuart 1977):
\[\begin{align} \begin{split} {\kappa}_{1} &= {\mu}_{1} = 0\ \ \ \ \ \ & &{\kappa}_{4} &= {\mu}_{4} - 3{{\mu}_{2}}^{2} \\ {\kappa}_{2} &= {\mu}_{2} & &{\kappa}_{5} &= {\mu}_{5} - 10{\mu}_{3}{\mu}_{2} \\ {\kappa}_{3} &= {\mu}_{3} & &{\kappa}_{6} &= {\mu}_{6} - 15{\mu}_{4}{\mu}_{2} - 10{{\mu}_{3}}^{2} + 30{{\mu}_{2}}^. \end{split} \tag{1.2} \end{align}\]The cumulants are standardized by dividing \({\kappa}_{1}\) - \({\kappa}_{6}\) by \(\sqrt{{{\kappa}_{2}}^{r}} = ({\sigma}^{2})^{r/2} = {\sigma}^{r}\), where \({\sigma}^{2}\) is the variance of \(Y\) and \(r\) is the order of the cumulant:
\[\begin{align} \begin{split} 0 &= \frac{{\kappa}_{1}}{\sqrt{{{\kappa}_{2}}^{1}}} = \frac{{\mu}_{1}}{{\sigma}^{1}} &\ \ \ \ \ \ &{\gamma}_{2} &= \frac{{\kappa}_{4}}{\sqrt{{{\kappa}_{2}}^{4}}} = \frac{{\mu}_{4}}{{\sigma}^{4}} - 3 \\ 1 &= \frac{{\kappa}_{2}}{\sqrt{{{\kappa}_{2}}^{2}}} = \frac{{\mu}_{2}}{{\sigma}^{2}} &\ \ \ \ \ \ &{\gamma}_{3} &= \frac{{\kappa}_{5}}{\sqrt{{{\kappa}_{2}}^{5}}} = \frac{{\mu}_{5}}{{\sigma}^{5}} - 10{\gamma}_{1} \\ {\gamma}_{1} &= \frac{{\kappa}_{3}}{\sqrt{{{\kappa}_{2}}^{3}}} = \frac{{\mu}_{3}}{{\sigma}^{3}} &\ \ \ \ \ \ &{\gamma}_{4} &= \frac{{\kappa}_{6}}{\sqrt{{{\kappa}_{2}}^{6}}} = \frac{{\mu}_{6}}{{\sigma}^{6}} - 15{\gamma}_{2} - 10{{\gamma}_{1}}^{2} - 15. \end{split} \tag{1.3} \end{align}\]The values \({\gamma}_{1}\) and \({\gamma}_{2}\) correspond to skewness and standardized kurtosis (so that the normal distribution has a value of 0, hereafter referred to as skurtosis). The corresponding sample values for the above can be obtained by replacing \({\mu}_{r}\) by \({m}_{r} = \sum_{j=1}^{n}{({x}_{j}-{m}_{1})}^{r}/n\) (Headrick 2002).
The standardized cumulants for a continuous mixture variable can be derived in terms of the standardized cumulants of its component distributions. Suppose the goal is to simulate a continuous mixture variable \(Y\) with PDF \(h_{Y}(y)\) that contains two component distributions \(Y_a\) and \(Y_b\) with mixing parameters \(\pi_a\) and \(\pi_b\):
\[\begin{equation} h_{Y}(y) = \pi_a f_{Y_a}(y) + \pi_b g_{Y_b}(y),\ y\ \in\ \bm{Y},\ \pi_a\ \in\ (0,\ 1),\ \pi_b\ \in\ (0,\ 1),\ \pi_a + \pi_b = 1. \tag{1.4} \end{equation}\] Here,so that \(Y_a\) and \(Y_b\) have expected values \(\mu_a\) and \(\mu_b\) and variances \(\sigma_a^2\) and \(\sigma_b^2\). Assume the variables \(Z_a'\) and \(Z_b'\) are generated with zero mean and unit variance using Headrick’s fifth-order PMT given the specified values for skew (\(\gamma_{1_a}'\), \(\gamma_{1_b}'\)), skurtosis (\(\gamma_{2_a}'\), \(\gamma_{2_b}'\)), and standardized fifth (\(\gamma_{3_a}'\), \(\gamma_{3_b}'\)) and sixth (\(\gamma_{4_a}'\), \(\gamma_{4_b}'\)) cumulants:
\[\begin{equation} \begin{split} Z_a' &= c_{0_a} + c_{1_a}Z_a + c_{2_a}Z_a^2 + c_{3_a}Z_a^3 + c_{4_a}Z_a^4 + c_{5_a}Z_a^5,\ Z_a \sim N(0,\ 1) \\ Z_b' &= c_{0_b} + c_{1_b}Z_b + c_{2_b}Z_b^2 + c_{3_b}Z_b^3 + c_{4_b}Z_b^4 + c_{5_b}Z_b^5,\ Z_b \sim N(0,\ 1). \end{split} \tag{1.6} \end{equation}\]The constants \(c_{0_a},\ ...,\ c_{5_a}\) and \(c_{0_b},\ ...,\ c_{5_b}\) are the solutions to the system of equations given in SimMultiCorrData::poly
and calculated by SimMultiCorrData::find_constants
. Similar results hold for Fleishman’s third-order PMT, where the constants \(c_{0_a},\ ...,\ c_{3_a}\) and \(c_{0_b},\ ...,\ c_{3_b}\) are the solutions to the system of equations given in SimMultiCorrData::fleish
and \(c_{4_a} = c_{5_a} = c_{4_b} = c_{5_b} = 0\) (Fialkowski 2018).
The \(r^{th}\) expected value of \(Y\) can be expressed as:
\[\begin{equation} \begin{split} E_{h}[Y^r] &= \int y^r h_{Y}(y) dy = \pi_a \int y^r f_{Y_a}(y) dy + \pi_b \int y^r g_{Y_b}(y) dy \\ &= \pi_a E_{f}[Y_a^r] + \pi_b E_{g}[Y_b^r]. \end{split} \tag{1.7} \end{equation}\]This expression can be used to derive expressions for the mean, variance, skew, skurtosis, and standardized fifth and sixth cumulants of \(Y\) in terms of the \(r^{th}\) expected values of \(Y_a\) and \(Y_b\).
Since \(E_{f}[Z_a'] = E_{g}[Z_b'] = 0\), this becomes \(E_{h}[Y] = \pi_a \mu_a + \pi_b \mu_b\).
Applying the variance relation to \(Z_a'\) and \(Z_b'\) gives:
\[\begin{equation} \begin{split} E_{f}[{Z_a'}^2] &= Var_{f}[Z_a'] + {(E_{f}[Z_a'])}^2 \\ E_{g}[{Z_b'}^2] &= Var_{g}[Z_b'] + {(E_{g}[Z_b'])}^2. \end{split} \tag{1.10} \end{equation}\] Since \(E_{f}[Z_a'] = E_{g}[Z_b'] = 0\) and \(Var_{f}[Z_a'] = Var_{g}[Z_b'] = 1\), \(E_{f}[{Z_a'}^2]\) and \(E_{g}[{Z_b'}^2]\) both equal \(1\).Then \(E_{f}[{Z_a'}^3] = {\mu}_{3_a}'\) and \(E_{g}[{Z_b'}^3] = {\mu}_{3_b}'\) are given by:
\[\begin{equation} \begin{split} E_{f}[{Z_a'}^3] &= {(Var_{f}[Z_a'])}^{3/2} \gamma_{1_a}' = \gamma_{1_a}' \\ E_{g}[{Z_b'}^3] &= {(Var_{g}[Z_b'])}^{3/2} \gamma_{1_b}' = \gamma_{1_b}'. \end{split} \tag{1.14} \end{equation}\] Combining these with \(E_{f}[Z_a'] = E_{g}[Z_b'] = 0\) and \(E_{f}[{Z_a'}^2] = E_{g}[{Z_b'}^2] = 1\), \(E_{h}[Y^3]\) simplifies to:Then \(E_{f}[{Z_{a}'}^4] = {\mu}_{4_{a}}'\) and \(E_{g}[{Z_{b}'}^4] = {\mu}_{4_{b}}'\) are given by:
\[\begin{equation} \begin{split} E_{f}[{Z_{a}'}^4] &= {(Var_{f}[Z_{a}'])}^2 (\gamma_{2_{a}}' + 3) = \gamma_{2_{a}}' + 3 \\ E_{g}[{Z_{b}'}^4] &= {(Var_{g}[Z_{b}'])}^2 (\gamma_{2_{b}}' + 3) = \gamma_{2_{b}}' + 3. \end{split} \tag{1.18} \end{equation}\]Since \(E_{f}[Z_{a}'] = E_{g}[Z_{b}'] = 0\) and \(E_{f}[{Z_{a}'}^2] = E_{g}[{Z_{b}'}^2] = 1\), \(E_{h}[Y^4]\) simplifies to:
\[\begin{equation} \begin{split} E_{h}[Y^4] &= \pi_a (\sigma_{a}^4(\gamma_{2_{a}}' + 3) + 4\sigma_{a}^3\mu_{a}\gamma_{1_{a}}' + 6\sigma_{a}^2\mu_{a}^2 + \mu_{a}^4) \\ &\ \ \ \ + \pi_b (\sigma_{b}^4(\gamma_{2_{b}}' + 3) + 4\sigma_{b}^3\mu_{b}\gamma_{1_{b}}' + 6\sigma_{b}^2\mu_{b}^2 + \mu_{b}^4). \end{split} \tag{1.19} \end{equation}\]Therefore, the skurtosis of \(Y\) is:
\[\begin{equation} \begin{split} \gamma_{2} &= \frac{\pi_a (\sigma_{a}^4(\gamma_{2_{a}}' + 3) + 4\sigma_{a}^3\mu_{a}\gamma_{1_{a}}' + 6\sigma_{a}^2\mu_{a}^2 + \mu_{a}^4)}{{(\pi_a (\sigma_a^2 + \mu_a^2) + \pi_b (\sigma_b^2 + \mu_b^2) - [\pi_a \mu_a + \pi_b \mu_b]^2)}^2} \\ \\ &\ \ \ \ + \frac{\pi_b (\sigma_{b}^4(\gamma_{2_{b}}' + 3) + 4\sigma_{b}^3\mu_{b}\gamma_{1_{b}}' + 6\sigma_{b}^2\mu_{b}^2 + \mu_{b}^4)}{{(\pi_a (\sigma_a^2 + \mu_a^2) + \pi_b (\sigma_b^2 + \mu_b^2) - [\pi_a \mu_a + \pi_b \mu_b]^2)}^2}. \end{split} \tag{1.20} \end{equation}\]Then \(E_{f}[{Z_{a}'}^5] = {\mu}_{5_{a}}'\) and \(E_{g}[{Z_{b}'}^5] = {\mu}_{5_{b}}'\) are given by:
\[\begin{equation} \begin{split} E_{f}[{Z_{a}'}^5] &= {(Var_{f}[Z_{a}'])}^{5/2} (\gamma_{3_{a}}' + 10\gamma_{1_{a}}') = \gamma_{3_{a}}' + 10\gamma_{1_{a}}' \\ E_{g}[{Z_{b}'}^5] &= {(Var_{g}[Z_{b}'])}^{5/2} (\gamma_{3_{b}}' + 10\gamma_{1_{b}}') = \gamma_{3_{b}}' + 10\gamma_{1_{b}}'. \end{split} \tag{1.22} \end{equation}\]Since \(E_{f}[Z_{a}'] = E_{g}[Z_{b}'] = 0\) and \(E_{f}[{Z_{a}'}^2] =\) \(E_{g}[{Z_{b}'}^2] = 1\), \(E_{h}[Y^5]\) simplifies to:
\[\begin{equation} \begin{split} E_{h}[Y^5] &= \pi_a (\sigma_{a}^5(\gamma_{3_{a}}' + 10\gamma_{1_{a}}') + 5\sigma_{a}^4\mu_{a}(\gamma_{2_{a}}' + 3) + 10\sigma_{a}^3\mu_{a}^2\gamma_{1_{a}}' + 10\sigma_{a}^2\mu_{a}^3 + \mu_{a}^5) \\ &\ \ \ \ + \pi_b (\sigma_{b}^5(\gamma_{3_{b}}' + 10\gamma_{1_{b}}') + 5\sigma_{b}^4\mu_{b}(\gamma_{2_{b}}' + 3) + 10\sigma_{b}^3\mu_{b}^2\gamma_{1_{b}}' + 10\sigma_{b}^2\mu_{b}^3 + \mu_{b}^5). \end{split} \tag{1.23} \end{equation}\]Therefore, the standardized fifth cumulant of \(Y\) is:
\[\begin{equation} \begin{split} \gamma_{3} &= \frac{\pi_a (\sigma_{a}^5(\gamma_{3_{a}}' + 10\gamma_{1_{a}}') + 5\sigma_{a}^4\mu_{a}(\gamma_{2_{a}}' + 3) + 10\sigma_{a}^3\mu_{a}^2\gamma_{1_{a}}' + 10\sigma_{a}^2\mu_{a}^3 + \mu_{a}^5)}{{(\pi_a (\sigma_a^2 + \mu_a^2) + \pi_b (\sigma_b^2 + \mu_b^2) - [\pi_a \mu_a + \pi_b \mu_b]^2)}^{5/2}} \\ \\ &\ \ \ \ + \frac{\pi_b (\sigma_{b}^5(\gamma_{3_{b}}' + 10\gamma_{1_{b}}') + 5\sigma_{b}^4\mu_{b}(\gamma_{2_{b}}' + 3) + 10\sigma_{b}^3\mu_{b}^2\gamma_{1_{b}}' + 10\sigma_{b}^2\mu_{b}^3 + \mu_{b}^5)}{{(\pi_a (\sigma_a^2 + \mu_a^2) + \pi_b (\sigma_b^2 + \mu_b^2) - [\pi_a \mu_a + \pi_b \mu_b]^2)}^{5/2}} - 10{\gamma}_{1}. \end{split} \tag{1.24} \end{equation}\]Then \(E_{f}[{Z_{a}'}^6] = {\mu}_{6_{a}}'\) and \(E_{g}[{Z_{b}'}^6] = {\mu}_{6_{b}}'\) are given by:
\[\begin{equation} \begin{split} E_{f}[{Z_{a}'}^6] &= {(Var_{f}[Z_{a}'])}^3 (\gamma_{4_{a}}' + 15\gamma_{2_{a}}' + 10{\gamma_{1_{a}}'}^2 + 15) = \gamma_{4_{a}}' + 15\gamma_{2_{a}}' + 10{\gamma_{1_{a}}'}^2 + 15 \\ E_{g}[{Z_{b}'}^6] &= {(Var_{g}[Z_{b}'])}^3 (\gamma_{4_{b}}' + 15\gamma_{2_{b}}' + 10{\gamma_{1_{b}}'}^2 + 15) = \gamma_{4_{b}}' + 15\gamma_{2_{b}}' + 10{\gamma_{1_{b}}'}^2 + 15. \end{split} \tag{1.26} \end{equation}\]Since \(E_{f}[Z_{a}'] = E_{g}[Z_{b}'] = 0\) and \(E_{f}[{Z_{a}'}^2] = E_{g}[{Z_{b}'}^2] = 1\), \(E_{h}[Y^6]\) simplifies to:
\[\begin{equation} \begin{split} E_{h}[Y^6] &= \pi_a (\sigma_{a}^6(\gamma_{4_{a}}' + 15\gamma_{2_{a}}' + 10{\gamma_{1_{a}}'}^2 + 15) + 6\sigma_{a}^5\mu_{a}(\gamma_{3_{a}}' + 10\gamma_{1_{a}}') + 15\sigma_{a}^4\mu_{a}^2(\gamma_{2_{a}}' + 3) + 20\sigma_{a}^3\mu_{a}^3\gamma_{1_{a}}' \\ &\ \ \ \ + 15\sigma_{a}^2\mu_{a}^4 + \mu_{a}^6) \\ &\ \ \ \ + \pi_b (\sigma_{b}^6(\gamma_{4_{b}}' + 15\gamma_{2_{b}}' + 10{\gamma_{1_{b}}'}^2 + 15) + 6\sigma_{b}^5\mu_{b}(\gamma_{3_{b}}' + 10\gamma_{1_{b}}') + 15\sigma_{b}^4\mu_{b}^2(\gamma_{2_{b}}' + 3) + 20\sigma_{b}^3\mu_{b}^3\gamma_{1_{b}}' \\ &\ \ \ \ + 15\sigma_{b}^2\mu_{b}^4 + \mu_{b}^6). \end{split} \tag{1.27} \end{equation}\]Therefore, the standardized sixth cumulant of \(Y\) is:
\[\begin{equation} \begin{split} \gamma_{4} &= \frac{\pi_a (\sigma_{a}^6(\gamma_{4_{a}}' + 15\gamma_{2_{a}}' + 10{\gamma_{1_{a}}'}^2 + 15) + 6\sigma_{a}^5\mu_{a}(\gamma_{3_{a}}' + 10\gamma_{1_{a}}') + 15\sigma_{a}^4\mu_{a}^2(\gamma_{2_{a}}' + 3) + 20\sigma_{a}^3\mu_{a}^3\gamma_{1_{a}}' + 15\sigma_{a}^2\mu_{a}^4 + \mu_{a}^6)}{{(\pi_a (\sigma_a^2 + \mu_a^2) + \pi_b (\sigma_b^2 + \mu_b^2) - [\pi_a \mu_a + \pi_b \mu_b]^2)}^3} \\ \\ &\ \ \ \ + \frac{\pi_b (\sigma_{b}^6(\gamma_{4_{b}}' + 15\gamma_{2_{b}}' + 10{\gamma_{1_{b}}'}^2 + 15) + 6\sigma_{b}^5\mu_{b}(\gamma_{3_{b}}' + 10\gamma_{1_{b}}') + 15\sigma_{b}^4\mu_{b}^2(\gamma_{2_{b}}' + 3) + 20\sigma_{b}^3\mu_{b}^3\gamma_{1_{b}}' + 15\sigma_{b}^2\mu_{b}^4 + \mu_{b}^6)}{{(\pi_a (\sigma_a^2 + \mu_a^2) + \pi_b (\sigma_b^2 + \mu_b^2) - [\pi_a \mu_a + \pi_b \mu_b]^2)}^3} \\ &\ \ \ \ - 15{\gamma}_{2} - 10{{\gamma}_{1}}^{2} - 15. \end{split} \tag{1.28} \end{equation}\]Therefore, a method similar to that above can be used to derive the system of equations defining the mean, variance, skew, skurtosis, and standardized fifth and sixth cumulants of \(Y\). These equations are used within the function calc_mixmoments
to determine the values for a mixture variable. Some code has been modified from the SimMultiCorrData package (Fialkowski 2018).
Even though the correlations for the continuous mixture variables are set at the component level, we can approximate the resulting correlations for the mixture variables. The example from the Overall Workflow for Generation of Correlated Data vignette is used for demonstration.
Assume \(M1\) and \(M2\) are two continuous mixture variables. Let \(M1\) have \(k_{M1}\) components with mixing probabilities \(\alpha_1, ..., \alpha_{k_{M1}}\). The standard deviations of the components are \(\sigma_{M1_1}, \sigma_{M1_2}, ..., \sigma_{M1_{k_{M1}}}\). Let \(M2\) have \(k_{M2}\) components with mixing probabilities \(\beta_1, ..., \beta_{k_{M2}}\). The standard deviations of the components are \(\sigma_{M2_1}, \sigma_{M2_2}, ..., \sigma_{M2_{k_{M2}}}\).
Equation (2.1) requires the expected value of the product of \(M1\) and \(M2\). Since \(M1\) and \(M2\) may contain any desired number of components and these components may have any continuous distribution, there is no general way to determine this expected value. Therefore, it will be approximated by expressing \(M1\) and \(M2\) as sums of their component variables:
\[\begin{equation} Cor(M1, M2) = \frac{E[(\sum_{i = 1}^{k_{M1}} \alpha_i M1_i) (\sum_{j = 1}^{k_{M2}} \beta_j M2_j)] - E[\sum_{i = 1}^{k_{M1}} \alpha_i M1_i]E[\sum_{j = 1}^{k_{M2}} \beta_j M2_j]}{\sigma_{M1} \sigma_{M2}}, \tag{2.2} \end{equation}\] whereso that we can rewrite \(Cor(M1, M2)\) as:
\[\begin{equation} \begin{split} Cor(M1, M2) &= \frac{\alpha_1 \beta_1 (\sigma_{M1_1} \sigma_{M2_1} Cor(M1_1, M2_1) + E[M1_1] E[M2_1])}{\sigma_{M1} \sigma_{M2}} \\ &\ \ \ \ + ... + \frac{\alpha_{k_{M1}} \beta_{k_{M2}} (\sigma_{M1_{k_{M1}}} \sigma_{M2_{k_{M2}}} Cor(M1_{k_{M1}}, M2_{k_{M2}}) + E[M1_{k_{M1}}] E[M2_{k_{M2}}])}{\sigma_{M1} \sigma_{M2}} \\ &- \frac{\alpha_1 \beta_1 E[M1_1] E[M2_1] + ... + \alpha_{k_{M1}} \beta_{k_{M2}} E[M1_{k_{M1}}] E[M2_{k_{M2}}]}{\sigma_{M1} \sigma_{M2}} \\ &= \frac{\sum_{i = 1}^{k_{M1}} \alpha_i \sigma_{M1_i} \sum_{j = 1}^{k_{M2}} \beta_j \sigma_{M2_j} Cor(M1_i, M2_j)}{\sigma_{M1} \sigma_{M2}}. \end{split} \tag{2.5} \end{equation}\]For this example:
\[\begin{equation} \begin{split} Cor(M1, M2) = &\frac{\alpha_1 \sigma_{M1_1} [\beta_1 \sigma_{M2_1} Cor(M1_1, M2_1) + \beta_2 \sigma_{M2_2} Cor(M1_1, M2_2) + \beta_3 \sigma_{M2_3} Cor(M1_1, M2_3)]}{\sigma_{M1} \sigma_{M2}} \\ &+ \frac{\alpha_2 \sigma_{M1_2} [\beta_1 \sigma_{M2_1} Cor(M1_2, M2_1) + \beta_2 \sigma_{M2_2} Cor(M1_2, M2_2) + \beta_3 \sigma_{M2_3} Cor(M1_2, M2_3)]}{\sigma_{M1} \sigma_{M2}} \\ & \\ = &\frac{0.4 * 1 * [0.3 * \sqrt{\pi^2/3} * 0.35 + 0.2 * \sqrt{8} * 0.35 + 0.5 * \sqrt{6/196.625} * 0.35]}{2.2 * 2.170860} \\ &+ \frac{0.6 * 1 * [0.3 * \sqrt{\pi^2/3} * 0.35 + 0.2 * \sqrt{8} * 0.35 + 0.5 * \sqrt{6/196.625} * 0.35]}{2.2 * 2.170860} \end{split} \tag{2.6} \end{equation}\]library("SimCorrMix")
L <- calc_theory("Logistic", c(0, 1))
C <- calc_theory("Chisq", 4)
B <- calc_theory("Beta", c(4, 1.5))
mix_pis <- list(c(0.4, 0.6), c(0.3, 0.2, 0.5))
mix_mus <- list(c(-2, 2), c(L[1], C[1], B[1]))
mix_sigmas <- list(c(1, 1), c(L[2], C[2], B[2]))
p_M11M21 <- p_M11M22 <- p_M11M23 <- 0.35
p_M12M21 <- p_M12M22 <- p_M12M23 <- 0.35
p_M1M2 <- matrix(c(p_M11M21, p_M11M22, p_M11M23, p_M12M21, p_M12M22, p_M12M23),
2, 3, byrow = TRUE)
rhoM1M2 <- rho_M1M2(mix_pis, mix_mus, mix_sigmas, p_M1M2)
The correlation between \(M1\) and \(M2\) is approximated as 0.0877342.
Equation (2.7) requires the expected value of the product of \(M1\) and \(Y\). Since \(M1\) may contain any desired number of components and these components may have any continuous distribution, there is no general way to determine this expected value. Therefore, it will again be approximated by expressing \(M1\) as a sum of its component variables:
\[\begin{equation} Cor(M1, Y) = \frac{E[(\sum_{i = 1}^{k_{M1}} \alpha_i M1_i) Y] - E[\sum_{i = 1}^{k_{M1}} \alpha_i M1_i]E[Y]}{\sigma_{M1} \sigma_Y}, \tag{2.8} \end{equation}\] whereso that we can rewrite \(Cor(M1, Y)\) as:
\[\begin{equation} \begin{split} Cor(M1, Y) &= \frac{\alpha_1 (\sigma_{M1_1} \sigma_Y Cor(M1_1, Y) + E[M1_1] E[Y])}{\sigma_{M1} \sigma_Y} \\ &\ \ \ \ \ + ... + \frac{\alpha_{k_{M1}} (\sigma_{M1_{k_{M1}}} \sigma_Y Cor(M1_{k_{M1}}, Y) + E[M1_{k_{M1}}] E[Y])}{\sigma_{M1} Y} \\ &\ \ \ \ \ - \frac{\alpha_1 E[M1_1] E[Y] + ... + \alpha_{k_{M1}} E[M1_{k_{M1}}] E[Y]}{\sigma_{M1} \sigma_Y} \\ &= \frac{\sum_{i = 1}^{k_{M1}} \alpha_i \sigma_{M1_i} Cor(M1_i, Y)}{\sigma_{M1}}. \end{split} \tag{2.11} \end{equation}\] Similarly,For this example, \(Y\) can be \(O1\), \(C1\), \(C2\), \(P1\), or \(NB1\). Let \(Y = C1\). Then we have:
\[\begin{equation} \begin{split} Cor(M1, C1) = &\frac{\alpha_1 \sigma_{M1_1} Cor(M1_1, C1) + \alpha_2 \sigma_{M1_2} Cor(M1_2, C1)}{\sigma_{M1}} \\ = &\frac{0.4 * 1 * 0.35 + 0.6 * 1 * 0.35}{2.2} \end{split} \tag{2.13} \end{equation}\]p_M11C1 <- p_M12C1 <- 0.35
p_M1C1 <- c(p_M11C1, p_M12C1)
rho_M1C1 <- rho_M1Y(mix_pis[[1]], mix_mus[[1]], mix_sigmas[[1]], p_M1C1)
The correlation between \(M1\) and \(C1\) is approximated as 0.1590909. Since \(O1\), \(C2\), \(P1\), and \(NB1\) have the same target pairwise correlations with \(M1_1\) and \(M1_2\) as \(C1\), their correlations with \(M1\) are also approximated as 0.1590909.
Similarly,p_M21C1 <- p_M22C1 <- p_M23C1 <- 0.35
p_M2C1 <- c(p_M21C1, p_M22C1, p_M23C1)
rho_M2C1 <- rho_M1Y(mix_pis[[2]], mix_mus[[2]], mix_sigmas[[2]], p_M2C1)
The correlation between \(M2\) and \(C1\) is approximated as 0.1930151. Since \(O1\), \(C2\), \(P1\), and \(NB1\) have the same target pairwise correlations with \(M2_1\), \(M2_2\), and \(M2_3\) as \(C1\), their correlations with \(M2\) are also approximated as 0.1930151.
Bates, Douglas, and Martin Maechler. 2018. Matrix: Sparse and Dense Matrix Classes and Methods. https://CRAN.R-project.org/package=Matrix.
Davenport, JW, JC Bezder, and RJ Hathaway. 1988. “Parameter Estimation for Finite Mixture Distributions.” Computers & Mathematics with Applications 15 (10): 819–28.
Everitt, B S. 1996. “An Introduction to Finite Mixture Distributions.” Statistical Methods in Medical Research 5 (2): 107–27. https://doi.org/10.1177/096228029600500202.
Fialkowski, A C. 2018. SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. https://CRAN.R-project.org/package=SimMultiCorrData.
Fleishman, A I. 1978. “A Method for Simulating Non-Normal Distributions.” Psychometrika 43: 521–32. https://doi.org/10.1007/BF02293811.
Headrick, T C. 2002. “Fast Fifth-Order Polynomial Transforms for Generating Univariate and Multivariate Non-Normal Distributions.” Computational Statistics and Data Analysis 40 (4): 685–711. https://doi.org/10.1016/S0167-9473(02)00072-5.
Higham, N. 2002. “Computing the Nearest Correlation Matrix - a Problem from Finance.” IMA Journal of Numerical Analysis 22 (3): 329–43. https://doi.org/10.1093/imanum/22.3.329.
Kendall, M, and A Stuart. 1977. The Advanced Theory of Statistics. 4th ed. New York: Macmillan.
Pearson, RK. 2011. “Exploring Data in Engineering, the Sciences, and Medicine.” In. New York: Oxford University Press.
Schork, N J, D B Allison, and B Thiel. 1996. “Mixture Distributions in Human Genetics Research.” Statistical Methods in Medical Research 5: 155–78. https://doi.org/10.1177/096228029600500204.