Recall that a random variable \(X\) has (univariate) negative binomial law with parameters \(\kappa>0, 0<p<1\), i.e., \(X\sim \text{NB}(\kappa, p)\) if its probability mass function is given by \[ P(X=x) = {\kappa+x-1 \choose x} p^x(1-p)^{\kappa}, \quad x \in \{0,1,\ldots\}. \]
A random vector \({\bf X}=(X_1,\dots,X_n)'\) is said to follow the multivariate negative binomial distribution with parameters \(\kappa, p_1, \dots, p_n\) if its probability mass function is given by \[ P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n+\kappa)}{x_1! \cdots x_n! \Gamma(\kappa)}p_1^{x_1}\cdots p_n^{x_n}(1-p_1-\cdots-p_n)^{\kappa}, \] where, for \(i=1,\dots,n\), \(x_i\in\{0,1,\dots\}\), \(0<p_i<1\) such that \(\sum_{i=1}^np_i<1\) and \(\kappa>0\).
We note that the function stats::rnbinom
can be used to simulate from the univariate negative binomial distribution. The trawl
package introduces the function Bivariate_NBsim
which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps: First, we simulate \(X_1\) from the univariate negative binomial distribution NB(\(\kappa\),\(p_1/(1-p_2)\)). Second, we simulate \(X_2|X_1=x_1\) from the univariate negative binomial distribution NB(\(\kappa+x_1,p_2\)), see for instance (Dunn 1967).
###Example
set.seed(1)
<- 3
kappa<- 0.1
p1 <- 0.85
p2 <- p1+p2
p <-1-p1-p2
p0
<- 10000
N
#Simulate from the bivariate negative binomial distribution
<- trawl::Bivariate_NBsim(N,kappa,p1,p2)
y
#Compare the empirical and theoretical mean of the first component
::mean(y[,1])
base#> [1] 5.9653
*p1/(1-p)
kappa#> [1] 6
#Compare the empirical and theoretical variance of the first component
::var(y[,1])
stats#> [1] 18.0047
*p1*(1-p2)/(1-p)^2
kappa#> [1] 18
#Compare the empirical and theoretical mean of the second component
::mean(y[,2])
base#> [1] 50.9742
*p2/(1-p)
kappa#> [1] 51
#Compare the empirical and theoretical variance of the second component
::var(y[,2])
stats#> [1] 926.3358
*p2*(1-p1)/(1-p)^2
kappa#> [1] 918
#Compare the empirical and theoretical correlation between the two components
::cor(y[,1],y[,2])
stats#> [1] 0.7891007
*p2/(p0+p1)*(p0+p2))^(1/2)
(p1#> [1] 0.7141428
We say that a vector \({\bf X}=(X_1,\dots,X_n)'\) follows the multivariate logarithmic series distribution (LSD), see, e.g., (Patil and Bildikar 1967). \({\bf X} \sim \text{LSD}(p_1,\ldots,p_n)\), where \(0<p_i<1, p:=\sum_{i=1}^np_i <1\) if for \({\bf x}\in \mathbb{N}_0^n \setminus \{{\bf 0} \}\), if its probability mass function is given by \[
P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n)}{x_1! \cdots x_n!}\frac{p_1^{x_1}\cdots p_n^{x_n}}{\{-\ln(1-p)\}}.
\] Note that each component \(X_i\) follows the modified univariate logarithmic distribution with parameters \(\tilde p_i = p_i/(1-p+p_i)\) and \(\delta_i= \ln(1-p+p_i)/\ln(1-p)\), i.e., \(X_i\sim\text{ModLSD}(\tilde p_i, \delta_i)\) with \[
P(X_i =x_i) = \left\{ \begin{array}{ll} \delta_i, & \text{for } x_i=0\\
(1-\delta_i) \frac{1}{x_i}\frac{\tilde p_i^{x_i}}{\{-\ln(1-\tilde p_i)\}}, & \text{for } x_i \in \mathbb{N}. \end{array} \right.
\] Simulations from the univariate LSD can be carried out using the function Runuran::urlogarithmic
. The trawl
package implements the functions Bivariate_LSDsim
and Trivariate_LSDsim
to simulate from both the bivariate and the trivariate logarithmic series distribution.
The function Bivariate_NBsim
can be used to simulate from the bivariate logarithmic series distribution. To this end, note that the probability mass function of a random vector \({\bf X}=(X_1,X_2)'\) following the bivariate logarithmic series distribution with parameters \(0<p_1, p_2<1\) with \(p:=p_1+p_2<1\) is given by \[P(X_1=x_1,X_2=x_2)=\frac{\Gamma(x_1+x_2)}{x_1!x_2!}
\frac{p_1^{x_1}p_2^{x_2}}{\{-\ln(1-p)\}},\] for \(x_1,x_2=0,1,2,\dots\) such that \(x_1+x_2>0\).
The simulation proceeds in two steps: First, \(X_1\) is simulated from the modified logarithmic distribution with parameters \(\tilde p_1=p_1/(1-p_2)\) and \(\delta_1=\ln(1-p_2)/\ln(1-p)\). Then we simulate \(X_2\) conditional on \(X_1\). We note that \(X_2|X_1=x_1\) follows the logarithmic series distribution with parameter \(p_2\) when \(x_1=0\), and the negative binomial distribution with parameters \((x_1,p_2)\) when \(x_1>0\).
Next we provide an example of a simulation from the bivariate LSD and we showcase the functions ModLSD_Mean
, ModLSD_Var
, BivLSD_Cor
and BivLSD_Cov
which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively.
set.seed(1)
<-0.15
p1<-0.3
p2
<-10000
N
#Simulate N realisations from the bivariate LSD
<-trawl::Bivariate_LSDsim(N, p1, p2)
y
#Compute the empirical and theoretical mean of the first component
::mean(y[,1])
base#> [1] 0.4575
::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
trawl#> [1] 0.45619
#Compute the empirical and theoretical mean of the second component
::mean(y[,2])
base#> [1] 0.8995
::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
trawl#> [1] 0.91238
#Compute the empirical and theoretical variance of the first component
::var(y[,1])
stats#> [1] 0.3686306
::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
trawl#> [1] 0.3724961
#Compute the empirical and theoretical variance of the second component
::var(y[,2])
stats#> [1] 0.5690567
::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
trawl#> [1] 0.5776045
##Compute the empirical and theoretical correlation between the two components
::cor(y[,1],y[,2])
stats#> [1] -0.3961486
::BivLSD_Cor(p1,p2)
trawl#> [1] -0.3608673
##Compute the empirical and theoretical covariance between the two components
::cov(y[,1],y[,2])
stats#> [1] -0.1814394
::BivLSD_Cov(p1,p2)
trawl#> [1] -0.1673877
The function Trivariate_NBsim
can be used to simulate from the trivariate logarithmic series distribution. The simulation proceeds in two steps: First, \(X_1\) is simulated from the modified logarithmic distribution with parameters \(\tilde p_1=p_1/(1-p_2-p_3)\) and \(\delta_1=\ln(1-p_2-p_3)/\ln(1-p)\). Then we simulate \((X_2,X_3)'\) conditional on \(X_1\). We note that \((X_2,X_3)'|X_1=x_1\) follows the bivariate logarithmic series distribution with paramaters \((p_2,p_3)\) when \(x_1=0\), and the bivariate negative binomial distribution with parameters \((x_1,p_2,p_3)\) when \(x_1>0\).
set.seed(1)
<-0.15
p1<-0.25
p2<-0.55
p3
<- 10000
N
#Simulate N realisations from the bivariate LSD
<-trawl::Trivariate_LSDsim(N, p1, p2, p3)
y
#Compute the empirical and theoretical mean of the first component
::mean(y[,1])
base#> [1] 1.0152
::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
trawl#> [1] 1.001425
#Compute the empirical and theoretical mean of the second component
::mean(y[,2])
base#> [1] 1.6857
::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
trawl#> [1] 1.669041
#Compute the empirical and theoretical mean of the third component
::mean(y[,3])
base#> [1] 3.7275
::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
trawl#> [1] 3.67189
#Compute the empirical and theoretical variance of the first component
::var(y[,1])
stats#> [1] 2.999069
::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
trawl#> [1] 3.002847
#Compute the empirical and theoretical variance of the second component
::var(y[,2])
stats#> [1] 7.255041
::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
trawl#> [1] 7.228548
#Compute the empirical and theoretical variance of the third component
::var(y[,3])
stats#> [1] 30.87613
::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
trawl#> [1] 30.5799
#Computing the bivariate covariances and correlations
#Cor(X1,X2):
<- base::log(1-p3)/base::log(1-p1-p2-p3)
delta <-p1/(1-p3)
hatp1 <-p2/(1-p3)
hatp2
::cov(y[,1],y[,2])
stats#> [1] 3.316309
::BivModLSD_Cov(delta,hatp1,hatp2)
trawl#> [1] 3.335704
::cor(y[,1],y[,2])
stats#> [1] 0.7109545
::BivModLSD_Cor(delta,hatp1,hatp2)
trawl#> [1] 0.6041258
#Cor(X1,X3):
<- log(1-p2)/log(1-p1-p2-p3)
delta <-p1/(1-p2)
hatp1 <-p3/(1-p2)
hatp2
::cov(y[,1],y[,3])
stats#> [1] 7.287671
::BivModLSD_Cov(delta,hatp1,hatp2)
trawl#> [1] 7.338549
::cor(y[,1],y[,3])
stats#> [1] 0.7573281
::BivModLSD_Cor(delta,hatp1,hatp2)
trawl#> [1] 0.7220044
#Cor(X2,X3):
<- log(1-p1)/log(1-p1-p2-p3)
delta <-p2/(1-p1)
hatp1 <-p3/(1-p1)
hatp2
::cov(y[,2],y[,3])
stats#> [1] 12.31899
::BivModLSD_Cov(delta,hatp1,hatp2)
trawl#> [1] 12.23092
::cor(y[,2],y[,3])
stats#> [1] 0.8230829
::BivModLSD_Cor(delta,hatp1,hatp2)
trawl#> [1] 0.7969081
##References