Note that Bsim is very small, so that this vignette does not take too long to run. Therefore none of the p values or powers below are accurate.
The package Rgof brings together a number of routines for the goodness-of-fit problem for univariate data. We have a data set \(\pmb{x}\), and we want to test whether it was generated by the probability distribution F.
The highlights of this package are:
Note that all runs of the test routines are done with B=1000, and all runs of the power routines with B=500, in order to pass devtools::check().
For all of these tests the distribution of the test statistic under the null hypothesis is found via simulation.
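To make the simulation idea concrete, here is a minimal base-R sketch for a simple null hypothesis, using the Kolmogorov-Smirnov statistic. It is an illustration of the general technique, not Rgof's implementation, and the helper name sim_pvalue is hypothetical: draw B data sets from the null model, compute the statistic for each, and report the fraction of simulated statistics at least as large as the observed one.

```r
# Hypothetical helper: simulation p value for the Kolmogorov-Smirnov statistic.
# pnull: null cdf; rnull: function without arguments that draws a sample under H0.
sim_pvalue <- function(x, pnull, rnull, B = 1000) {
  ks_stat <- function(y) {
    n <- length(y)
    Fy <- pnull(sort(y))
    i <- seq_len(n)
    max(i / n - Fy, Fy - (i - 1) / n)       # sup |Fhat - F|
  }
  obs <- ks_stat(x)
  sims <- replicate(B, ks_stat(rnull()))    # simulated null distribution
  mean(sims >= obs)                         # large values speak against H0
}

set.seed(1)
sim_pvalue(runif(200), punif, function() runif(200), B = 500)    # data from H0
sim_pvalue(runif(200)^2, punif, function() runif(200), B = 500)  # data not uniform
```

The same recipe works for any statistic; only ks_stat would change.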
There is a very large literature on chi square tests, the oldest of the goodness of fit tests. For a survey see (Rolke and Gutierrez-Gongora 2020).
All the methods above are also implemented for discrete data, except for Zhang’s tests, which have no discrete analog.
It is worth noting that these discrete versions are based on the theoretical ideas behind the tests and not on the computational formulas used in the continuous case. The test statistics can therefore differ even when applied to the same data. For example, the Anderson-Darling test is based on the distance measure
\[A^2=n\int_{-\infty}^{\infty} \frac{(\hat{F}(x)-F(x))^2}{F(x)(1-F(x))}dF(x) \] where \(F\) is the theoretical distribution function under the null hypothesis and \(\hat{F}\) is the empirical distribution function. In the case of continuous data it can be shown that
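\[A^2=-n-\frac1n\sum_{i=1}^n (2i-1)\left(\log F(x_i) +\log[1-F(x_{n+1-i})]\right)\] where \(x_1\le x_2\le\dots\le x_n\) are the ordered observations. However, for discrete data we have
\[A^2=n\sum_{i=1}^k \frac{(\hat{F}(x_i)-F(x_i))^2}{F(x_i)(1-F(x_i))}\left(F(x_i)-F(x_{i-1})\right)\]
where \(x_1<x_2<\dots<x_k\) are the possible values and \(F(x_0)=0\).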
In the continuous case \(\hat{F}\) is a step function but \(F\) is continuous, so the two cannot agree everywhere and therefore \(A^2>0\). In the discrete case, however, \(A^2=0\) is possible. This shows that the two cases are fundamentally different and therefore require different formulas for the test statistic.
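The contrast can be checked numerically. The following base-R sketch (the function names are hypothetical, not part of Rgof) evaluates the continuous computing formula on a uniform sample and the discrete formula on counts that match the null probabilities exactly. In the discrete sum the last term has \(F=\hat F=1\), that is 0/0; the sketch simply drops it, which is an assumption about how that term is handled.

```r
# Continuous AD computing formula; x is a sample, pnull the null cdf
ad_continuous <- function(x, pnull) {
  n <- length(x)
  Fx <- pnull(sort(x))                     # F(x_(1)) <= ... <= F(x_(n))
  i <- seq_len(n)
  -n - sum((2 * i - 1) * (log(Fx) + log(1 - rev(Fx)))) / n  # rev(Fx)[i] = F(x_(n+1-i))
}

# Discrete AD statistic; counts[i] = number of observations equal to the i-th value,
# Fx[i] = null cdf at the i-th value
ad_discrete <- function(counts, Fx) {
  n <- sum(counts)
  Fhat <- cumsum(counts) / n
  dF <- diff(c(0, Fx))                     # F(x_i) - F(x_{i-1}), with F(x_0) = 0
  terms <- (Fhat - Fx)^2 / (Fx * (1 - Fx)) * dF
  n * sum(terms[Fx > 0 & Fx < 1])          # drop the 0/0 term where F = 1
}

set.seed(1)
ad_continuous(runif(100), punif)           # always strictly positive
ad_discrete(c(25, 50, 25), c(0.25, 0.75, 1))  # Bin(2, 0.5) cdf, counts match exactly: 0
```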
As for continuous data, the null distributions are found by simulation. In fact, for discrete data none of the tests has a known null distribution of the test statistic.
These methods can be used for both discrete and histogram data. The main difference between the two is that discrete data has a countable number of possible values, whereas histogram data has ranges of values (the bins). The only method directly affected by this difference is Wassp1, which requires actual values; all other methods ignore the vals argument.
We generate a data set of size 1000 from a Binomial distribution with n=20 and success probability 0.5, and then test \(H_0:F=Bin(20, 0.5)\).
vals=0:20 #possible values of random variable
pnull=function()  pbinom(0:20, 20, 0.5)  # cumulative distribution function (cdf)
rnull = function() table(c(0:20, rbinom(1000, 20, 0.5)))-1 
# generate data under the null hypothesis, make sure that vector of counts has 
# same length as vals, possibly 0.
x = rnull()
# Basic Test
gof_test(x, vals, pnull, rnull, B=1000)
#> maxProcessor set to 1 for faster computation
#> $statistics
#>     KS      K     AD    CvM      W    l-P    s-P    l-L    s-L 
#> 0.1700 0.1730 0.1320 0.0249 2.0450 2.6010 1.9530 2.6750 2.0380 
#> 
#> $p.values
#>     KS      K     AD    CvM      W    l-P    s-P    l-L    s-L 
#> 0.6640 0.7570 0.9960 0.9680 0.6670 0.9978 0.9967 0.9974 0.9960
#Test with adjusted overall p value
gof_test_adjusted_pvalue(x, vals, pnull, rnull, 
                         B=c(1000, 500), maxProcessor = 1)
#> p values of individual tests:
#> W :  0.407
#> AD :  0.416
#> s-P :  0.9971
#> adjusted p value of combined tests: 0.7674
x = table(c(0:20, rbinom(1000, 20, 0.55)))-1
#true p is 0.55, not 0.5
# Basic Test
gof_test(x, vals, pnull, rnull, B=1000, doMethod = "all")$p.value
#> maxProcessor set to 1 for faster computation
#>  KS   K  AD CvM   W l-P s-P l-L s-L 
#>   0   0   0   0   0   0   0   0   0
#Test with adjusted overall p value
gof_test_adjusted_pvalue(x, vals, pnull, rnull, 
                    B=c(1000, 500), maxProcessor = 1)
#> p values of individual tests:
#> W :  0.32
#> AD :  0.973
#> s-P :  0
#> adjusted p value of combined tests: 0.002
Arguments of gof_test for discrete data/model:
x: vector with counts (histogram heights). Should have a number for each value of vals, possibly 0.
vals: all possible values of discrete random variable, that is all x with \(P(X=x)>0\)
pnull: function to find values of cumulative distribution function for each value of vals. Function has no arguments.
rnull: function to generate data under the null hypothesis. Function has no arguments. Function needs to ensure that the output is a vector with the same length as vals.
B=5000: number of simulation runs
w: function to find importance sampling weights, if needed
phat: function to estimate parameters
TS: function to find values of user-supplied test statistics
TSextra: a list that is passed to TS if any additional info is required.
nbins=c(50, 10): number of bins for chi square tests. In the discrete case the first is already given by the data; for the second, bins are joined.
rate=0: if not 0, the sample size is assumed to come from a Poisson random variable with rate “rate”.
minexpcount=5: minimal expected counts for chi square tests.
ChiUsePhat=TRUE: if TRUE, uses the user-supplied function phat for parameter estimation; if FALSE, uses the method of minimum chi square.
maxProcessor=1: if greater than 1, the number of cores for parallel processing. Parallel processing is usually not useful for discrete data, as the single-thread version generally runs faster.
doMethods=“all”: names of the methods to include.
The arguments of gof_test_adjusted_pvalue for discrete data/model are the same, except that the number of simulation runs B is two numbers. The first is used for estimating the individual p values, the second for the adjustment.
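The rnull functions in this section return table(c(vals, y)) - 1, so that every value in vals appears in the count vector, including values that were never drawn. A short base-R illustration; tabulate() is shown only as an equivalent alternative for the integers 0:20 and is our suggestion, not something the package requires:

```r
vals <- 0:20
set.seed(123)
y <- rbinom(1000, 20, 0.5)

naive  <- table(y)                               # only values that were actually drawn
padded <- table(c(vals, y)) - 1                  # add one of each value, then remove it again
alt    <- tabulate(y + 1, nbins = length(vals))  # counts for 0..20, shifted to bins 1..21

length(naive)    # usually fewer than 21 entries, since extreme values are rarely drawn
length(padded)   # one count per element of vals
```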
In some fields like high energy physics it is common that the sample size is not fixed but a random variable drawn from a Poisson distribution with a known rate. Our package runs this as follows:
We generate a data set of size 1000 from a binomial distribution with n=20 and success probability p, and then test \(H_0: F=Bin(20, p)\), where p is estimated from the data.
vals=0:20
pnull=function(p=0.5)  pbinom(0:20, 20, ifelse(p>0&&p<1, p, 0.5))  
rnull = function(p=0.5) table(c(0:20, rbinom(1000, 20, p)))-1
phat = function(x) sum(0:20*x)/sum(x)/20
x = table(c(0:20, rbinom(1000, 20, 0.5)))-1  
gof_test(x, vals, pnull, rnull, phat=phat, B=1000)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W    l-P    s-P    l-L    s-L 
#> 0.1700 0.2680 0.4900 0.6400 0.2470 0.3843 0.8239 0.3190 0.8074
x = table(c(0:20, rbinom(1000, 20, 0.55)))-1 
# p is not 0.5, but data is still from a binomial distribution with n=20
gof_test(x, vals, pnull, rnull, phat=phat, B=1000)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W    l-P    s-P    l-L    s-L 
#> 0.0770 0.1280 0.2960 0.4820 0.0810 0.4317 0.3359 0.4815 0.3812
x = table(c(rep(0:20, 5), rbinom(1000-21*5, 20, 0.53))) 
# data has too many small and large values to be from a binomial
gof_test(x, vals, pnull, rnull, phat=phat, B=1000)$p.value
#> maxProcessor set to 1 for faster computation
#>    KS     K    AD   CvM     W   l-P   s-P   l-L   s-L 
#> 0.214 0.004 0.000 0.000 0.011 0.000 0.000 0.000 0.000
The arguments are the same as for the simple hypothesis case, except that the user has to supply a routine phat to estimate the parameters, and the functions pnull and rnull now have one argument, namely the vector with parameter estimates.
The estimation of the parameter(s) in the case of the chi square tests is done either by using the function phat or via the minimum chi square method. The latter uses a general function minimizer, which may try parameter values that are not possible; this can lead to warnings. It is best to put a check into the pnull function to avoid this issue. As an example, the function pnull above checks that the success probability p is in the interval \((0,1)\).
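For readers unfamiliar with the minimum chi square method, here is a minimal sketch for the binomial example: choose p to minimize Pearson's statistic. It uses stats::optimize over a restricted interval, one simple way to keep the search away from impossible parameter values; the helper name is hypothetical, and this is not the minimizer or bin pooling used inside Rgof. In particular it does not join bins with small expected counts, so a rare value with a tiny expected count can pull the estimate noticeably, which is one reason for a minimum expected count such as minexpcount.

```r
# Hypothetical helper: minimum chi square estimate of p for counts on 0:size
min_chisq_p <- function(counts, size = 20) {
  n <- sum(counts)
  pearson <- function(p) {
    E <- n * dbinom(0:size, size, p)   # expected counts under Bin(size, p)
    sum((counts - E)^2 / E)            # Pearson's chi square statistic
  }
  # restrict the search to (0.01, 0.99) so that all expected counts stay positive
  optimize(pearson, interval = c(0.01, 0.99))$minimum
}

set.seed(1)
counts <- as.vector(table(c(0:20, rbinom(1000, 20, 0.5))) - 1)
min_chisq_p(counts)                        # close to 0.5
sum(0:20 * counts) / sum(counts) / 20      # the phat estimate, for comparison
```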
A variant of discrete data sometimes encountered is data given in the form of a histogram, that is, as a set of bins and their counts. The main distinction is that discrete data has specific values, for example the non-negative integers for a Poisson distribution, whereas histogram data has ranges of numbers, the bins. It turns out, though, that the only method that requires actual values is Wassp1, and for that method one can use the midpoints of the intervals.
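As a small base-R sketch of what this means for binned data: the midpoints serve as the values, while the null model only enters through its cdf at the right bin edges, whose differences are the bin probabilities. The bins and the truncated Exp(1) model here mirror the example that follows; the variable names are ours.

```r
bins  <- 0:40 / 20                                  # 41 break points, 40 bins on [0, 2]
mids  <- (head(bins, -1) + tail(bins, -1)) / 2      # bin midpoints, used as vals
Fx    <- pexp(tail(bins, -1), 1) / pexp(2, 1)       # truncated Exp(1) cdf at right bin edges
probs <- diff(c(0, Fx))                             # null probability of each bin

round(1500 * head(probs), 1)                        # expected counts in the first bins for 1500 events
```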
As an example consider the following case: we have histogram data and we want to test whether it comes from an exponential distribution with rate 1, truncated to the interval [0, 2]:
rnull = function() {
  y = rexp(2500, 1) # Exp(1) data
  y = y[y<2][1:1500] # 1500 events on 0-2
  bins = 0:40/20 # binning
  hist(y, bins, plot=FALSE)$counts # find bin counts
}
x = rnull()
bins = 0:40/20
vals = (bins[-1]+bins[-41])/2 #use bin midpoints as values (bins has 41 break points)
pnull = function() {
   bins = 1:40/20
   pexp(bins, 1)/pexp(2, 1)
}
  
gof_test(x, vals, pnull, rnull)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W    l-P    s-P    l-L    s-L 
#> 0.1250 0.1644 0.5416 0.4640 0.6668 0.7620 0.7702 0.7351 0.7789
pnull = function(x) pnorm(x)
rnull = function()  rnorm(1000)
TSextra = list(qnull=function(x) qnorm(x)) #optional quantile function used by chi square tests and Wassp1 test.
x = rnorm(1000)
#Basic Tests
gof_test(x, NA, pnull, rnull, B=1000, TSextra=TSextra)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.2900 0.2900 0.2970 0.2700 0.0440 0.8550 0.4600 0.8970 0.3330 0.2113 0.1315 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.1050 0.2925 0.1702 0.1236 0.1327 0.2954
#Adjusted p value
gof_test_adjusted_pvalue(x, NA, pnull, rnull, B=c(1000,500), TSextra=TSextra, maxProcessor = 1)
#> p values of individual tests:
#> W :  0.084
#> ZC :  0.23
#> AD :  0.053
#> ES-s-P :  0.1315
#> adjusted p value of combined tests: 0.1428
pnull = function(x, p=0) pnorm(x, p)
TSextra = list(qnull = function(x, p=0) qnorm(x, p))
rnull = function(p)  rnorm(1000, p)
phat = function(x) mean(x)
x = rnorm(1000) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.5790 0.5790 0.8290 0.6750 0.6590 0.8400 0.8920 0.8780 0.8800 0.1819 0.5819 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.1445 0.8424 0.0904 0.5836 0.1194 0.8291
x = rnorm(1000, 0.5) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.9752 0.9752 0.8086 0.9284 0.9148 0.2738 0.2352 0.2422 0.7756 0.1701 0.1045 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.1571 0.8034 0.1721 0.0618 0.1669 0.7995
x = rnorm(1000, 0.5, 2) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#>      0      0      0      0      0      0      0      0      0      0      0 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#>      0      0      0      0      0      0
The arguments of gof_test are the same as in the discrete case, except that vals=NA. For a simple hypothesis pnull is a function of x alone and rnull has no arguments; with parameter estimation pnull has a second argument and rnull one argument, in each case the vector of parameter estimates.
pnull = function(x, p=c(0, 1)) pnorm(x, p[1], ifelse(p[2]>0, p[2], 0.001))
TSextra = list(qnull = function(x, p=c(0, 1)) qnorm(x, p[1], ifelse(p[2]>0, p[2], 0.001)))
rnull = function(p=c(0, 1))  rnorm(1000, p[1], ifelse(p[2]>0, p[2], 0.001))
phat = function(x) c(mean(x), sd(x))
x = rnorm(1000) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.8030 0.8030 0.7320 0.7580 0.7350 0.8690 0.7090 0.8540 0.7170 0.3538 0.0778 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.7731 0.4104 0.3177 0.0808 0.7928 0.4082
x = rnorm(1000, 0.5) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.7780 0.7780 0.6400 0.5390 0.4930 0.6740 0.8650 0.6300 0.6200 0.6458 0.6988 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.2448 0.7135 0.6520 0.7076 0.2315 0.7126
x = rnorm(1000, 0.5, 2) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#> maxProcessor set to 1 for faster computation
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.9770 0.9770 0.9600 0.9720 0.9670 0.7800 0.9110 0.6460 0.9630 0.9952 0.8894 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.9482 0.8352 0.9930 0.8893 0.9284 0.8368
For estimating the power of the various tests one also has to provide the routine ralt, which generates data under the alternative hypothesis:
vals = 0:10
pnull = function() pbinom(0:10, 10, 0.5)
rnull =function () table(c(0:10, rbinom(100, 10, 0.5)))-1
ralt =function (p=0.5) table(c(0:10, rbinom(100, 10, p)))-1
P=gof_power(pnull, vals, rnull, ralt, 
  param_alt=seq(0.5, 0.6, 0.02), B=Bsim, 
  nbins=c(11, 5), maxProcessor = 1)
plot_power(P, "p", Smooth=FALSE)
In all cases the arguments are the same as for gof_test. In addition we now have
ralt: a routine with one parameter that generates data under some alternative hypothesis.
param_alt: values to be passed to ralt. This allows the calculation of the power for many different values.
alpha=0.05 type I error probability for tests.
B=1000 the number of simulation runs.
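The power estimate itself is simply the share of data sets generated by ralt that a test rejects at level alpha. A minimal base-R sketch with a hypothetical helper; for brevity it uses the p value of stats::ks.test rather than a simulated null distribution as Rgof does:

```r
# Hypothetical helper: estimated power = share of alternative data sets rejected at level alpha
sim_power <- function(ralt, pnull, B = 200, alpha = 0.05) {
  rejected <- replicate(B, ks.test(ralt(), pnull)$p.value < alpha)
  mean(rejected)
}

set.seed(1)
sim_power(function() rnorm(100), pnorm)      # H0 true: should be near alpha
sim_power(function() rnorm(100, 1), pnorm)   # mean shifted by 1: power near 1
```

Evaluating such a function over a grid of parameter values is what param_alt does.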
vals = 0:10
pnull = function(p=0.5) pbinom(0:10, 10, ifelse(0<p&p<1,p,0.001))
rnull = function (p=0.5) table(c(0:10, rbinom(100, 10, ifelse(0<p&p<1,p,0.001))))-1
phat = function(x) sum(0:10*x)/1000
ralt =function (p=0.5) table(c(0:10, rbinom(100, 10, p)))-1
gof_power(pnull, vals, rnull, ralt, c(0.5, 0.6), phat=phat,
        B=Bsim, nbins=c(11, 5), maxProcessor = 1)
#>        KS     K    AD   CvM     W Wassp1   l-P   s-P
#> 0.5 0.036 0.034 0.054 0.056 0.042  0.048 0.038 0.038
#> 0.6 0.070 0.050 0.068 0.060 0.060  0.046 0.032 0.032
Note that power estimation in the case of a composite hypothesis (that is, with parameters estimated) is much slower than in the simple hypothesis case.
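The extra cost comes from re-estimating the parameters on every simulated data set: with estimated parameters the null distribution of the statistic changes, so each simulated sample has to be treated exactly like the observed one. A minimal base-R sketch of such a parametric bootstrap for a normal null with estimated mean and standard deviation (a Lilliefors-type KS test; the helper name is hypothetical and this is not Rgof's code):

```r
# Hypothetical helper: parametric bootstrap p value, normal null with estimated mean and sd
boot_pvalue_normal <- function(x, B = 500) {
  ks_stat <- function(y) {
    n <- length(y)
    Fy <- pnorm(sort(y), mean(y), sd(y))   # parameters re-estimated from y itself
    i <- seq_len(n)
    max(i / n - Fy, Fy - (i - 1) / n)
  }
  obs <- ks_stat(x)
  m <- mean(x); s <- sd(x)                 # fitted null model
  sims <- replicate(B, ks_stat(rnorm(length(x), m, s)))
  mean(sims >= obs)
}

set.seed(1)
boot_pvalue_normal(rnorm(200, 5, 2), B = 200)   # normal data
boot_pvalue_normal(rexp(200), B = 200)          # skewed data: small p value
```

Each bootstrap replicate repeats the estimation step; inside a power study this happens for every simulated alternative data set as well.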
pnull = function(x) pnorm(x)
TSextra = list(qnull = function(x) qnorm(x))
rnull = function() rnorm(100)
ralt = function(mu=0) rnorm(100, mu)
gof_power(pnull, NA, rnull, ralt, c(0, 1), 
          TSextra=TSextra, B=Bsim, maxProcessor = 1)
#>      KS     K    AD   CvM     W    ZA    ZK    ZC Wassp1 ES-l-P ES-s-P EP-l-P
#> 0 0.026 0.026 0.048 0.046 0.034 0.026 0.046 0.052  0.052  0.058  0.054  0.056
#> 1 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000  1.000  1.000  1.000  1.000
#>   EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L
#> 0  0.046  0.066  0.062   0.05  0.056
#> 1  1.000  1.000  1.000   1.00  1.000
pnull = function(x, p=c(0,1)) pnorm(x, p[1], ifelse(p[2]>0, p[2], 0.01))
TSextra = list(qnull = function(x, p=c(0,1)) qnorm(x, p[1], ifelse(p[2]>0, p[2], 0.01)))
rnull = function(p=c(0,1)) rnorm(100, p[1], p[2]) # same sample size as ralt
ralt = function(mu=0) rnorm(100, mu)
phat = function(x) c(mean(x), sd(x))
gof_power(pnull, NA, rnull, ralt, c(0, 1), phat= phat, 
          TSextra=TSextra, B=Bsim, maxProcessor=1)
#>      KS     K    AD   CvM     W    ZA    ZK    ZC Wassp1 ES-l-P ES-s-P EP-l-P
#> 0 0.958 0.898 0.030 0.040 0.034 0.544 0.004 0.010  0.996  0.050  0.034  0.056
#> 1 0.956 0.878 0.028 0.034 0.032 0.520 0.004 0.008  0.998  0.052  0.036  0.046
#>   EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L
#> 0  0.070  0.058  0.044  0.060  0.082
#> 1  0.038  0.052  0.036  0.056  0.038
ralt = function(df=1) {
# t distribution truncated at +- 5  
  x=rt(1000, df)
  x=x[abs(x)<5]
  x[1:100]
}  
gof_power(pnull, NA, rnull, ralt, c(2, 50), phat=phat, 
          Range=c(-5,5), TSextra=TSextra, B=Bsim, maxProcessor=1)
#>       KS     K    AD   CvM     W    ZA    ZK    ZC Wassp1 ES-l-P ES-s-P EP-l-P
#> 2  0.992 0.978 0.562 0.514 0.542 0.946 0.272 0.184  1.000   0.27  0.346  0.236
#> 50 0.948 0.880 0.030 0.024 0.028 0.636 0.010 0.032  0.942   0.04  0.070  0.046
#>    EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L
#> 2   0.268  0.260  0.364  0.242  0.284
#> 50  0.054  0.044  0.074  0.056  0.068
It is very easy for a user to add other goodness-of-fit tests to the package.
Example
Say we wish to use tests that are variants of the Cramér-von Mises test, using the integrated absolute difference of the empirical and the theoretical distribution functions:
\[\int_{-\infty}^{\infty} \vert F(x) - \hat{F}(x) \vert^p dF(x)\] For continuous data we have the routine
newTScont = function(x, pnull, param) {
   Fx=sort(pnull(x))
   n=length(x)
   out = c(sum(abs( (2*1:n-1)/2/n-Fx )),sum(sqrt(abs( (2*1:n-1)/2/n-Fx ))))
   names(out) = c("CvM alt 1", "CvM alt 2")
   out
}
This routine has to have three or four arguments: x, pnull, param and (optionally) TSextra. x is the data and pnull a function that finds the cdf at x. param has to be the estimated parameters in the case of a composite null hypothesis, and is ignored in the case of a simple null hypothesis. TSextra has to be a list of additional items needed for the calculation of the test statistic, if any.
Note that the return object has to be a named vector.
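Writing \(u_{(i)}=F(x_{(i)})\) for the sorted transformed data, newTScont computes the two sums below. They mimic the usual computing formula of the Cramér-von Mises statistic \(W^2\), with the squared difference replaced by \(|\cdot|^p\) for \(p=1\) and \(p=1/2\), and without the constant \(1/(12n)\); they are therefore analogues of, not exact evaluations of, the integral above:

\[T_1=\sum_{i=1}^n \left|\frac{2i-1}{2n}-u_{(i)}\right|,\qquad T_{1/2}=\sum_{i=1}^n \left|\frac{2i-1}{2n}-u_{(i)}\right|^{1/2},\qquad W^2=\frac{1}{12n}+\sum_{i=1}^n \left(u_{(i)}-\frac{2i-1}{2n}\right)^2\]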
Then we can run this test with
pnull = function(x) punif(x)
rnull = function() runif(500)
x = rnull()
Rgof::gof_test(x, NA, pnull, rnull, TS=newTScont)
#> maxProcessor set to 1 for faster computation
#> $statistics
#> CvM alt 1 CvM alt 2 
#>     5.469    47.960 
#> 
#> $p.values
#> CvM alt 1 CvM alt 2 
#>    0.6264    0.6374
Say we want to find the power of this test when the true distribution is linear:
ralt = function(slope=0) {
  if(slope==0) y=runif(500)
    else y=(slope-1+sqrt((1-slope)^2+4*slope* runif(500)))/2/slope
}
gof_power(pnull, NA, rnull, ralt, TS=newTScont, param_alt=round(seq(0, 0.5, length=3), 3), 
          Range=c(0,1), B=Bsim, maxProcessor = 1)
#>      CvM alt 1 CvM alt 2
#> 0        0.054     0.058
#> 0.25     0.898     0.906
#> 0.5      1.000     1.000
For discrete data we will write the routine using Rcpp:
(To avoid issues with CRAN submission, this routine is already part of Rgof.)
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector newTSdisc(IntegerVector x, 
                      Function pnull,  
                      NumericVector param,
                      NumericVector vals) {
    
  Rcpp::CharacterVector methods=CharacterVector::create("CvM alt");    
  int const nummethods=methods.size();
  int k=x.size(), n, i;
  NumericVector TS(nummethods), ecdf(k), Fx(k);
  double tmp;
  TS.names() =  methods;
  Fx=pnull(param);
  n=0;
  for(i=0;i<k;++i) n = n + x[i];
  ecdf(0) = double(x(0))/double(n);  
  for(i=1;i<k;++i) {
    ecdf(i) = ecdf(i-1) + x(i)/double(n);
  }
  tmp = std::abs(ecdf[0]-Fx(0))*Fx(0);
  for(i=1;i<k;++i) 
     tmp = tmp + std::abs(ecdf(i)-Fx(i))*(Fx(i)-Fx(i-1));
  TS(0) = tmp;
 
  return TS;
}
The routine has to have four or five arguments: x, pnull, param, vals and (optionally) TSextra. The output vector has to have names.
Note that one drawback of writing the routine in Rcpp is that it is then not possible to use multiple processors.
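For comparison, the same statistic can be written in plain R, which presumably avoids that drawback. This is a sketch with the hypothetical name newTSdisc_R; it assumes, as the Rcpp routine does, that pnull takes the parameter vector and returns the cdf at every element of vals. Whether Rgof then runs it on several processors has not been verified here.

```r
# Plain-R version of the discrete "CvM alt" statistic (hypothetical name)
newTSdisc_R <- function(x, pnull, param, vals) {
  Fx   <- pnull(param)                          # null cdf at each element of vals
  Fhat <- cumsum(x) / sum(x)                    # empirical cdf from the counts
  out  <- sum(abs(Fhat - Fx) * diff(c(0, Fx)))  # weights F(x_i) - F(x_{i-1}), F(x_0) = 0
  names(out) <- "CvM alt"                       # the output has to be a named vector
  out
}

# counts proportional to Bin(10, 0.5) probabilities give a statistic of (numerically) zero
newTSdisc_R(choose(10, 0:10), function(p) pbinom(0:10, 10, p), 0.5, 0:10)
```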
As an example we will test whether some data comes from a Binomial distribution with n=10 trials. The success parameter p will be estimated from the data:
vals=0:10
pnull = function(p) pbinom(0:10, 10, p) 
rnull = function(p) table(c(0:10,rbinom(10000, 10, p)))-1
phat=function(x) sum(0:10*x)/100000
x = rnull(0.5)
gof_test(x, vals, pnull, rnull, phat=phat, TS=Rgof::newTSdisc)
#> maxProcessor set to 1 for faster computation
#> $statistics
#> CvM alt     l-P     s-P     l-L     s-L 
#> 0.00129 5.84800 3.92900 5.79700 3.87700 
#> 
#> $p.values
#> CvM alt     l-P     s-P     l-L     s-L 
#>  0.8342  0.7550  0.8635  0.7601  0.8680
For the power calculations we will consider data that is actually a mixture of two Binomials:
ralt = function(tau=0) {
   x=rbinom(5000, 10, 0.5-tau)
   y=rbinom(5000, 10, 0.5+tau) 
   table(c(0:10,x,y))-1
}
gof_power(pnull, vals, rnull, ralt,
    TS=Rgof::newTSdisc, phat=phat, 
    param_alt=round(seq(0, 0.05, length=3), 3),
    B=Bsim, maxProcessor = 1)
#>       CvM alt
#> 0       0.034
#> 0.025   0.194
#> 0.05    1.000
If the new routine can also calculate p values, use the argument With.p.value=TRUE to indicate this. In that case the return object should be a (vector of) p value(s).
As no single test can be relied upon to consistently have good power, it is reasonable to employ several of them. We could then reject the null hypothesis if any of the tests does so, that is, if the smallest p-value is less than the desired type I error probability \(\alpha\).
This procedure clearly suffers from the problem of simultaneous inference, and the true type I error probability will be much larger than \(\alpha\). It is however possible to adjust the p value so it does achieve the desired \(\alpha\). This can be done as follows:
We generate a number of data sets under the null hypothesis; generally about 1000 will be sufficient. Then for each simulated data set we apply the tests we wish to include and record the smallest p value. Here is an example. Say the null hypothesis specifies a uniform distribution on \([0,1]\) and a sample size of 250.
pnull=function(x) punif(x)
rnull=function() runif(250)
pvals=matrix(0,1000,16)
for(i in 1:1000) 
  pvals[i, ]=Rgof::gof_test(rnull(), NA, pnull,
                            rnull,B=1000)$p.values
Note that this is not run because of CRAN time constraints. The resulting matrix is in Rgof::pvaluecdf.
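Once such a matrix of null p values is available, the adjustment itself is short: the adjusted p value is the proportion of simulated minimum p values at or below the observed minimum. A base-R sketch with a hypothetical helper, checked on toy matrices of independent and of identical uniform p values, for which the answers should be close to \(1-(1-p)^4\) and \(p\) respectively. The toy matrices are stand-ins, not Rgof output.

```r
# Hypothetical helper: adjusted p value of the rule "reject if any test rejects".
# p_obs: p values of the selected tests on the observed data
# null_pvals: p values of the same tests, one row per simulated null data set
adjusted_min_p <- function(p_obs, null_pvals) {
  null_min <- apply(null_pvals, 1, min)   # smallest p value in each null run
  mean(null_min <= min(p_obs))            # empirical cdf of the null minima
}

set.seed(1)
u <- runif(20000)
indep <- matrix(runif(4 * 20000), ncol = 4)   # four independent tests
ident <- cbind(u, u, u, u)                    # four identical tests
p_obs <- c(0.02, 0.5, 0.3, 0.9)
adjusted_min_p(p_obs, indep)   # near 1 - (1 - 0.02)^4, about 0.078
adjusted_min_p(p_obs, ident)   # near 0.02
```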
Next we find the smallest p value in each run for two selections of four methods. One is the selection found to be best above, namely the methods by Wilson, Anderson-Darling, Zhang’s ZC and a chi square test with a small number of bins and using Pearson’s formula. As a second selection we use the methods by Kolmogorov-Smirnov, Kuiper, Anderson-Darling and Cramer-vonMises. It can be checked that for this null hypothesis these methods are highly correlated.
colnames(pvals)=names(Rgof::gof_test(rnull(), NA, pnull, rnull,B=10)$p.values)
p1=apply(pvals[, c("W", "ZC", "AD", "ES-s-P" )], 1, min)
p2=apply(pvals[, c("KS", "K", "AD", "CvM")], 1, min)
Next we find the empirical distribution function of the two sets of p values and draw their graphs. We also add the curves for the case of four identical tests and the case of four independent tests; the latter, \(1-(1-p)^4\), is the Šidák correction (for small p approximately the Bonferroni correction \(4p\)). The data for the cdf is in the inst/extdata directory of the package.
tmp=Rgof::pvaluecdf
Tests=factor(c(rep("Identical Tests", nrow(tmp)),
        rep("Correlated Selection", nrow(tmp)),
        rep("Best Selection", nrow(tmp)),
        rep("Independent Tests", nrow(tmp))),
        levels=c("Identical Tests",  "Correlated Selection", 
                 "Best Selection", "Independent Tests"),
        ordered = TRUE)
dta=data.frame(x=c(tmp[,1],tmp[,1],tmp[,1],tmp[,1]),
          y=c(tmp[,1],tmp[,3],tmp[,2],1-(1-tmp[,1])^4),
          Tests=Tests)
ggplot2::ggplot(data=dta, ggplot2::aes(x=x,y=y,col=Tests))+
  ggplot2::geom_line(linewidth=1.2)+
  ggplot2::labs(x="p value", y="CDF")+
  ggplot2::scale_color_manual(values=c("blue","red", "Orange", "green"))
Here is how to find these adjusted p values with Rgof:
Sometimes the data/model uses importance sampling weights. This can be done as follows. Say we want to test whether the data comes from a standard normal distribution, truncated to [-3,3] and with weights from a t distribution with 3 degrees of freedom:
\(H_0: F=N(0,1)\), \(X\sim t(3)\)
df=3
pnull=function(x) (pnorm(x)-pnorm(-3))/(2*pnorm(3)-1) # cdf of N(0,1) truncated to [-3,3]
rnull=function() {x=rt(2000, df);x=x[abs(x)<3];sort(x[1:1000])}
w=function(x) (dnorm(x)/(2*pnorm(3)-1))/(dt(x,df)/(2*pt(3,df)-1))
x=sort(rnull())
plot(x, w(x), type="l", ylim=c(0, 2*max(w(x))))
ralt=function(m=0) {x=rt(2000,df)+m;x=x[abs(x)<3];sort(x[1:1000])}
set.seed(111)
Rgof::gof_power(pnull, NA, rnull, ralt, w=w, param_alt = c(0,0.2), Range=c(-3,3),B=Bsim, maxProcessor = 1)
#>       KS    K   CvM   AD
#> 0   0.06 0.06 0.056 0.05
#> 0.2 1.00 1.00 1.000 1.00
It should be noted that these tests are quite sensitive to the size of the weights and to the sample size, so one should always do a simulation study to verify that they work in the case under consideration.
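To see what the weights do, here is a sketch of a weighted Kolmogorov-Smirnov distance: each observation contributes its weight \(w(x_i)\) to a weighted empirical cdf, which is then compared with the null cdf. This is a generic illustration of importance-weighted empirical cdfs, not a description of how Rgof computes its weighted statistics; the helper name is hypothetical. With equal weights it reduces to the ordinary KS statistic.

```r
# Hypothetical helper: KS distance between the weighted ecdf of x and the null cdf
weighted_ks <- function(x, w, pnull) {
  o <- order(x)
  x <- x[o]; w <- w[o]
  Fw     <- cumsum(w) / sum(w)      # weighted ecdf at each x_i
  Fw_pre <- c(0, head(Fw, -1))      # weighted ecdf just before each x_i
  Fx     <- pnull(x)
  max(abs(Fw - Fx), abs(Fw_pre - Fx))
}

# N(0,1) truncated to [-3, 3] as null, data drawn from t(3) truncated to [-3, 3]
df <- 3
pnull <- function(x) (pnorm(x) - pnorm(-3)) / (2 * pnorm(3) - 1)
w <- function(x) (dnorm(x) / (2 * pnorm(3) - 1)) / (dt(x, df) / (2 * pt(3, df) - 1))
set.seed(111)
x <- rt(2000, df); x <- x[abs(x) < 3][1:1000]
weighted_ks(x, w(x), pnull)   # small: the weights correct for sampling from t(3)
```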
The package includes the routine run.studies, which provides 20 case studies each for the continuous and the discrete case. This allows the user to easily compare the power of the methods included in the package to that of a method of their choice.
If the user supplied method only yields the test statistic, simulation is used to find the p value. If the new test can find p values, these are used and then the routine will of course run much faster.
Say a user wishes to study the performance of the chi square test for the case where the null hypothesis specifies a uniform distribution but the data actually comes from a linear distribution with slope s. (This test is of course already included in the package, so this is for illustration purposes only). So
chitest=function(x, pnull, param, TSextra) {
    nbins=TSextra$nbins #number of bins
    bins=seq(min(x)-1e-10, max(x)+1e-10, length=nbins+1)
    O=hist(x, bins, plot=FALSE)$counts #bin counts
    if(param[1]!=-99) { #with parameter estimation
        E=length(x)*diff(pnull(bins, param)) #expected counts
        chi=sum((O-E)^2/E) #Pearson's chi square
        pval=1-pchisq(chi, nbins-1-length(param)) #p value
    }
    else {
      E=length(x)*diff(pnull(bins))
      chi=sum((O-E)^2/E)
      pval=1-pchisq(chi,nbins-1)
    }  
    out=ifelse(TSextra$statistic, chi, pval)
    names(out)="ChiSquare"
    out
}
In this example we find the power of chitest when the null hypothesis specifies a uniform distribution but the true distribution is linear with slope s:
TSextra=list(nbins=5, statistic=FALSE)
pwr=Rgof::run.studies(chitest, "uniform.linear", 
                      TSextra=TSextra, With.p.value=TRUE)
#> Running case uniform.linear.cont ...
Rgof::plot_power(pwr, "Slope")
Arguments of run.studies:
The arguments of the user supplied test routine have to be
Continuous Data:
Discrete Data:
In either case the output of the routine has to be a named vector with either the test statistic(s) or the p values(s).
The case studies are as follows. In each, the first term specifies the model under the null hypothesis and the second the model under the alternative hypothesis. The case studies with names ending in .est include parameter estimation.
Without parameter estimation
uniform.linear\(\hspace{3cm}\) U[0,1] vs a linear model on [0,1] with slope s.
uniform.quadratic\(\hspace{2.5cm}\) U[0,1] vs a quadratic model with vertex at 0.5 and some curvature a.
uniform.bump\(\hspace{3.1cm}\) U[0,1] vs U[0,1]+N(0.5,0.05).
uniform.sine\(\hspace{3.3cm}\) U[0,1] vs U[0,1]+Sine wave
beta22.betaaa\(\hspace{3cm}\) Beta(2,2) vs Beta(a,a)
beta22.beta2a\(\hspace{3cm}\) Beta(2,2) vs Beta(2,a)
normal.shift\(\hspace{3.5cm}\) N(0,1) vs N(\(\mu\),1)
normal.stretch\(\hspace{3.1cm}\) N(0,1) vs N(0, \(\sigma\))
normal.t\(\hspace{4.2cm}\) N(0,1) vs t(df)
normal.outlier1\(\hspace{3.1cm}\) N(0,1) vs N(0,1)+U[2,3]
normal.outlier2\(\hspace{3.1cm}\) N(0,1) vs N(0,1)+U[-3,-2]+U[2,3]
exponential.gamma\(\hspace{2.3cm}\) Exp(1) vs Gamma(1,b)
exponential.weibull\(\hspace{2.5cm}\) Exp(1) vs Weibull(1,b)
exponential.bump\(\hspace{2.7cm}\) Exp(1) vs Exp(1)+N(0.5,0.05)
trunc.exponential.linear\(\hspace{1.7cm}\) Exp(1) vs Linear, on [0,1]
With parameter estimation
normal.t.est\(\hspace{3.7cm}\) N(\(\mu\),\(\sigma\)) vs t(df)
exponential.weibull.est\(\hspace{1.9cm}\) Exp(\(\lambda\)) vs Weibull(1,b)
trunc.exponential.linear.est\(\hspace{1.2cm}\) Exp(\(\lambda\)) vs Linear, on [0,1]
exponential.gamma.est\(\hspace{1.8cm}\) Exp(\(\lambda\)) vs Gamma(1,b)
normal.cauchy.est\(\hspace{2.7cm}\) N(\(\mu\),\(\sigma\)) vs Cauchy (Breit-Wigner)
Generally a user will likely wish to run all the included case studies. This is very easily done with
If only the test statistic is supplied by the new test and therefore simulation has to be used to find p values, this will take several hours to run. run.studies can also be used to run the studies for the included methods but for different values of param_alt, nsample and/or alpha. For example, say we wish to find the powers of the included continuous methods for the case uniform.linear and samples of size 2000, slopes of 0.1 and 0.2, and a true type I error of 0.01: