
PINstimation provides utilities for estimating probability of informed trading models: the original PIN of Easley and O’Hara (1992) and Easley et al. (1996); the multilayer PIN (MPIN) of Ersan (2016); the adjusted PIN (AdjPIN) of Duarte and Young (2009); and the volume-synchronized PIN (VPIN) of Easley et al. (2011, 2012). It includes various computation methods suggested in the literature, with data simulation tools and trade classification algorithms as supplementary utilities. The package offers fast and precise solutions to the sophisticated, error-prone, and time-consuming estimation of these measures, and it is compact in the sense that detailed estimation results can be obtained from raw trade-level data alone.
We introduce a new function called classify_trades()
that enables users to classify high-frequency (HF) trades individually,
without aggregating them.
For each HF trade, the function assigns a variable isBuy
that is set to TRUE if the trade is buyer-initiated, or
FALSE if it is seller-initiated.
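
To illustrate what a per-trade isBuy flag looks like, here is a minimal base-R sketch of the tick rule, one of the classification algorithms the package lists. The helper tick_rule() and the toy prices are hypothetical and are not part of PINstimation; the package's own implementation may treat the first trade and zero ticks differently.

```r
# Hypothetical helper (not a PINstimation function): tick-rule classification.
# A trade is a buy if its price is above the previous different price,
# and a sell if it is below it.
tick_rule <- function(price) {
  change <- c(NA, diff(price))          # price change versus the previous trade
  is_buy <- rep(NA, length(price))      # logical vector, NA = unclassified
  for (i in seq_along(price)) {
    if (!is.na(change[i]) && change[i] != 0) {
      is_buy[i] <- change[i] > 0        # uptick -> buy, downtick -> sell
    } else if (i > 1) {
      is_buy[i] <- is_buy[i - 1]        # zero tick: reuse the last direction
    }
  }
  is_buy
}

# Toy data, made up for illustration
trades <- data.frame(price = c(10.00, 10.01, 10.01, 10.00, 10.00, 10.02))
trades$isBuy <- tick_rule(trades$price)
print(trades)
```

The first trade stays NA in this sketch because there is no earlier price to compare it with.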
The aggregate_trades() function enables users to
aggregate high-frequency (HF) trades at different frequencies. In the
previous version, HF trades were automatically aggregated into daily
trade data. However, with the updated version, users can now specify the
desired frequency, such as every 15 minutes.
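
The idea of aggregating classified trades at a chosen frequency can be sketched in base R as follows. This is only an illustration under assumed data: the timestamps, the isBuy values, and the column names buys and sells are made up, and the code does not call aggregate_trades() or reproduce its output format.

```r
# Toy classified trades (hypothetical values)
t0 <- as.POSIXct("2024-01-02 09:30:00", tz = "UTC")
trades <- data.frame(
  timestamp = t0 + c(0, 120, 600, 900, 1500, 1799, 1800),  # seconds after t0
  isBuy     = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

bar_seconds <- 15 * 60  # 15-minute bars

# Index of the 15-minute bar each trade falls into
bar_id <- floor(as.numeric(trades$timestamp) / bar_seconds)

# Count buyer- and seller-initiated trades per bar
bars <- data.frame(
  bar_start = as.POSIXct(sort(unique(bar_id)) * bar_seconds,
                         origin = "1970-01-01", tz = "UTC"),
  buys  = as.vector(tapply(trades$isBuy, bar_id, sum)),
  sells = as.vector(tapply(!trades$isBuy, bar_id, sum))
)
print(bars)
```

Changing bar_seconds to 24 * 60 * 60 would give daily counts instead, which corresponds to the daily aggregation described for the previous version.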
The functionalities that the package offers are summarized below:

PIN model:
pin(), pin_yz(), pin_gwj(), and pin_ea().
initials_pin_yz(), initials_pin_gwj(), and initials_pin_ea().
generatedata_mpin(layers=1).
fact_pin_eho(), fact_pin_lk(), and fact_pin_e().
pin_bayes() (*).

MPIN model:
mpin_ml() and mpin_ecm().
initials_mpin().
detectlayers_e(), detectlayers_eg(), and detectlayers_ecm().
generatedata_mpin().
fact_mpin().

AdjPIN model:
adjpin().
initials_adjpin(), initials_adjpin_cl(), and initials_adjpin_rnd().
generatedata_adjpin().
fact_adjpin().

VPIN model:
vpin().

Trade classification:
tick, quote, LR and EMO algorithms using the function aggregate_trades().

The easiest way to get PINstimation is the following:
install.packages("PINstimation")

To get a bugfix or to use a feature from the development version, you can install the development version of PINstimation from GitHub.
# install.packages("devtools")
# library(devtools)
devtools::install_github("monty-se/PINstimation", build_vignettes = TRUE)

Loading the package
library(PINstimation)

We estimate the PIN model on the preloaded dataset
dailytrades using the initial parameter sets of Ersan and
Alici (2016).
estimate <- pin_ea(dailytrades)
## [+] PIN Estimation started
##   |[1] Likelihood function factorization: Ersan (2016)
##   |[2] Loading initial parameter sets   : 5 EA initial set(s) loaded
##   |[3] Estimating PIN model (1996)      : Using Maximum Likelihood Estimation
##   |+++++++++++++++++++++++++++++++++++++| 100% of PIN estimation completed
## [+] PIN Estimation completed

We run the estimation of the MPIN model on the preloaded dataset
dailytrades using:
ml_estimate <- mpin_ml(dailytrades)
## [+] MPIN estimation started
##   |[1] Detecting layers from data       : using Ersan and Ghachem (2022a)
##   |[=] Number of layers in the data     : 3 information layer(s) detected
##   |[2] Computing initial parameter sets : using algorithm of Ersan (2016)
##   |[3] Estimating the MPIN model        : Maximum-likelihood standard estimation
##   |+++++++++++++++++++++++++++++++++++++| 100% of mpin estimation completed
## [+] MPIN estimation completed

ecm_estimate <- mpin_ecm(dailytrades)
## [+] MPIN estimation started
##   |[1] Computing the range of layers    : information layers from 1 to 8
##   |[2] Computing initial parameter sets : using algorithm of Ersan (2016)
##   |[=] Selecting initial parameter sets : max 100 initial sets per estimation
##   |[3] Estimating the MPIN model        : Expectation-Conditional Maximization algorithm
##   |+++++++++++++++++++++++++++++++++++++| 100% of estimation completed [8 layer(s)]
##   |[3] Selecting the optimal model      : using lowest Information Criterion (BIC)
## [+] MPIN estimation completed

Compare the aggregate parameters obtained from the ML and ECM estimations.
mpin_comparison <- rbind(ml_estimate@aggregates, ecm_estimate@aggregates)
rownames(mpin_comparison) <- c("ML", "ECM")
cat("Probabilities of ML, and ECM estimations of the MPIN model\n")
print(mpin_comparison)

Display the summary of the model estimates for each number of layers.
summary <- getSummary(ecm_estimate)
show(summary)
##          layers em.layers  MPIN Likelihood    AIC    BIC    AWE
## Model[1]      1         1 0.566  -3226.469 6462.9 6473.4 6508.9
## Model[2]      2         2 0.577   -800.379 1616.8 1633.5 1690.3
## Model[3]      3         3 0.574   -643.458 1308.9 1332.0 1410.0
## Model[4]      4         3 0.574   -643.458 1308.9 1332.0 1410.0
## Model[5]      5         3 0.574   -643.458 1308.9 1332.0 1410.0
## Model[6]      6         3 0.574   -643.458 1308.9 1332.0 1410.0
## Model[7]      7         4 0.575   -642.631 1313.3 1342.6 1441.9
## Model[8]      8         4 0.575   -642.631 1313.3 1342.6 1441.9

We estimate the adjusted PIN model on the preloaded dataset
dailytrades using 20 initial parameter sets
computed by the algorithm of Ersan and Ghachem (2022b).
estimate_adjpin <- adjpin(dailytrades, initialsets = "GE")
show(estimate_adjpin)
## [+] AdjPIN estimation started
##   |[1] Computing initial parameter sets : 20 GE initial sets generated
##   |[2] Estimating the AdjPIN model      : Maximum-likelihood Standard Estimation
##   |+++++++++++++++++++++++++++++++++++++| 100% of AdjPIN estimation completed
## [+] AdjPIN estimation completed

We run a VPIN estimation on the preloaded dataset hfdata
with a timebarsize of 5 minutes
(300 seconds).
estimate.vpin <- vpin(hfdata, timebarsize = 300)
show(estimate.vpin)
## ----------------------------------
## VPIN estimation completed successfully.
## ----------------------------------
## Type object@vpin to access the VPIN vector.
## Type object@bucketdata to access data used to construct the VPIN vector.
## Type object@dailyvpin to access the daily VPIN vectors.
## 
## [+] VPIN descriptive statistics
## 
## |      | Min.  | 1st Qu. | Median | Mean  | 3rd Qu. | Max.  | NA's |
## |:-----|:-----:|:-------:|:------:|:-----:|:-------:|:-----:|:----:|
## |value | 0.101 |  0.185  | 0.238  | 0.244 |  0.29   | 0.636 |  49  |
## 
## 
## [+] VPIN parameters
## 
## | tbSize | buckets | samplength |   VBS    | #days |
## |:------:|:-------:|:----------:|:--------:|:-----:|
## |  300   |   50    |     50     | 36321.25 |  77   |
## 
## -------
## Running time: 3.753 seconds

We use the preloaded high-frequency dataset hfdata and
prepare it for aggregation.
data <- hfdata
data$volume <- NULL

We classify the data using the LR algorithm with a time lag of
500 milliseconds (0.5 s), using the function
aggregate_trades().
daytrades <- aggregate_trades(data, algorithm = "LR", timelag = 500)
## [+] Trade classification started
##   |[=] Classification algorithm         : LR algorithm
##   |[=] Number of trades in dataset      : 100 000 trades
##   |[=] Time lag of lagged variables     : 500 milliseconds
##   |[1] Computing lagged variables       : using parallel processing
##   |+++++++++++++++++++++++++++++++++++++| 100% of variables computed
##   |[=] Computed lagged variables        : in 7.68 seconds
##   |[2] Computing aggregated trades      : using lagged variables
## [+] Trade classification completed

We use the obtained dataset to estimate the (adjusted) probability of informed trading via the standard maximum-likelihood method.
adjpin_ml <- adjpin(daytrades, method = "ML", initialsets = "GE")
## [+] AdjPIN estimation started
##   |[1] Computing initial parameter sets : 20 GE initial sets generated
##   |[2] Estimating the AdjPIN model      : Maximum-likelihood Standard Estimation
##   |+++++++++++++++++++++++++++++++++++++| 100% of AdjPIN estimation completed
## [+] AdjPIN estimation completed

If you are a frequent user of PINstimation, you might want to avoid
repeatedly loading the package whenever you open a new R
session. You can do that by adding PINstimation to your
.Rprofile, either manually or using the function
load_pinstimation_for_good().
To automatically load PINstimation, run
load_pinstimation_for_good(), and the following code will
be added to your .Rprofile:
if (interactive()) suppressMessages(require(PINstimation))

After the R session restarts, PINstimation will be loaded
automatically whenever a new R session is started. To remove the
automatic loading of PINstimation, open the .Rprofile for editing with
usethis::edit_r_profile(), find the code above, and delete
it.
For a smooth introduction to, and useful tips on the main functionalities of the package, please refer to:
The package makes a series of original contributions to the literature:
An efficient, user-friendly, and comprehensive implementation of the standard models of probability of informed trading.
A first implementation of the estimation of the multilayer probability of informed trading (MPIN) as developed by Ersan (2016).
A comprehensive treatment of the estimation of the adjusted probability of informed trading as introduced by Duarte and Young (2009). This includes the implementation of the factorization of the AdjPIN likelihood function, various algorithms to generate initial parameter sets, and the MLE method.
The introduction of the expectation-conditional maximization (ECM) algorithm as an alternative method to estimate the models of probability of informed trading. The contribution is both theoretical and computational. The theoretical contribution is included in the paper by Ghachem and Ersan (2022b). The implementation of the ECM algorithm allows the estimation of PIN, MPIN, as well as the adjusted PIN model.
An implementation of three layer-detection algorithms: the preexisting algorithm of Ersan (2016) and two newly developed algorithms, described in Ersan and Ghachem (2022a) and Ghachem and Ersan (2022b), respectively.
A first implementation of the estimation of the volume-synchronized probability of informed trading (VPIN) as introduced by Easley et al. (2011, 2012).
One do-it-all function for trade classification
into buyer-initiated or seller-initiated trades that implements the
standard algorithms in the field, namely Tick,
Quote, LR, and EMO.
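
As a rough illustration of how the classification algorithms named in the last item relate to one another, the following base-R sketch combines a quote rule with a tick-rule fallback, in the spirit of the LR (Lee-Ready) approach. The function lee_ready_sketch() and the toy quotes are hypothetical; the sketch uses contemporaneous quotes and omits the time lag that the package example passes through timelag, so it should not be read as the package's implementation.

```r
# Hypothetical helper (not a PINstimation function).
# Quote rule: trades above the quote midpoint are buys, below it are sells.
# Trades exactly at the midpoint fall back to the tick rule.
lee_ready_sketch <- function(price, bid, ask) {
  mid <- (bid + ask) / 2
  is_buy <- ifelse(price > mid, TRUE, ifelse(price < mid, FALSE, NA))
  last_dir <- NA                        # direction of the last price change
  for (i in seq_along(price)) {
    if (i > 1 && price[i] != price[i - 1]) {
      last_dir <- price[i] > price[i - 1]
    }
    if (is.na(is_buy[i])) {
      is_buy[i] <- last_dir             # midpoint trade: use the tick rule
    }
  }
  is_buy
}

# Toy data, made up for illustration (midpoint is 10.25 throughout)
price <- c(10.50, 10.25, 10.00, 10.25)
bid   <- rep(10.00, 4)
ask   <- rep(10.50, 4)
is_buy <- lee_ready_sketch(price, bid, ask)
print(is_buy)
```

In this toy series the second and fourth trades occur at the midpoint, so their direction comes from the preceding price move rather than from the quotes.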
To our knowledge, there are three preexisting R packages for the estimation of models of the probability of informed trading: pinbasic, InfoTrad, and FinAsym.
If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.