Abstract
The endo–exo problem lies at the heart of statistical identification in many fields of science, and is often plagued by spurious strong-and-long memory due to improper treatment of trends, shocks and shifts in the data. A class of models that has been shown to be useful in discerning exogenous from endogenous activity is the Hawkes process. This class of point processes has enjoyed great recent popularity and rapid development within the quantitative finance literature, with particular focus on the study of market microstructure and high-frequency price fluctuations. We show that there are important lessons from older fields such as time series analysis and econometrics that should also be applied in financial point process modelling. In particular, we emphasize the importance of appropriately treating trends and shocks for the identification of the strength and length of memory in the system. We exploit the powerful Expectation–Maximization (EM) algorithm and an objective statistical criterion (BIC) to select the flexibility of the deterministic background intensity. With these methods, we strongly reject the hypothesis that the considered financial markets are critical at univariate and bivariate microstructural levels.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Spencer Wheatley http://orcid.org/0000-0002-5668-0215
Notes
1 Deterministic trends vs. stochastic fluctuations, inhomogeneity vs. clustering, heterogeneity (diversity) vs. coupling/contagion, independent/spontaneous activity vs. dependent/triggered activity, etc.
2 Some more examples: Economic time series: Directional response to intervention, or simply random fluctuation? Physics: Particles interacting, or in a heterogeneous field/medium? Success: Is it a grass-roots viral phenomenon, or ‘astroturf’ synthetic viral content? (Sornette and Helmstetter 2003, Sornette 2005, Crane and Sornette 2008).
3 As stated in the context of spatial statistics (Diggle 2006): ‘[There exists a] fundamental ambiguity between heterogeneity and clustering, the first corresponding to spatial variation of the intensity function, the second to stochastic dependence amongst the points of the process … [and these are] … difficult to disentangle’.
4 That is, estimating the standard Hawkes process, with constant immigration and an exponential memory kernel, on small windows of mid-price changes of the S&P 500 E-mini futures contracts.
5 We go beyond their work by identifying the universal nature of the problem, analogous to that in time series analysis and econometrics; providing a simpler solution for estimation based on the EM algorithm; performing a powerful test of criticality; and carrying out simulation studies of the consistency of the estimation and of the BIC-based selection relevant for the test of criticality.
6 A linear stochastic process has a unit root if 1 is a root of the process's characteristic equation. Such a process is non-stationary (‘integrated’), and may or may not have a trend.
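As a minimal illustration (the AR(1) example here is ours, not part of the original note), consider the first-order autoregressive process and its characteristic equation:

```latex
% AR(1) process and its characteristic equation
y_t = \phi\, y_{t-1} + \varepsilon_t,
\qquad 1 - \phi z = 0 \;\Rightarrow\; z = \frac{1}{\phi}.
```

For $\phi = 1$ the root is $z = 1$ (a unit root) and $y_t$ is a random walk: non-stationary, yet trendless; adding a drift constant to the recursion produces a stochastic trend.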
7 Granger and Newbold (1974) lamented that ‘spurious relationships appear to us to be inherent in a good deal of current econometric methodology’. Tsay (1988) stated that ‘outliers and structure changes are commonly encountered in time series data analysis. The presence of those extraordinary events could easily mislead the conventional time series analysis’.
8 In view of the ARMA point process, which features bursts/shocks as well as Hawkes-type self-excitation (Dassios and Zhao 2011, Wheatley et al. 2018).
9 Defined as the integral of the memory kernel, $\eta = \int_0^{\infty} h(t)\,\mathrm{d}t$, i.e. the mean number of direct offspring per event.
10 This is a well-known objective statistical criterion for model selection, which enables fair model comparison. More specifically, the log-likelihood increases monotonically with the number of parameters or effective degrees of freedom (Efron 2004) because the model can increasingly over-fit the data. Although simple, the AIC and BIC have rigorous justification (Hastie et al. 2001). Using the BIC is equivalent to a likelihood ratio test, where the level of the test becomes more strict as the sample size increases. This is a reasonable feature, as tests also become more powerful with increasing sample size.
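For reference, the standard definitions of the two criteria (with $\hat{L}$ the maximized likelihood, $k$ the number of free parameters and $N$ the sample size) are:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln N - 2\ln\hat{L}.
```

The BIC penalty $k\ln N$ grows with the sample size, which is the source of the increasingly strict likelihood-ratio-test interpretation mentioned in the note.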
11 For a Poisson process, the mean and variance of the number of points in a window are equal. However, a single Hawkes cluster (being a Galton–Watson branching process) has a Borel distribution for the number of points, having mean $1/(1-\eta)$ and variance $\eta/(1-\eta)^3$. Hence for $\eta = 0.99$ the variance is about 10 thousand times greater than its mean, underlining the importance of using higher order moments or correctly specified MLE in estimation.
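The cluster-size moments quoted in this note can be checked by direct simulation; below is a minimal Python sketch (not from the original text; the subcritical branching ratio 0.5 is our choice, made for speed):

```python
import numpy as np

def cluster_size(eta, rng):
    """Total progeny of one Hawkes cluster: a Galton-Watson branching
    process with Poisson(eta) offspring per point (eta < 1, so the
    cluster is finite almost surely).  The cluster root counts as one point."""
    total, current = 1, 1
    while current > 0:
        current = rng.poisson(eta, size=current).sum()
        total += current
    return total

rng = np.random.default_rng(0)
eta = 0.5
sizes = np.array([cluster_size(eta, rng) for _ in range(100_000)])

# Borel distribution: mean 1/(1-eta) = 2, variance eta/(1-eta)^3 = 4
print(sizes.mean())  # close to 2
print(sizes.var())   # close to 4
```

The over-dispersion (variance/mean = $\eta/(1-\eta)^2$) already appears clearly at $\eta = 0.5$ and diverges as $\eta \to 1$.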
12 For very fast simulation without inefficiency (i.e. without thinning), we recursively simulate the process generation by generation, which is possible due to the branching process representation of the Hawkes process (Ogata 1981). The numbers of points drawn from each (inhomogeneous Poisson) immigrant and offspring process are drawn from the Poisson distribution (with mean $\int_0^T \mu(t)\,\mathrm{d}t$ and $\eta$, respectively). For time-location, inverse-transform sampling (Cinlar 1975) can then be applied (sampling from the normalized densities $\mu(t)/\int_0^T \mu(s)\,\mathrm{d}s$ and $h(t)/\eta$).
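A minimal Python sketch of this generation-by-generation scheme (ours, not the authors' code; for simplicity it assumes a constant background intensity and an exponential kernel, whereas the paper uses a flexible background and more general kernels):

```python
import numpy as np

def simulate_hawkes_branching(mu, eta, beta, T, rng):
    """Simulate a Hawkes process on [0, T] via its branching (cluster)
    representation, without thinning.  Assumptions for this sketch:
    constant background intensity mu, and each point spawning
    Poisson(eta) children at lags from the normalized exponential
    kernel h(t)/eta = beta * exp(-beta * t)."""
    # Generation 0: immigrants from a homogeneous Poisson process on [0, T]
    n_imm = rng.poisson(mu * T)
    generation = np.sort(rng.uniform(0.0, T, size=n_imm))
    events = [generation]
    while generation.size > 0:
        # Each point of the current generation has Poisson(eta) children
        n_children = rng.poisson(eta, size=generation.size)
        parents = np.repeat(generation, n_children)
        # Offspring lags sampled from the normalized exponential density
        children = parents + rng.exponential(1.0 / beta, size=parents.size)
        generation = children[children <= T]
        events.append(generation)
    return np.sort(np.concatenate(events))

rng = np.random.default_rng(1)
t = simulate_hawkes_branching(mu=1.0, eta=0.5, beta=2.0, T=1000.0, rng=rng)
# Expected total count is roughly mu*T/(1 - eta) = 2000, up to edge effects
print(t.size)
```

For an inhomogeneous background, the uniform draw would be replaced by inverse-transform sampling from $\mu(t)/\int_0^T \mu(s)\,\mathrm{d}s$, as in the note.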
13 Results are not highly sensitive to this value, nor do they vary significantly across different realizations of the randomization.
14 An immigration intensity reported with zero degrees of freedom corresponds to the case of constant immigration. We have also fitted models with a piecewise constant estimator of the exogenous intensity (as e.g. proposed in Bacry and Muzy (2014) or Lallouache and Challet (2016)), where the allowed degrees of freedom for the immigration determines the number of equally spaced knots of the respective piecewise linear function. We find that, while overall the piecewise constant estimator and the flexible logspline agree on the intraday profile of the immigration intensity, using an adaptive nonparametric estimator consistently yields lower AIC/BIC, lower branching ratios and higher average immigration. Furthermore, we observe that models with a piecewise constant exogenous intensity estimator tend to satisfy the convergence criteria of the EM algorithm after fewer iterations, suggesting potential convergence to local optima.
15 For reference, we have also tested a model using exponential kernels as in equation (5) with P=1, and find that rejection for this specification is even clearer than for models with approximate power law or nonparametric kernels. This is additionally in agreement with other literature, where short-memory kernels are found to provide an inferior description of financial data.
16 The bootstrap consists of simulating from the estimated BIC-optimal models on a window of one day, estimating models of varying p again, and selecting the one with the lowest BIC. The kernel parameters in the EM are initialized with the empirical BIC-optimal values.
17 A typical termination criterion is to define a threshold for the norm of the change in the K-dimensional parameter vector $\theta^{(\ell)}$ of the model from one EM iteration $\ell$ to the next. In our applications, we use the supremum norm $\|\theta^{(\ell+1)} - \theta^{(\ell)}\|_{\infty}$.
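A minimal Python sketch of this termination check (ours, for illustration; `em_step` is a hypothetical placeholder for the combined E- and M-update):

```python
import numpy as np

def em_converged(theta_old, theta_new, tol=1e-6):
    """Terminate when the supremum norm of the parameter change
    between successive EM iterations falls below tol."""
    diff = np.abs(np.asarray(theta_new, float) - np.asarray(theta_old, float))
    return bool(diff.max() < tol)

# Hypothetical iteration loop:
# theta = theta0
# while True:
#     theta_new = em_step(theta)   # E-step + M-step (not shown)
#     if em_converged(theta, theta_new):
#         break
#     theta = theta_new

print(em_converged([1.0, 2.0], [1.0, 2.0]))  # True: no change at all
```

Because the EM log-likelihood increases monotonically, a small parameter change is a common practical proxy for convergence, although it does not by itself rule out a local optimum.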