Research Papers

State-dependent Hawkes processes and their application to limit order book modelling

Pages 563-583 | Received 31 Aug 2019, Accepted 13 Sep 2021, Published online: 07 Dec 2021

Figures & data

Figure 1. Simulation of a state-dependent Hawkes process with $d_e = 1$, $d_x = 2$. The upper plot shows the evolution of the state process. The blue dots indicate the event times and the lower plot represents the intensity. The process is specified so that $\nu_1 = 1$ and $k_{11}(t, x) = \exp(-4t)\,\mathbf{1}_{\{x = 2\}}$, that is, in state 2 the process exhibits exponential self-excitation whereas no self-excitation occurs in state 1.
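As a minimal sketch of the specification in Figure 1, the following Python snippet evaluates the intensity $\lambda(t) = \nu_1 + \sum_{t_n < t} k_{11}(t - t_n, x_n)$ for the kernel stated in the caption. It assumes the usual Hawkes form of the intensity and that $x_n$ is the state value attached to the $n$th event; the caption does not say which convention is used for that state. The function name `intensity` and the sample events are hypothetical.

```python
import math


def intensity(t, event_times, event_states, nu=1.0, decay=4.0, exciting_state=2):
    """Evaluate nu + sum over past events of exp(-decay * (t - t_n)) * 1{x_n == exciting_state}.

    event_states[n] is the state attached to the n-th event (an assumed convention).
    """
    total = nu
    for t_n, x_n in zip(event_times, event_states):
        # Only events strictly before t contribute, and only those in the exciting state.
        if t_n < t and x_n == exciting_state:
            total += math.exp(-decay * (t - t_n))
    return total


# Hypothetical event times and states, for illustration only.
times = [0.5, 1.0, 1.2]
states = [1, 2, 2]

print(intensity(0.4, times, states))  # no past events: base rate only
print(intensity(0.9, times, states))  # one past event in state 1: still the base rate
print(intensity(1.1, times, states))  # one past event in state 2: base rate plus excitation
```

Events recorded in state 1 leave the intensity at the base rate, which matches the caption's description that no self-excitation occurs in that state.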

Figure 2. Descriptive statistics of level-I order flow of INTC. Except for Figure 2(a), only the data between 12:00 and 14:30 are used. In Figure 2(a), the sample mean of the arrival rate of level-I orders is computed over 10-minute bins. The translucent area represents the range of the arrival rate across the 250 trading days, excluding the bottom and top 5% values. (a) Arrival rate of level-I orders. (b) Number of level-I orders. (c) Fraction of level-I orders with non-unique timestamp. (d) Distribution of level-I order types.

Figure 3. Joint distribution of events and states for INTC, depicting the empirical distribution of the marks $(E_n, X_n)$ for the two considered state processes. (a) ModelS (state process: bid–ask spread). (b) ModelQI (state process: queue imbalance).

Table 1. Summary of ModelS and ModelQI.

Figure 4. Estimated transition distributions $\hat{\phi}$ of ModelS and ModelQI. We report the average of $\hat{\phi}^{(i)}$ across the 250 trading days. (Daily estimates vary little from these averaged values.) (a) Transition probabilities of the bid–ask spread (ModelS). (b) Transition probabilities of the queue imbalance (ModelQI).

Figure 5. The estimated kernel $\hat{k}$ under ModelS and ModelQI. Each panel describes self- or cross-excitation as indicated by its title, whilst each colour corresponds to a different state. For example, in Figure 5(a), the red curves in the second panel represent the estimates $\hat{k}^{(i)}_{e'e}(\cdot, x)$ where $e' = \text{ask}$, $e = \text{bid}$ and $x = 1$. All daily estimates are superposed with one translucent curve for each day. An ‘aggregate’ kernel is represented by a solid line, computed using the median of $\hat{\alpha}^{(i)}$ and $\hat{\beta}^{(i)}$ across the 250 trading days. (a) ModelS (state variable: bid–ask spread). (b) ModelQI (state variable: queue imbalance).

Figure 6. Estimated base rate vector $\hat{\nu}^{(i)}$ for ModelS and ModelQI over time (in number of events per second). (a) ModelS. (b) ModelQI.

Figure 7. Estimated kernel norms $\|\hat{k}\|_{1,\infty}$ for ModelQI over time. The dashed line marks 5 February 2018 (‘the return of volatility’), a day when the CBOE Volatility Index (VIX) jumped by 116% to 38 points, a level not seen since August 2015. This day seems to have introduced a systematic change in the magnitude of cross- and self-excitation. The spike in cross-excitation that occurs on 21 February 2018 is linked to a sudden change in market behaviour around 14:00 on that day, when Intel Corporation rolled out patches for its most recent generation of processors.

Figure 8. The upper panel depicts the evolution of the queue imbalance and level-I order flow of INTC on 13 February 2018 and the ask (red dots) and bid (blue dots) events. The lower panel displays the estimated intensities of ModelQI. (The self- and cross-excitation kernel norms of ModelQI on 13 February 2018 are visualised in Figure A2.)

Figure 9. In-sample (12:00–14:30) and out-of-sample (14:30–15:00) Q–Q plots of event residuals under ModelS, ModelQI (state-dependent) and an ordinary Hawkes process (simple). The residuals of the $i$th day are computed using the ML estimates $(\hat{\nu}^{(i)}, \hat{\alpha}^{(i)}, \hat{\beta}^{(i)})$ obtained from the 12:00–14:30 period. The empirical quantiles are obtained by pooling the residuals of all 250 trading days. The two panels in each sub-figure correspond to the sequences of residuals $(r_n^e)$ for $e \in \mathcal{E} = \{\text{ask}, \text{bid}\}$. (a) ModelS: in sample. (b) ModelS: out of sample. (c) ModelQI: in sample. (d) ModelQI: out of sample.
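The pooling step described in the Figure 9 caption can be sketched as follows. The snippet assumes that the reference distribution of the Q–Q plot is the unit exponential, which is the usual choice for point-process residuals but is not stated in the caption. The function name `exponential_qq_points`, the plotting positions $(i + 0.5)/n$ and the residual values are hypothetical.

```python
import math


def exponential_qq_points(residuals_by_day):
    """Pool residuals across days and pair them with Exp(1) quantiles.

    Returns a list of (theoretical_quantile, empirical_quantile) pairs.
    """
    # Pool the residuals of all days into one sorted sample.
    pooled = sorted(r for day in residuals_by_day for r in day)
    n = len(pooled)
    # Exp(1) quantile function is -log(1 - p); use mid-point plotting positions.
    theoretical = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, pooled))


# Hypothetical residuals for two days, for illustration only.
residuals_by_day = [[0.2, 1.5], [0.7, 0.1]]
points = exponential_qq_points(residuals_by_day)
for theo, emp in points:
    print(f"{theo:.3f}  {emp:.3f}")
```

Points lying close to the diagonal would indicate that the pooled residuals are compatible with the assumed reference distribution.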

Figure 10. The estimated spectral radius $\hat{\rho}(x)$ as a function of $x \in \mathcal{X}$ under ModelS and ModelQI. The daily profiles $x \mapsto \hat{\rho}(x)^{(i)}$, $i = 1, \ldots, 250$, are represented by the red translucent curves. (a) ModelS. (b) ModelQI.
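A state-wise spectral radius of the kind plotted in Figure 10 can be illustrated with a short Python sketch. It rests on two assumptions that the captions do not spell out. First, the kernels are taken to be exponential, $k_{e'e}(t, x) = \alpha_{e'xe} \exp(-\beta_{e'xe} t)$, as suggested by the $\hat{\alpha}$ and $\hat{\beta}$ estimates mentioned in Figure 5, so that each $L^1$ norm equals $\alpha / \beta$. Second, $\rho(x)$ is taken to be the spectral radius of the $2 \times 2$ matrix of these norms for the event types ask and bid. All parameter values and function names are hypothetical.

```python
import cmath


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a 2x2 matrix via the characteristic polynomial."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)  # complex sqrt handles a negative discriminant
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


def kernel_norms(alpha, beta):
    """L1 norms alpha / beta of exponential kernels, entry by entry."""
    return [[alpha[i][j] / beta[i][j] for j in range(2)] for i in range(2)]


# Hypothetical parameters: one 2x2 matrix (rows e', columns e) per state x.
alpha = {1: [[2.0, 1.0], [1.0, 2.0]], 2: [[1.0, 0.5], [0.5, 1.0]]}
beta = {1: [[4.0, 4.0], [4.0, 4.0]], 2: [[4.0, 4.0], [4.0, 4.0]]}

rho = {x: spectral_radius_2x2(kernel_norms(alpha[x], beta[x])) for x in alpha}
print(rho)
```

With these illustrative values, state 1 has the larger spectral radius, that is, the stronger overall excitation under the stated assumptions.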

Figure 11. In-sample (12:00–14:30) Q–Q plots of total residuals of ModelQI (state-dependent) and the alternative model given by Equation (8) (complex) on 11 May 2018. Each panel corresponds to a sequence of residuals $(\tilde{r}_n^{ex})$ for all $e \in \mathcal{E}$ and $x \in \mathcal{X}$. Equation (8) reads

$$\tilde{\lambda}_{ex}(t) = \nu_{ex} + \sum_{e' \in \mathcal{E},\, x' \in \mathcal{X}} \int_{[0,t)} k_{e'x'ex}(t - s)\, \mathrm{d}\tilde{N}_{e'x'}(s), \quad t \geq 0,\ e \in \mathcal{E},\ x \in \mathcal{X}. \tag{8}$$

Table A1. Parameter values for Specification 1.

Table A2. Parameter values for Specification 2.

Figure A1. Violin plots of the worst estimation errors (Equation (A2)) under two different sets of parameter values (specifications 1 and 2). For every specification and sample size $N$, we simulate 100 paths with sample size $N$ and perform ML estimation for each of them. The true parameters are used as the initial guess in the optimisation procedure to speed up estimation and reduce the computational cost. (a) Specification 1. (b) Specification 2. Equation (A2) reads

$$\epsilon_{\mathrm{rel}} := \frac{\hat{\theta}_{ij^\star} - \theta_{ij^\star}}{\theta_{ij^\star}}, \quad \text{where } j^\star = \operatorname*{arg\,max}_{j} \frac{|\hat{\theta}_{ij} - \theta_{ij}|}{\theta_{ij}}. \tag{A2}$$
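The worst estimation error of Equation (A2) can be computed for a single simulated path as in the sketch below. It assumes that all true parameters are strictly positive, since the formula divides by $\theta_{ij}$ without an absolute value. The function name `worst_relative_error` and the parameter values are hypothetical.

```python
def worst_relative_error(theta_hat, theta):
    """Signed relative error of the component with the largest absolute relative error.

    Assumes every true parameter in theta is strictly positive.
    """
    # j_star maximises |theta_hat_j - theta_j| / theta_j over the components j.
    j_star = max(
        range(len(theta)),
        key=lambda j: abs(theta_hat[j] - theta[j]) / theta[j],
    )
    return (theta_hat[j_star] - theta[j_star]) / theta[j_star]


# Hypothetical true parameters and estimates for one path, for illustration only.
theta = [1.0, 2.0, 4.0]
theta_hat = [1.1, 1.5, 4.2]

print(worst_relative_error(theta_hat, theta))
```

The sign is kept, so a violin plot of this quantity across paths shows whether the worst-estimated parameter tends to be over- or underestimated.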

Figure A2. Uncertainty quantification of the estimated excitation profiles for INTC on 13 February 2018 under ModelQI. The parametric bootstrap procedure involves simulating 100 paths covering 2.5 hours of trading using the ML estimates of the parameters on the considered day and applying ML estimation again to each of the simulated paths. Three random sets of parameters are used as the initial guess in the optimisation procedure. We use the 100 estimates to compute a 99%-confidence interval for the truncated kernel norm (translucent area). The solid line corresponds to the ML estimates using the original INTC data.