Trade co-occurrence, trade flow decomposition and conditional order imbalance in equity markets

Pages 779-809 | Received 22 Sep 2022, Accepted 15 May 2024, Published online: 19 Jun 2024

Abstract

The time proximity of high-frequency trades can contain a salient signal. In this paper, we propose a method to classify every trade, based on its proximity with other trades in the market within a short period of time, into five types. By means of a suitably defined normalized order imbalance associated to each type of trade, which we denote as conditional order imbalance (COI), we investigate the price impact of the decomposed trade flows. Our empirical findings indicate strong positive correlations between contemporaneous returns and COIs. In terms of predictability, we document that associations with future returns are positive for COIs of trades which are isolated from trades of stocks other than themselves, and negative otherwise. Furthermore, trading strategies which we develop using COIs achieve conspicuous returns and Sharpe ratios, in an extensive experimental setup on a universe of 457 stocks using daily data for a period of 4 years.

JEL Classification:

1. Introduction

The transformation of major equity exchanges to electronic trading has significantly reshaped the market microstructure landscape by reducing latency down to nanoseconds (O'Hara 2015, Hirschey 2021), thus leading to market participants achieving unprecedented levels of profitability in their trading strategies. Every agent in the market can directly submit and cancel limit orders. Trades are settled when existing limit orders are executed by market orders/marketable limit orders. Trades, carrying distinct information and having their own impact on the price changes of the underlying stocks, have been classified into different types and studied separately by academics and practitioners. For example, grouping by direction of trading, Chordia et al. (2016) study flows of buyer- and seller-initiated trades, thus decomposing trade flows into aggressive buys and aggressive sells. Kraus and Stoll (1972) and Lee et al. (2004) separate institutional trades from trades placed by individual investors. Unlike these classifications, which are based exclusively on the characteristics of individual trades, in this paper we classify trades according to their time of placement relative to the arrival time of other trades across the market, both within the same asset and cross-sectionally across the available universe of stocks. We find that the time proximity of trade arrivals contains salient information for explaining contemporaneous price impact and forecasting future returns.

Our motivation arises from the fact that market participants can make trading decisions by observing the trade flows in the market. Previous works (Kyle 1985, Kyle et al. 2011) model price formation at high frequency and suggest that informed traders split large orders into many smaller orders to conceal their true purpose, while other market participants monitor order flows in the market to reach trading decisions. The development of high-performance trading systems has led to an astounding growth of high-frequency trading (HFT) and diversity of strategies (Hagströmer and Nordén 2013). In this world, reaction time plays an important role because opportunities can be transient if not acted upon within microseconds or even nanoseconds. High-frequency trading strategies include anticipating trade flow (Hirschey 2021) and preying on other market participants (Van Kervel and Menkveld 2019). The questions we are interested in exploring concern whether certain trades, interacting with other trades in various ways, contain useful information, and how they contribute to stock price movements, helping us shed light on the price formation mechanism at both short-term and long-term horizons. To be specific, interaction refers to the fact that arrivals of trades may affect each other. Trades can occur in response to some events. Placements of trades can be triggered by the arrival of other trades or by changes in order imbalance, especially for HFT strategies based on observing order flows.

We start by proposing the concept of co-occurrence of trades, defined in Section 3.1, which offers a tool to identify and group trades based on their interactions with other trades. For each given trade, we consider it to co-occur and interact with another trade if both trades take place close in time to each other. To define and quantify ‘closeness’, we pre-define a neighbourhood size δ. If the time difference between two trades is lower than δ, they are close to each other and they co-occur. Note that the threshold δ is an important parameter, determining the set of trades that co-occur. However, there is no strict rule to set its value. Intuitively, considering a scenario where an HFT preys on an institutional trader and trades in response to institutional marketable orders, we aim to capture these interactions and classify such trades into a category of, for example, actively interactive trades. With this in mind, an appropriate choice should be greater than the round-trip latency plus the time for the HFT to detect and make trading decisions, which is usually undisclosed. Therefore, we experiment with multiple values of δ, and compare and contrast the corresponding results. Note that δ should not be too large either, since a large neighbourhood is likely to incorporate irrelevant trades from the market. To select the neighbourhood size, we first introduce a null model of completely random order arrivals. Then we select the δ that maximizes the difference between the empirical co-occurrence of trades and the co-occurrence under the null model. We find that δ = 1 ms is an appropriate choice and use it for the empirical analysis in this study. In addition, we also make a comparison across different choices of δ values in Appendix 5.

Using trade co-occurrence, we decompose daily trade flows by classifying all the trades of all stocks into subgroups. Given a trade, we determine to which group it belongs by asking the following two questions: Does it interact with other trades? If yes, does it interact with only trades of the same stock as itself, only with stocks different from itself, or with both kinds? Depending on the answer, a trade will be placed into one or two classes, for which detailed rules are explained in Section 3.2. After labelling all trades, we study the relations between returns and subgroups of trades.

We use order imbalance, which has been thoroughly studied in the finance literature, as a bridge connecting trade flows and stock returns. An inventory paradigm (Stoll 1978, Spiegel and Subrahmanyam 1995, Chordia et al. 2002) suggests that, in intermediated markets, a difference, or so-called imbalance, between buyer-initiated and seller-initiated trades puts pressure on a market maker's inventory. In response, the market makers adjust inventories to maintain their market exposures, which drives the price in one direction.

Next, at a daily level, we investigate the properties of aggregated order imbalance of each category of trades and their relation with individual stock returns during normal trading hours. Data exploration indicates that all categories of conditional, as well as the unconditional, order imbalance are positively auto-correlated. The conditional order imbalances (COIs) all have strong positive correlations with the original order imbalance. However, they are not necessarily highly correlated with each other.

Our empirical results concentrate on the imbalance–return relations. By means of regression analysis, we discover positive and significant correlations between order imbalances and price changes within the same day. Furthermore, in comparison to a standard regression analysis, decomposing order flows leads to significantly higher adjusted $R^2$ in our multiple regression settings, which can be interpreted as better explanatory power for contemporaneous intraday open-to-close stock returns. To exploit predictability, we use the same regression analysis to fit order imbalances against future 1-day ahead returns. In contrast to the contemporaneous results, statistically significant relations only appear for the order imbalance of isolated trades. Despite the absence of significant regression coefficients, we observe that order imbalances of non-isolated trades that arrive close in time to trades of other stocks appear to have negative relations with future returns. In contrast, imbalances of trades that arrive together only with trades of the same stock show weakly positive correlations.

These associations are amplified in our subsequent portfolio analysis, as follows. We leverage these imbalances to build trading strategies. To assess the economic value of the trade flow decomposition method, we construct signal-sorted portfolios using COIs as signals. In particular, if we make long/short decisions in alignment with the observed patterns in the predictive regressions, we attain profits in all of our portfolios, with the highest annualized Sharpe ratio reaching 1.79. As a benchmark, we build a portfolio that trades on order imbalances without decomposition, for which the Sharpe ratio is negative.

The remainder of this paper is organized as follows. Section 2 outlines our contributions to the finance literature. In Section 3, we introduce the definitions of trade co-occurrence, trade flow decomposition and COIs. We start our empirical studies with describing data sources and conducting exploratory analysis in Section 4. Subsequently, we uncover the relations between COIs and contemporaneous returns in Section 5 and investigate the predictive power of COIs in Section 6, and economic value of COIs in Section 7. Section 8 provides robustness analysis and additional empirical findings. Finally, in Section 9, we summarize the results and discuss our limitations and future research directions.

2. Related literature

This paper contributes to four strands of literature. First, our study exploits a new financial application of co-occurrence analysis, which is a statistical method proven to be powerful in spatial pattern analysis and widely used in the fields of biology (Gotelli 2000, MacKenzie et al. 2004, Araújo et al. 2011), natural language processing (NLP) (Dagan et al. 1999, Kolesnikova 2016), computer vision (Galleguillos et al. 2008, Aaron et al. 2018) and others (Appel and Holden 1998, Ye et al. 2017). So far, the applications of co-occurrence analysis in the finance literature concentrate on studying stocks co-occurring in news articles. Ma et al. (2011) construct networks from company co-occurrence in online news and use machine learning models to identify competitor relationships between companies. Recent studies, including Guo et al. (2017), Tang et al. (2019) and Wu et al. (2019), build networks using stock co-occurrence in news and employ them for tasks such as return prediction and portfolio allocation. We contribute by originating the idea of trade co-occurrence. By directly applying the co-occurrence of stock trades, we establish that this technique is beneficial for exploring and gaining insights from the financial market microstructure.

Second, our research adds to the studies of interactions among trading activities in the market. In the model of Kyle (1985), market makers observe the aggregated order flows of informed and liquidity traders in the market to adjust their trading strategies. More aggressively, HFT traders can detect informed traders, such as institutions (Van Kervel and Menkveld 2019), and predict trade flows of others (Hirschey 2021). Various theoretical models (Grossman and Miller 1988, Brunnermeier and Pedersen 2005, Yang and Zhu 2020) have been proposed for the interplay between high-frequency and institutional traders. Van Kervel and Menkveld (2019) conduct an empirical study on the Swedish stock market and discover that HFT participants tend to trade against the wind when the institutional traders begin splitting large orders, and eventually trade in the same direction as the institutions.

We contribute to this topic by proposing the idea of trade co-occurrence and providing empirical evidence that the co-occurrence of stock trades is not coincidental. Rather than studying interaction among traders, we introduce trade co-occurrence as a tool to analyse interactivity at the individual trade level. Our study of order imbalances conditional on co-occurrence shows that the interactions of trades at a granular level convey useful information on price formation.

Third, this paper contributes to the literature on order imbalance and price formation. According to pioneering research, persistence in order imbalance can arise in two ways. First, as the model by Kyle (1985) states, traders tend to split large orders over time to minimize their market impact, which leads to autocorrelated imbalances. Another source of persistence, as Scharfstein and Stein (1990) state, is the herd effect. To explore how order imbalance affects price changes, Chordia and Subrahmanyam (2004) propose a theoretical model to explain the positive relation between order imbalance and contemporaneous stock returns, arising from the market makers dynamically accommodating order imbalance. In addition, discretionary traders optimally splitting orders across days enable order imbalance to have strong positive autocorrelation and predictive power on future returns. Their empirical study, using daily data of stocks listed on the New York Stock Exchange (NYSE) for a 10-year period from 1988 to 1998, confirms their theoretical results and shows that order imbalances have significant forecasting power on future returns. However, this predictability is contested. For example, Shenoy and Zhang (2007) and Lee et al. (2004) find no significant predictive power of order imbalances.

Although Chordia and Subrahmanyam (2004) do not differentiate trade flows, subsequent studies have shown that marketable orders placed at different times, by different agents, and with distinct properties can have different impacts on price changes. Most evidence stems from the Chinese market (Lee et al. 2004, Bailey et al. 2009, Zhang et al. 2019), where private data identifying trader types are available; these studies find indications that order imbalances of institutional trade flows exert higher pressure on prices than imbalances of individual traders. Similar results are found in the US market in the recent study by Cox (2021) of S&P 500 stocks during 2015–2016, which splits trades into two classes depending on whether or not they are intermarket sweep orders, which are mainly adopted by institutions (Chakravarty et al. 2012).

Our research complements these works by supplementing the study of order imbalances in the US market using data of the most recent period and proposing a novel method to decompose the unconditional trade flows without requiring an additional private data set. We show that order imbalances, without differentiating trades, no longer have forecasting power on future returns, which is evidence for an evolution of the market microstructure over the past decades (Chordia et al. 2002, Chordia and Subrahmanyam 2004). However, trade flows decomposed with our proposed method carry different information content, and their COIs do possess forecasting power.

Finally, this paper adds to the literature on trading strategies based on order flow signals (Aldridge 2013). Traders can boost the profitability of their strategies by analysing the flow of orders in the market to improve their forecast signals and gaining insight from the strategies of their competitors (Foster and Viswanathan 1996, Hirschey 2021). Many previous studies have discovered that information derived from order flows, at a granular level, exhibits conspicuous predictive power on stock returns (Zhang et al. 2019, Cont et al. 2021, Aït-Sahalia et al. 2022, Lucchese et al. 2022) and can thus be leveraged for developing profitable trading strategies (Guilbaud and Pham 2013, Bechler and Ludkovski 2015, Kolm et al. 2021, Wang et al. 2021). Along the same lines, order imbalances derived from order flow have been widely used in developing trading strategies (Cartea et al. 2015). Chordia and Subrahmanyam (2004) demonstrate the profitability of order imbalances as trading signals. Chang (2012) uses order imbalances to enhance the performance of daily price momentum strategies and generates significant returns.

We contribute to this field by proposing a method to analyse trade flows based on the time proximity of trade arrivals and to extract profitable trading signals from the aggregated trade flow. We leverage the derived COI-based signals to develop successful trading strategies and showcase their profitability with rigorous backtests and robustness checks.

3. Co-occurrence of trades and trade flows decomposition

3.1. Co-occurrence of trades

We first introduce the definition of trade co-occurrence. For each trade $x_a$ occurring at time $t_a$, with a pre-specified $\delta$, every trade, other than $x_a$ itself, that arrives within the time period $(t_a - \delta, t_a + \delta)$ is defined as having co-occurred with trade $x_a$. We define the threshold $\delta$ as the neighbourhood size, and the set of all trades that co-occur with $x_a$ as the $\delta$-neighbourhood of trade $x_a$, denoted as $B_\delta(x_a)$. Figure 1 sketches an example, where trade $x_a$ co-occurs with trades $x_b$ and $x_c$, while it does not co-occur with trade $x_d$. We note that co-occurrence is not an equivalence relation; in particular, it is not transitive. It is perfectly possible for $x_a$ and $x_b$ to co-occur, and for $x_a$ and $x_c$ to co-occur, without $x_b$ and $x_c$ co-occurring.

Figure 1. Illustration of trade co-occurrence. This figure visualizes the idea of trade co-occurrence; given a user-defined neighbourhood size δ, trade xb arrives within the δ-neighbourhood of trade xa, and thus they co-occur. In contrast, trade xd locates outside xa's neighbourhood, and thus the two trades do not co-occur. Both trades xb and xc co-occur with trade xa, but they do not co-occur with each other.
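As an illustration of the definition above, the following is a minimal Python sketch of how a δ-neighbourhood could be looked up in a time-sorted list of trade arrivals. The function name `neighbourhood` and the use of integer timestamps are assumptions made for this example; it is not the implementation used in the study.

```python
from bisect import bisect_left, bisect_right


def neighbourhood(times, a, delta):
    """Return indices of trades that co-occur with trade `a`.

    `times` is a list of arrival times sorted in ascending order
    (hypothetical input format). A trade co-occurs with trade `a` if it
    falls in the open interval (t_a - delta, t_a + delta) and is not
    trade `a` itself.
    """
    t_a = times[a]
    lo = bisect_right(times, t_a - delta)  # first index with time > t_a - delta
    hi = bisect_left(times, t_a + delta)   # first index with time >= t_a + delta
    return [k for k in range(lo, hi) if k != a]


# Toy arrival times in arbitrary integer units (illustrative only).
times = [0, 5, 9, 30]
print(neighbourhood(times, 1, 6))  # neighbours of the trade at time 5
print(neighbourhood(times, 0, 6))  # neighbours of the trade at time 0
```

In this toy input, the trades at times 0 and 9 both fall in the neighbourhood of the trade at time 5 but not in each other's neighbourhood, which mirrors the non-transitivity noted in the text.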

3.2. Trade flow decomposition

Based on co-occurrence, we next split the trades of every given stock into different classes characterized by their δ-neighbourhood. We denote this procedure as trade flow decomposition.

3.2.1. Definition of the trade flow decomposition

We denote the set of trades of a given stock $i$ as $X_i$. For a given universe of stocks, denoted as $S$, our goal is to assign labels to each trade $x_a \in X_i$ for every stock $i \in S$.

To classify trades based on their time proximity with other trades in the market, we need to determine trades of which stocks other than stock $i$ shall be incorporated. Thus we introduce a fixed set of stocks as a customized market index, denoted by $\mathcal{M} \subseteq S$, whose trades are also considered when labelling trades of stock $i$. Then we define the set of trades, $M$, as a representative of the market, referred to as the market set, that is $M = \bigcup_{j \in \mathcal{M}} X_j$.

Note that the stock $i \in S$, whose trades we aim to label, may or may not be in the market index $\mathcal{M}$. Therefore, for each stock $i \in S$, we construct a reference set, $M_{-i} = M \setminus X_i = \bigcup_{j \in \mathcal{M}: j \neq i} X_j$, which contains all trades of stocks other than stock $i$ in the market set. Finally, every trade $x_a \in X_i$ is equipped with the set $B_\delta(x_a)$ of trades in its neighbourhood.

With these sets, we formally define the trade flow decomposition by assigning each trade of stock $i$ to one or two of five categories, with the protocol illustrated in figure 2. Initially, we partition all trades into two groups, isolated (iso) and non-isolated (nis) trades, defined as follows:

  1. isolated (iso): A trade, $x_a \in X_i$, is labelled as isolated if it does not co-occur with any other trade, that is $B_\delta(x_a) \cap (X_i \cup M_{-i}) = \emptyset$;

  2. non-isolated (nis): The trade is labelled as non-isolated if there are other trades of the same stock, or trades of other stocks in the market index, $\mathcal{M}$, in its neighbourhood, that is $|B_\delta(x_a) \cap (X_i \cup M_{-i})| \geq 1$, where $|\cdot|$ denotes the cardinality of a set.

Figure 2. Illustration of trade types, conditioning on co-occurrence. We showcase the distinct categorical labels of trade xa. Colour indicates the stock corresponding to a trade. Thus xb is for the same stock as xa, while xc is for a different stock. First line: xa is an isolated (iso) trade with empty δ-neighbourhood; second to fourth lines: xa is a non-isolated (nis) trade with nonempty δ-neighbourhood; second line: xa is a non-self-isolated (‘nis-s’) trade with only other trades for the same stock in its δ-neighbourhood; third line: xa is a non-cross-isolated (‘nis-c’) trade with only other trades for different stocks in its δ-neighbourhood; last line: xa is a non-both-isolated (‘nis-b’) trade with both other trades for the same and different stocks in its δ-neighbourhood.

We further decompose the non-isolated trades according to properties of the trades within their $\delta$-neighbourhood. Each non-isolated trade $x_a \in X_i$ can be classified into one of the following three categories:

  3. non-self-isolated (nis-s): the $\delta$-neighbourhood of trade $x_a$ contains only trades (at least one) of the same stock as trade $x_a$, that is $|B_\delta(x_a) \cap X_i| \geq 1$ and $|B_\delta(x_a) \cap M_{-i}| = 0$;

  4. non-cross-isolated (nis-c): the $\delta$-neighbourhood of trade $x_a$ contains only trades of stocks which are different from the stock corresponding to trade $x_a$, that is $|B_\delta(x_a) \cap X_i| = 0$ and $|B_\delta(x_a) \cap M_{-i}| \geq 1$;

  5. non-both-isolated (nis-b): the $\delta$-neighbourhood of trade $x_a$ contains both at least one trade of the same stock and at least one trade of a different stock, that is $|B_\delta(x_a) \cap X_i| \geq 1$ and $|B_\delta(x_a) \cap M_{-i}| \geq 1$.

These three classes form a partition of the set of non-isolated trades, as illustrated in figure 2. We refer to this process of separating trades into categories as trade flow decomposition.
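The labelling rules of this section can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code: the function name `label_trades` and the `(time, stock)` tuple format are hypothetical, and the sketch assumes that the market set coincides with the labelled universe, so every trade in the input can serve as a neighbour.

```python
from bisect import bisect_left, bisect_right


def label_trades(trades, delta):
    """Assign one of 'iso', 'nis-s', 'nis-c', 'nis-b' to each trade.

    `trades` is a list of (time, stock) tuples sorted by time
    (hypothetical format). Every trade not labelled 'iso' also belongs
    to the broader 'nis' class.
    """
    times = [t for t, _ in trades]
    labels = []
    for a, (t_a, stock) in enumerate(trades):
        # Index range of trades in the open interval (t_a - delta, t_a + delta).
        lo = bisect_right(times, t_a - delta)
        hi = bisect_left(times, t_a + delta)
        same = cross = False
        for k in range(lo, hi):
            if k == a:
                continue  # a trade does not co-occur with itself
            if trades[k][1] == stock:
                same = True
            else:
                cross = True
        if same and cross:
            labels.append("nis-b")
        elif same:
            labels.append("nis-s")
        elif cross:
            labels.append("nis-c")
        else:
            labels.append("iso")
    return labels


# Toy input in arbitrary integer time units (illustrative only).
toy = [(0, "A"), (3, "A"), (100, "A"), (102, "B"), (500, "B")]
print(label_trades(toy, 5))
```

Scanning only the index range returned by the two binary searches keeps the cost per trade proportional to the size of its neighbourhood, which matters when δ is small relative to the trading day.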

3.2.2. Motivation and generating mechanism

The motivation behind our decomposition is to separate the trade flows of different types of market participants. Each trade flow should be dominated by certain types of traders and is thus expected to have distinct impact on stock returns.

  1. iso: Informed traders, for example financial institutions with access to sophisticated private alphas and infrastructure, tend to hide their trading purposes. When they are successful, their trades should neither follow nor be followed by other trades, thus becoming locally isolated. We expect this type of trade flow to exhibit significant price impact and to be consistent with long-term price changes.

  2. nis: Because informed trade flow is excluded, we expect this type of flow to have a negative relationship with future price changes. However, the majority of market participants should not have insider information, so this type of trade flow should account for a larger trading volume. Therefore, it should have considerable impact on contemporaneous stock returns.

  3. nis-s: HFT traders, who anticipate or identify the trades placed by the aforementioned informed traders, can front-run or prey on those trades. Therefore, these types of trades, along with unsuccessfully hidden trades from informed traders, are likely to co-occur. We thus expect this type of order flow to have the same direction of price pressure as the iso flow, but with less impact and consistency.

  4. nis-c: Traders who run market neutral strategies or trade baskets of stocks will rebalance when their positions in other stocks change or trade multiple stocks simultaneously. This type of trade flow should capture most of this rebalancing (e.g. updating positions of index constituents in index arbitrage strategies). We expect this mass rebalancing behaviour to exhibit both permanent and transient impacts, and to lead to price mean reversion in the next period.

  5. nis-b: When the market intensity suddenly rises, for example, due to release of news concerning macroeconomic events or increased trading activity around market opening and closing sessions, trading volumes increase across all stocks, leading to the arrival of such types of trades. Potential overreaction to such news events could result in mean reversion of future prices.

In addition, we assume there exist noise traders who are likely to be classified in any of these trade flow categories and who choose their trading direction randomly. To assess the overall price impact, we calculate the order imbalance of each type to obtain its net price pressure; see details described in the following section. To closely examine the mechanism, we analyse the relationship between order imbalances of decomposed trade flows (COIs) and both contemporaneous and future stock returns. Therefore, our hypotheses are twofold:

  1. all types of COIs have significant positive relation with contemporaneous returns;

  2. order imbalances of iso and nis-s trade flows are positively related with future returns, while COIs of the other types are negatively correlated with future returns.

It is important to clarify that without client order ID data and information on the type of strategy from which the individual orders originate, it is challenging to identify and establish the generating mechanisms behind each type of decomposed order flow.

3.3. Conditional order imbalance

With the decomposition of trade flows, we proceed to study the price impact of trades with different characteristics. A bridge connecting trading activities and price changes is given by the order imbalance quantity, defined as the normalized difference between the volume of buyer- and seller-initiated trades (Chordia and Subrahmanyam 2004). For a given stock $i$, we derive conditional daily order imbalances as follows:

$$\mathrm{COI}_{i,t}^{\text{type}} = \frac{N_{i,t}^{\text{type,buy}} - N_{i,t}^{\text{type,sell}}}{N_{i,t}^{\text{type,buy}} + N_{i,t}^{\text{type,sell}}}, \tag{1}$$

where $N_{i,t}^{\text{type,buy}}$ and $N_{i,t}^{\text{type,sell}}$ denote the total number of market buy orders and market sell orders, respectively, of the given type for stock $i$ on day $t$. If the denominator is 0, which happens when there are no trades of a certain type, we define the COI in this case to be 0. We consider six types of COIs, and the superscript type, which takes a value in {all, iso, nis, nis-s, nis-c, nis-b}, indicates the group of trades used to calculate the imbalance. Note that the ‘all’ label corresponds to using the entire universe of trades without decomposing based on trade co-occurrence. Thus the ‘all’ COI is the same as the order imbalance in the number of transactions, scaled by total transactions, studied by Chordia and Subrahmanyam (2004).
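Equation (1), including the zero-denominator convention, can be illustrated with a short Python sketch. The function names and the `(label, direction)` input format below are assumptions introduced for the example, with `direction` equal to +1 for a buyer-initiated trade and -1 for a seller-initiated trade.

```python
TYPES = ["all", "iso", "nis", "nis-s", "nis-c", "nis-b"]


def order_imbalance(n_buy, n_sell):
    """Normalized imbalance in trade counts; 0 when there are no trades."""
    total = n_buy + n_sell
    if total == 0:
        return 0.0
    return (n_buy - n_sell) / total


def daily_cois(labelled_trades):
    """Compute the six COIs for one stock on one day.

    `labelled_trades` is an iterable of (label, direction) pairs
    (hypothetical format), where label is one of 'iso', 'nis-s',
    'nis-c', 'nis-b'.
    """
    counts = {t: [0, 0] for t in TYPES}  # [buys, sells] per type
    for label, direction in labelled_trades:
        groups = ["all", label]
        if label != "iso":
            groups.append("nis")  # nis is the union of the three sub-types
        for g in groups:
            counts[g][0 if direction > 0 else 1] += 1
    return {g: order_imbalance(b, s) for g, (b, s) in counts.items()}


# Toy stock-day with five trades (illustrative only).
toy = [("iso", 1), ("iso", 1), ("nis-s", -1), ("nis-c", 1), ("nis-c", -1)]
print(daily_cois(toy))
```

Each trade contributes to the 'all' count and to its own type, and every non-isolated trade additionally contributes to 'nis', which matches the statement that a trade belongs to one or two of the five categories.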

4. Empirical selection of δ, existence of co-occurrence and exploratory data analysis

In this section, we propose an empirical approach for choosing the parameter δ and we showcase the existence of co-occurring trades in the market. We start with a brief description of the data employed in our study. We then provide empirical evidence that setting δ=1 ms is an appropriate choice. For further details of different values of δ, we refer the reader to Appendix 5. Moreover, we uncover salient patterns of trade co-occurrence through exploratory analysis. Furthermore, we show that the resulting order imbalances of the decomposed trade flows are only weakly correlated with each other, which indicates that the trade decomposition we propose is meaningful.

4.1. Data source and preprocessing

Our study is based on 457 US stocks during the period from 2017-01-03 to 2020-12-31. The selected stocks are those companies included in the Standard & Poor's (S&P) 500 index for which both order book data and price data are available over the entire sample period. Table  provides a brief summary of the stocks.

4.1.1. Limit order book data

We obtain limit order book data from the LOBSTER database (Huang and Polak 2011), which provides detailed records of limit orders for all stocks traded on the NASDAQ exchange. The records include limit order submissions, cancellations and executed trades, indexed by time with precision up to nanoseconds. For each stock on each trading day, a record contains the time stamp, event type (submissions/cancellations/executions), direction (buy/sell), size and price for a limit order event. By filtering for limit order executions and reversing their directions, we infer the buyer- and seller-initiated trades, e.g. execution of a limit buy order implies placement of a market sell order/marketable limit sell order. Noting that a large market order simultaneously consumes multiple existing limit orders, we merge inferred trades with identical timestamps. Given LOBSTER's high time resolution, we assume different trades cannot have exactly the same timestamps.
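The inference step described above, reversing the direction of executed limit orders and merging fills that share a timestamp, could look roughly as follows in Python. This is a sketch under stated assumptions: the input is taken to be already filtered for execution events, the `(timestamp, limit_side, size)` tuple format is hypothetical and not the LOBSTER file layout, and all fills at one timestamp are assumed to come from a single market order.

```python
from itertools import groupby


def infer_trades(executions):
    """Infer aggressor-side trades from executed limit orders.

    `executions` is a list of (timestamp, limit_side, size) tuples sorted
    by timestamp, with limit_side = +1 for an executed buy limit order and
    -1 for an executed sell limit order (hypothetical encoding).
    Returns a list of (timestamp, trade_direction, total_size).
    """
    trades = []
    for ts, group in groupby(executions, key=lambda e: e[0]):
        fills = list(group)
        # An executed buy limit order implies a seller-initiated trade,
        # and vice versa, so the direction is reversed.
        direction = -fills[0][1]
        total_size = sum(size for _, _, size in fills)
        trades.append((ts, direction, total_size))
    return trades


# Two fills at the same timestamp are merged into one trade (toy values).
toy = [(100, 1, 50), (100, 1, 30), (250, -1, 10)]
print(infer_trades(toy))
```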

4.1.2. Prices and returns

We acquire daily price data for our stock universe under consideration from the Center for Research in Security Prices (CRSP) database, and calculate daily open-to-close logarithmic returns as

$$R_{i,t} = \log\left(\frac{P_{i,t}^{\text{Close}}}{P_{i,t}^{\text{Open}}}\right), \tag{2}$$

where $P_{i,t}^{\text{Open}}$ and $P_{i,t}^{\text{Close}}$ are daily open and close prices of stock $i$ on day $t$. To alleviate the effect of the market component, we also consider market excess returns in this study, denoted as $r_{i,t}$, calculated as follows:

$$r_{i,t} = R_{i,t} - R_{\text{SPY},t}, \tag{3}$$

where $R_{\text{SPY},t}$ is the daily return of the SPY ETF, which tracks the S&P 500 index. For simplicity, here we assume all stocks have the same market beta equal to 1.
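For concreteness, equations (2) and (3) amount to the following two helper functions. The function names are assumptions for this sketch, and the prices in the usage lines are toy values rather than data from the study.

```python
import math


def open_to_close_log_return(p_open, p_close):
    """Daily open-to-close logarithmic return, as in equation (2)."""
    return math.log(p_close / p_open)


def market_excess_return(stock_return, spy_return):
    """Excess return over SPY with an assumed market beta of 1, as in equation (3)."""
    return stock_return - spy_return


# Toy prices (illustrative only).
stock_r = open_to_close_log_return(100.0, 102.0)
spy_r = open_to_close_log_return(300.0, 303.0)
print(market_excess_return(stock_r, spy_r))
```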

In addition, we collect factor data from Kenneth R. French's online Data Library. These include daily returns of the market factor (MKT), size factor (SMB), value factor (HML), profitability factor (RMW), investment factor (CMA) (Fama and French 1992, 1993, 2015) and momentum factor (MOM) (Jegadeesh and Titman 1993, Carhart 1997).

4.2. Universe of stocks and the representative of the market

In our empirical study, we classify trades of stocks in a universe comprising 457 constituents of the S&P 500 index. For simplicity, we also use the set of all trades of the same 457 stocks as the representative of the market; in the notation of Section 3.2.1, that is $S = \mathcal{M}$. Therefore, the reference set of each stock $i$, $M_{-i} = M \setminus X_i$, consists of trades of the other 456 stocks. We are aware that the labels of trades can depend on the market set and discuss the selection of market indices in Section 8.2.

4.3. Null model: co-occurrence probabilities under complete randomness

With order book data, we first answer the following fundamental questions. Do trades really co-occur or are their arrivals simply random and independent of each other? Does our trade flows decomposition capture a signal? In this section, we develop a null model under the assumption of completely random order arrival.

We assume that, for stock $i$, the arrivals of trades within a time interval of length $T$ follow independent Poisson processes with the same intensity $\lambda_T$. Let $N_i$ denote the number of trades of stock $i$ in $[0,T]$. Conditional on $N_i = n_i$, the arrival times of the $n_i$ trades are independent and uniformly distributed on $[0,T]$. Hence, for each trade, the probability that another given trade falls in its $\delta$-neighbourhood during the time period $T$ is
$$p = \frac{2\delta}{T}. \tag{4}$$
Next, we derive the probabilities of the different types of trade flows, as follows:
$$\begin{aligned}
P_i^{\delta}(\text{iso}) &= (1-p)^{N_{-i}+N_i-1},\\
P_i^{\delta}(\text{nis}) &= 1-(1-p)^{N_{-i}+N_i-1},\\
P_i^{\delta}(\text{nis-s}) &= \left[1-(1-p)^{N_i-1}\right](1-p)^{N_{-i}},\\
P_i^{\delta}(\text{nis-c}) &= (1-p)^{N_i-1}\left[1-(1-p)^{N_{-i}}\right],\\
P_i^{\delta}(\text{nis-b}) &= \left[1-(1-p)^{N_i-1}\right]\left[1-(1-p)^{N_{-i}}\right],
\end{aligned} \tag{5}$$
where $N_{-i}$ denotes the number of trades of all stocks in the market other than stock $i$. In particular, for each stock $i$ in our sample universe of 457 stocks, $N_{-i}$ is the total number of trades of the remaining 456 stocks.
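The null probabilities in equations (4) and (5) are straightforward to evaluate numerically. The sketch below is illustrative only: the function name and the trade counts in the example call are assumptions, not values from the data.

```python
def null_probabilities(n_self, n_other, delta, T):
    """Evaluate equations (4)-(5) for one stock and one interval.

    n_self  : number of trades of the stock itself (N_i)
    n_other : number of trades of all other stocks (N_{-i})
    delta, T: neighbourhood size and interval length, in the same time unit
    """
    p = 2.0 * delta / T                  # equation (4)
    q_self = (1.0 - p) ** (n_self - 1)   # no other same-stock trade nearby
    q_cross = (1.0 - p) ** n_other       # no other-stock trade nearby
    return {
        "iso": q_self * q_cross,
        "nis": 1.0 - q_self * q_cross,
        "nis-s": (1.0 - q_self) * q_cross,
        "nis-c": q_self * (1.0 - q_cross),
        "nis-b": (1.0 - q_self) * (1.0 - q_cross),
    }

# Hypothetical counts for a 5-minute interval (T = 300 s) and delta = 1 ms.
probs = null_probabilities(n_self=50, n_other=20_000, delta=0.001, T=300.0)
```

By construction, the probabilities of the four mutually exclusive types (iso, nis-s, nis-c, nis-b) sum to one, and the nis probability is the complement of the iso probability.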

4.4. Choice of neighbourhood size δ

The definition of trade co-occurrence and the classification of individual trades depend on the choice of the neighbourhood size δ. In the extreme case of δ=0, all trades are isolated. As we progressively increase δ, an isolated trade turns into one of the sub-types of non-isolated trades. Meanwhile, both non-self-isolated and non-cross-isolated trades can only become non-both-isolated. Eventually, when δ is large enough, all trades are non-isolated; to be specific, they all become non-both-isolated. Hence, as δ increases, the number of isolated trades decreases monotonically, while the numbers of non-isolated and non-both-isolated trades increase monotonically. The numbers of non-self-isolated and non-cross-isolated trades, in contrast, initially increase and, after reaching their respective maxima, begin to decrease. We are aware that the choice of δ may depend on the specific task at hand, and that the optimal value can vary; however, we propose a simple approach to select δ for the empirical study in this paper. The intuition is straightforward: we choose the δ which maximizes the average distance, weighted by the empirical percentage of each type of trade, between the null probabilities and the empirical proportions. For simplicity, the same value of δ is shared across all stocks. We report the resulting average distance in table .
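To make the dependence on δ concrete, the following sketch labels the trades of one stock given the timestamps of its own trades and of the trades of all other stocks. It assumes a closed neighbourhood [t − δ, t + δ] and integer timestamps; both choices, as well as the function names and the toy timestamps, are assumptions of this illustration.

```python
from bisect import bisect_left, bisect_right

def count_in_window(sorted_times, t, delta):
    """Number of timestamps in the closed window [t - delta, t + delta]."""
    return bisect_right(sorted_times, t + delta) - bisect_left(sorted_times, t - delta)

def classify(own_times, other_times, delta):
    """Label each trade of a stock as iso, nis-s, nis-c or nis-b."""
    own, other = sorted(own_times), sorted(other_times)
    labels = []
    for t in own:
        # The trade itself is always inside its own window, hence "> 1".
        self_neighbour = count_in_window(own, t, delta) > 1
        cross_neighbour = count_in_window(other, t, delta) > 0
        if self_neighbour and cross_neighbour:
            labels.append("nis-b")
        elif self_neighbour:
            labels.append("nis-s")
        elif cross_neighbour:
            labels.append("nis-c")
        else:
            labels.append("iso")
    return labels

# Toy timestamps in microseconds; delta = 1000 microseconds (1 ms).
own = [0, 500, 10_000, 20_000, 40_000, 40_300]
other = [10_400, 30_000, 40_100]
labels = classify(own, other, delta=1_000)
```

With δ = 0 and distinct timestamps every trade is labelled iso, which matches the extreme case discussed above.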

Table 1. Difference between null and empirical probability.

The first step is to derive the probabilities for each stock under complete randomness. As the intraday intensities are not constant, we calculate the probabilities for every 5 min ($T = 5$ min), which leads to 78 intervals, and consider their averages (weighted by the intensities) as the final daily probabilities. We then also compute the empirical probabilities. We search over eight values of the neighbourhood size, $\delta \in \{0.05\,\text{ms}, 0.075\,\text{ms}, 0.125\,\text{ms}, 0.25\,\text{ms}, 0.5\,\text{ms}, 1\,\text{ms}, 5\,\text{ms}, 50\,\text{ms}\}$, and plot their intraday null and empirical probabilities in figure . Table  shows the average distance for the candidate values of δ. The maximum distance of 0.14 is achieved at δ=1 ms.
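The selection rule can be sketched as below. The sketch assumes that the distance is the absolute difference between null and empirical probabilities, taken over the four mutually exclusive types and weighted by the empirical proportions; the exact distance used in the study may differ. All probabilities in the example are toy values chosen for illustration, not results.

```python
TYPES = ("iso", "nis-s", "nis-c", "nis-b")

def weighted_distance(null_p, emp_p):
    """Average |empirical - null|, weighted by the empirical proportions."""
    return sum(emp_p[k] * abs(emp_p[k] - null_p[k]) for k in TYPES)

def select_delta(candidates):
    """Return the delta whose (null, empirical) pair maximizes the distance.

    `candidates` maps each delta to a (null_probabilities, empirical_proportions) pair.
    """
    return max(candidates, key=lambda d: weighted_distance(*candidates[d]))

# Toy probabilities for three hypothetical neighbourhood sizes (in ms).
toy = {
    0.5: ({"iso": 0.90, "nis-s": 0.02, "nis-c": 0.07, "nis-b": 0.01},
          {"iso": 0.50, "nis-s": 0.20, "nis-c": 0.20, "nis-b": 0.10}),
    1.0: ({"iso": 0.80, "nis-s": 0.04, "nis-c": 0.14, "nis-b": 0.02},
          {"iso": 0.30, "nis-s": 0.30, "nis-c": 0.20, "nis-b": 0.20}),
    5.0: ({"iso": 0.40, "nis-s": 0.10, "nis-c": 0.40, "nis-b": 0.10},
          {"iso": 0.10, "nis-s": 0.30, "nis-c": 0.20, "nis-b": 0.40}),
}
best_delta = select_delta(toy)
```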

Figure 3. Co-occurrence probability: null versus empirical. The figure plots the null and empirical probabilities of each type of trade, over 5 minute intervals, averaged over stocks and days, for selected values of δ.

4.5. Existence of co-occurrence

By comparing the theoretical co-occurrence probabilities (Donges et al. Citation2016) under the null model and the empirical values derived from data, we confirm the existence of co-occurrence among stock trades at the level of 1 ms, supporting the idea that the overall trading volume has a strong cross-asset interaction component. From an economic perspective, this is perhaps to be expected, given the large presence in current markets of index-arbitrage traders who simultaneously trade an index ETF against a basket of constituents.

Table  shows the null and empirical daily probabilities averaged over time and stocks. Given the very small neighbourhood size (δ=1 ms), 81.28% of trades should be isolated if there is no co-occurrence. However, there are only 28.55% isolated trades in the market. In conclusion, there is empirical evidence that the notion of trade co-occurrence captures a latent signal. This serves as motivation to further decompose trade flows and study them individually.

Table 2. Null and empirical probability of each type of trade flows.

4.6. Summary statistics of trades

After building our data set of trades, we label every trade with its corresponding type. Figure  illustrates the intraday distributions of different types of trades. A summary of the data is presented in table ; the chosen neighbourhood size for co-occurrence is 1 ms (δ=1 ms). The table shows descriptive statistics of the raw data, where each number is calculated by averaging daily time series and then considering the cross-sectional mean, median or standard deviation over all stocks. On average, isolated trades account for 28.55% of the total number of transactions, while the majority of trades are non-isolated in one of the three defined types (nis-s, nis-c, or nis-b). Roughly two fifths of the non-isolated trades, 29.75% of all trades, are non-self-isolated. The mean proportions of non-cross-isolated and non-both-isolated trades are 17.27% and 24.43%, respectively. The large standard deviation of the number of trades could be seen as an indication that the population is heterogeneous. The percentages of the different groups of trades in terms of volume are very similar to those reported in table . With this in mind, it is reasonable to concentrate on the count of trades as a liquidity measure.

Figure 4. Intraday distributions of the number of each type of trade. We calculate the number of each type of trade as a percentage of the total number of trades, for non-overlapping 5-min intervals during normal trading hours from 9:30 to 16:00, for all stocks over the period from 2017-01-03 to 2020-12-31. This figure plots the intraday 5-min counts of the different types of trades, averaged over both the time series and the cross-section.

Table 3. Summary statistics for all groups of trades.

Highlighting the empirical fact that trading activity is higher at the start and end of a trading day, figure plots the intraday distributions of trades, revealing slightly different temporal behaviours of the different trade types. The plot exhibits the number of each type of trade over every half hour, with the y-axis indicating percentages of the total number of trades. We observe that all types of trades increase drastically in the last half hour. It is noteworthy that, after the decomposition, the flow of isolated trades is smoother than the flow of non-isolated trades, with a lower slope for the last-half-hour climb. By further separating the sub-types of non-isolated trades, we find that non-self-isolated trades contribute more at the start of the day, while the lines of the other two types are flat except at the end of the day.

4.7. Descriptive statistics of order imbalances

With trades labelled according to their co-occurrence types, we compute daily order imbalances and report descriptive statistics in table . Panel A documents summary statistics of each category of order imbalance, averaged over time and stocks. Overall, the average unconditional order imbalances are negative. After the decomposition, the isolated and non-self-isolated order imbalances tend to be negative, with both higher means and variances compared to their unconditional counterparts. In contrast, the means of non-cross-isolated and non-both-isolated imbalances are positive, but with even higher variance. Hence, our study essentially constructs features with different behaviours by conditioning on the co-occurrence of trades. However, the standard deviations are much larger than the means, so statistically, the means are not significantly different from zero. Hence the means can only be taken as a very weak indication of a potential signal.

Table 4. Summary statistics for all groups of trades and order imbalances.

Panel B presents average partial autocorrelations of each type of order imbalance. It can be seen that all the order imbalances are positively autocorrelated. The lag 1 autocorrelations for COIs are substantial. Among the conditional imbalances, the non-cross-isolated order imbalance, corresponding to trades that closely co-occur with trades of other stocks in the market, has relatively higher autocorrelation. In contrast, the autocorrelation for the order imbalance from non-self-isolated trades is comparatively lower. These partial autocorrelations decay drastically with increasing lags.

Figure  shows the Pearson correlations, averaged over all stocks, of COIs, with δ=1 ms. All types of order imbalances are positively correlated with each other while the strengths are different and can be fairly low. An exception is the unconditional order imbalance, which is strongly associated with every other type. The correlations between isolated imbalance and non-isolated imbalance, as well as its sub-types, are low.

Figure 5. Pearson correlation of order imbalances. For each type of order imbalance, we first consider the vector of daily values during 2017-01-03 to 2020-12-31, then compute the correlation matrix and finally average across all stocks.

As expected, conditioning on isolation and non-isolation produces distinct features. Furthermore, the three order imbalances obtained by decomposing non-isolated trades are also strongly correlated with the aggregated non-isolated order imbalance, but weakly correlated with each other. Upon exploring their relations in more detail, we find that the non-self-isolated order imbalances, derived from orders which are not co-traded with other stocks in the market, are relatively more correlated with isolated order imbalances. In contrast, the order imbalances of non-cross-isolated and non-both-isolated trades, which are more connected with the market, are less correlated with the isolated and non-self-isolated order imbalances. Therefore, we are confident that the decomposed order imbalances are distinguishable features, with all pairwise correlations smaller than 0.6, and that they can reveal insights about structural properties of the equity market which cannot otherwise be inferred by looking at the aggregated order flow.

5. Contemporaneous price impact of conditional order imbalances

To assess the effect of each type of order imbalance on contemporaneous returns, we employ the following panel regression:
$$r_{i,t} = \alpha + \sum_{\rho \in \mathrm{Types}} \beta_{\rho}\,\mathrm{COI}^{\rho}_{i,t} + \beta_{\sigma}\,\sigma_{i,t} + \beta_{v}\,\mathrm{vol}_{i,t} + \beta_{b}\,\mathrm{MKT}_t + \beta_{s}\,\mathrm{SMB}_t + \beta_{h}\,\mathrm{HML}_t + \beta_{r}\,\mathrm{RMW}_t + \beta_{c}\,\mathrm{CMA}_t + \beta_{m}\,\mathrm{MON}_t + \epsilon_{i,t}, \tag{6}$$
where $r_{i,t}$ is the return of stock $i$ at time $t$; $\beta_{\rho}$ is the coefficient of the COI of type $\rho$; and Types is the set of COI types included in the regression. In addition to COIs, we control for explanatory variables, including the daily realized volatility $\sigma_{i,t}$ and the dollar volume $\mathrm{vol}_{i,t}$, together with six factors. The residual terms $\epsilon_{i,t}$ are assumed to be independent and normally distributed with mean zero. For inference, we apply two-tailed t-tests on the regression coefficients $\beta_{\rho}$ of the COIs.
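A stripped-down version of regression (6) can be sketched as a pooled least-squares fit on stacked stock-day observations. The sketch below keeps only an intercept, two COI regressors and the volatility control, omits the factors and the t-tests, and uses noise-free toy data; in practice one would use a dedicated regression library. The solver, variable names and coefficient values are all illustrative assumptions.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    c = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    M = [A[a] + [c[a]] for a in range(k)]  # augmented matrix [X'X | X'y]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[a][k] / M[a][a] for a in range(k)]

# Toy stacked observations: (COI_iso, COI_nis, realized volatility).
obs = [(0.1, -0.2, 0.010), (-0.3, 0.4, 0.020), (0.5, 0.1, 0.015),
       (-0.1, -0.5, 0.030), (0.2, 0.3, 0.025), (0.0, 0.2, 0.012)]
true_beta = [0.001, 0.02, 0.01, -0.5]      # alpha, beta_iso, beta_nis, beta_sigma
X = [[1.0, iso, nis, sig] for iso, nis, sig in obs]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]  # noise-free returns
beta = ols(X, y)                            # recovers true_beta up to rounding
```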

In table , we report the results of the contemporaneous regressions against each type of COI, one at a time. Consistent with previous research, the unconditional order imbalances are positively and significantly related to returns, for almost all stocks. Furthermore, our conditional order imbalances (COIs) also exhibit a significantly positive influence on same-day returns, especially the isolated COI. It is noteworthy that the three types of order imbalances derived from decomposing non-isolated trades (nis-s, nis-c and nis-b) have a comparatively weaker influence, in terms of both coefficient values and t-scores.

Table 5. Contemporaneous regression against individual COIs.

Focusing on the percentage of variance explained, and comparing with the model that uses only the control variables in the regression, which has an adjusted R2 of 2.63%, all types of order imbalances exhibit additional explanatory power for price impact. Furthermore, we find that the regression with ‘iso’ COI generates the highest adjusted R2 of 5.91%. After decomposing trade flows, the ‘iso’ COI, although calculated with only 28.59% of trades, explains an amount of variance comparable to that of the unconditional order imbalance. Regressing returns against ‘nis’ COIs achieves a lower R2 than regressing against ‘all’ or ‘iso’ COIs. Hence, the price impact is not proportional to the quantity of trades but appears to be driven by their types. This indicates that the price pressures generated by trades with distinct co-occurrence relations with other trades in the market are inhomogeneous and warrant separate study.

In addition to the significant effect of individual conditional order imbalances on returns, we are also interested in the extra information gained from decomposing aggregated order imbalances. To this end, we fit regressions with multiple types of COIs and report the results in table . Each regression takes as input a group of COIs as indicated in the first column. We draw inference on the coefficients. Taking the influence of feature numbers into account, we also use the adjusted R2 as an evaluation metric.

Table 6. Contemporaneous regression against multiple COIs.

From table , we observe evident improvements in the adjusted R2 when taking multiple trade types into account. Using the unconditional order imbalance as the benchmark, splitting market orders into isolated and non-isolated explains 0.53% more of the total variance, which is a 9.36% increase from the benchmark adjusted R2 of 5.66%. To examine the contribution of the further decomposition of non-isolated trades, we add the sub-types (nis-c, nis-b and nis-s) to iso COI one at a time, following the descending order of R2 in table , and nis COI at the end. As the adjusted R2 increases, we conclude that the COIs of all decomposed trade flows have a distinct impact on stock returns. Finally, according to the regression in the last row, in the presence of the decomposed COIs, the undecomposed order imbalance is not significant for explaining price impact. In conclusion, we successfully separate trades with different contemporaneous price impact from the entire trade flow, and the decomposition helps explain contemporaneous daily price changes.

In conjunction with panel regression, we also perform time series regressions for individual stocks and present the results in Appendix 3. Table  suggests that the significant and positive relationships between contemporaneous returns and each type of COIs are consistent over the majority of stocks; the distributions of coefficients are illustrated in figure . Additionally, figure  sketches the density of the adjusted R2 across all stocks; the distributions for all types of COIs are positively skewed and have mean above 10%.

To investigate the temporal consistency, we replicate the aforementioned panel regression analysis on a yearly basis from 2017 to 2020, and report the results in tables  and in Appendix 2. The conclusions above continue to hold. During 2020, when the COVID-19 pandemic changed the market environment, iso COI still has the most significant price impact, while the contemporaneous returns are less sensitive to nis-c and nis-b COIs. We note that there is a decreasing trend in the adjusted R2 over the years, as well as an improvement over the baseline of using only the control variables.

6. Predictive power of imbalances on future returns

In conjunction with the contemporaneous effects of order imbalances, it is also important to study their forecasting power. In this section, we show that iso and nis-s order imbalances are positively related to future returns, while nis, nis-c and nis-b COIs are negatively correlated with future returns. Moreover, we find that decomposing trade flows and simultaneously using multiple COIs provides signals for forecasting next-day returns. We provide evidence using both regression and portfolio sorting approaches.

6.1. Predictive regression

To examine the contribution of the trade flow decomposition to return forecasting, we perform the same regression analysis procedures as in the previous section. More precisely, to explore the connection between COIs and 1-day ahead market-excess returns, we perform a panel regression of future returns, $r_{i,t+1}$, against current COIs, while controlling for current returns $r_{i,t}$ as well as the explanatory variables in equation (6), under the model
$$r_{i,t+1} = \alpha + \sum_{\rho \in \mathrm{Types}} \beta_{\rho}\,\mathrm{COI}^{\rho}_{i,t} + \beta_{\tau}\,r_{i,t} + \beta_{\sigma}\,\sigma_{i,t} + \beta_{v}\,\mathrm{vol}_{i,t} + \beta_{b}\,\mathrm{MKT}_t + \beta_{s}\,\mathrm{SMB}_t + \beta_{h}\,\mathrm{HML}_t + \beta_{r}\,\mathrm{RMW}_t + \beta_{c}\,\mathrm{CMA}_t + \beta_{m}\,\mathrm{MON}_t + \epsilon_{i,t+1}, \tag{7}$$
where $\beta_{\rho}$ is the coefficient of the COI of type $\rho$; Types is the set of COI types included in the regression; and $\epsilon_{i,t+1}$ are the residual terms, which are assumed to be independent and identically normally distributed with mean zero.

Table  documents the regression results. As expected, unlike for the contemporaneous impact, both the magnitudes and the percentages of significant coefficients are low, with the coefficient for the unconditional order imbalance being approximately equal to zero. Over our study period, we do not find evidence to support the theoretical model put forth by Chordia and Subrahmanyam (Citation2004), which would yield a significant positive relationship between imbalances and one-day ahead returns, in the absence of future order imbalance. However, with our decomposition of trades into categories, we can strengthen the above signals. Our findings suggest that the price pressures which arise from isolated and non-self-isolated order executions show moderate predictive power. Additionally, non-isolated (nis), non-cross-isolated and non-both-isolated trade imbalances are negatively associated with future price changes. In particular, iso and nis-b COIs exhibit significant predictive power for future returns. In terms of adjusted R2, all COIs of the decomposed trade flows outperform the COI of the undecomposed (i.e. aggregated) trade flow. Additionally, the adjusted R2 of regressing on only the control variables is higher than that obtained when incorporating the unconditional order imbalance, but lower than that obtained when including any type of COI. This finding underscores the importance of decomposing trade flows when forecasting returns.

Table 7. Predictive regression against individual COIs.

In the next step, we regress future 1-day stock returns against different groups of COIs, as indicated in the first column of table . It is noteworthy that iso COI shows significant predictive power in every regression setting. Although the other types of decomposed COIs do not show significance when the goal is to predict noisy daily returns, the signs of their coefficients are consistent. In addition, as the adjusted R2 grows, we find that, with the exception of nis-s COI, all types of decompositions contribute to the return prediction task. Therefore, we conclude that order imbalances conditioned on co-occurrence are valuable predictors for short-term return forecasting, and that decomposing trade flows into such COIs improves the prediction of future returns.

Table 8. Predictive regression against multiple COIs.

In addition to panel regression, we conduct time series predictive regressions for individual stocks and explain details in Appendix 3. Table  shows that, for most of the stocks, the signs of coefficients of different types of COIs are in accord with our findings. We depict the distributions of coefficients in figure . Moreover, figure  illustrates the right-skewed distributions of adjusted R2 across all stocks, corresponding to COI types.

To reinforce our findings, we perform the panel regression analysis on a yearly basis from 2017 to 2020, and report the results in tables  and in Appendix 2. The signs of the relationships between future returns and the different types of COIs are consistent for almost all of the subperiods. Furthermore, the significance of iso COI is persistent across all time periods, with the exception of 2020, which was an unusual market environment due to COVID-19. In contrast to the contemporaneous price impact, the adjusted R2 increases from 2017 to 2018. In addition, the adjusted R2 peaks in 2020, indicating that during this tumultuous period the market is less efficient and it takes longer for stocks to absorb the price pressure. We conclude that the inferred COIs exhibit forecasting power.

6.2. Imbalance-based portfolio sorting

To bolster our findings on the positive and negative relations between future returns and different types of COI, we apply the portfolio sorting methods (Fama and French Citation1993, Cattaneo et al. Citation2020) to translate order imbalances into portfolios. For each type of COI, we sort stocks according to their imbalance values, from low to high, into five quintile portfolios. Taking multiple features into account, we further create 5 × 5 double-sort portfolios, for every pair of COIs. The imbalance-sorted portfolios are equally weighted and have only long positions on stocks, with daily portfolio returns calculated as the average returns of all stocks in them. Backtests of imbalance-sorted portfolios, over the entire sample period from 2017-01-03 to 2020-12-31, reinforce the finding that iso and nis-s imbalances are momentum signals, while the nis, nis-c and nis-b imbalances are reversal signals, and that they have different influence on future returns.
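The single-sort construction described above can be sketched as follows: stocks are ranked by their COI on a given day, split into five equal-count groups, and each group's return is the equal-weighted average of the following day's returns. The tickers, COI values and returns are toy inputs, and the function names are illustrative.

```python
def quintile_portfolios(coi, n_groups=5):
    """Split tickers into equal-count groups, ordered from low to high COI."""
    ranked = sorted(coi, key=coi.get)
    n = len(ranked)
    return [ranked[g * n // n_groups:(g + 1) * n // n_groups]
            for g in range(n_groups)]

def portfolio_return(members, returns):
    """Equal-weighted, long-only return of a group of stocks."""
    return sum(returns[s] for s in members) / len(members)

# Toy inputs: ten hypothetical tickers with increasing COI and returns.
coi_today = {f"S{i}": -0.45 + 0.1 * i for i in range(10)}
returns_next_day = {f"S{i}": 0.001 * i for i in range(10)}

quintiles = quintile_portfolios(coi_today)
quintile_returns = [portfolio_return(q, returns_next_day) for q in quintiles]
```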

6.2.1. Single-sort portfolios

Panel A of table  documents the annualized returns of single-sort portfolios. We note, in the first row, that the returns of the unconditional-imbalance-sorted portfolios are negative and fluctuate along quintiles, which confirms the absence of a clear linear relation between unconditional order imbalance and future return. However, after performing the decomposition, we find that the growth in returns with increasing iso order imbalance is almost monotonic, despite a slight drop in the second quintile, which reinforces its positive correlation with future returns. There is also a slightly increasing trend for nis-s, which is a sign of weak positive correlation. In contrast, we observe declines in average returns along other types of COIs, which echoes our time series regression results and confirms negative correlations, altogether providing evidence for the proposed decomposition.

Table 9. Summary of single-sort portfolios.

Panel B shows daily COIs averaged over stocks in each portfolio. The COIs are signed, denoting that ‘Low’ and ‘High’ portfolios correspond to strong signals with opposite signs. We observe that the distributions of all signal strengths are roughly symmetric and centered around 0. In each row, there are no quintile portfolios consisting of stocks with indistinguishable average COI values. However, the portfolio returns are neither symmetric nor monotonic along quintiles (except iso). By comparing returns in each row of Panel A, we observe that the magnitudes of the most positive returns are always smaller than the absolute values of the most negative returns. Therefore, we conjecture that the positive and negative impacts of COIs on future returns are asymmetric, with negative impacts on future returns being more influential.

Furthermore, for the negative impacts, the largest COI magnitudes do not lead to the largest next-day decreases. For example, the ‘Low’ and second quintile portfolios of iso COI have similar returns and, for portfolios of nis-c COI, the fourth quintile reaches the lowest average return of −5.69%, while the return of the highest quintile rises to −3.64%. One possible interpretation of this phenomenon is that extreme imbalances can lead to strong reversals on the following day, because some investors aim to maintain stable levels of risk exposure.

6.2.2. Double-sort portfolios

To further investigate the interplay between COIs, we build portfolios by independently double-sorting on every pair of imbalances of decomposed trade flows. Table  presents the annualized returns of all portfolios, where each block contains 25 portfolios obtained by sorting on the pair of signals indicated by the row and column names.
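An independent double sort assigns each stock a quintile under each of the two signals separately and places it in the corresponding cell of a 5 × 5 grid. The following sketch illustrates this with toy signals; the function names and inputs are assumptions made for the example.

```python
def quintile_index(values):
    """Map each ticker to its quintile by rank (0 = lowest, 4 = highest)."""
    ranked = sorted(values, key=values.get)
    n = len(ranked)
    return {s: rank * 5 // n for rank, s in enumerate(ranked)}

def double_sort(coi_a, coi_b):
    """Independently sort on two signals; return the 5x5 cells as ticker lists."""
    qa, qb = quintile_index(coi_a), quintile_index(coi_b)
    cells = {}
    for s in coi_a:
        cells.setdefault((qa[s], qb[s]), []).append(s)
    return cells

# Toy signals for 25 hypothetical tickers.
coi_a = {f"S{i}": float(i) for i in range(25)}
coi_b = {f"S{i}": float((7 * i) % 25) for i in range(25)}
cells = double_sort(coi_a, coi_b)
```

Because the two sorts are independent, some cells can be empty or unevenly populated on a given day, which is worth keeping in mind when averaging returns within cells.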

Table 10. Annualized returns of double-sort portfolios.

In each column of the iso–nis-c block, the average returns rise from low to high COIs of isolated trades. In contrast, controlling for iso COI, the returns typically fall from low to high non-cross-isolated COI. Double-sorting on the strongest signals generates the highest and lowest returns, in the upper-right and bottom-left corners of the block. The magnitudes of the strongest returns, 18.20% and 16.62%, are also amplified compared with sorting on a single signal. The same patterns and improvements appear when double-sorting on iso COI paired with each of the reversal COIs. However, the patterns for other pairs are not obvious. For example, when considering the block of iso–nis-s sorts, we do not observe any monotonic patterns along rows and columns. We conclude that iso COI and the reversal signals carry distinct information, and that incorporating them simultaneously boosts predictive performance.

7. Economic value of conditional order imbalances

As discussed in previous sections, there is evidence that conditional order imbalances contain signals for explaining and forecasting individual stock returns. In this section, we exploit their economic values by forming long-short portfolios using sorts. Our imbalance-based trading strategies generate conspicuous profits and significant abnormal returns. High trading profits also provide important evidence of the predictive power which the COIs of the decomposed trade flows possess.

7.1. Long-short portfolio construction and evaluation

We design practical trading strategies based upon imbalance-sorted quintile portfolios. At 9:30 am on each trading day, we buy the first (resp., last) and short sell the last (resp., first) quintile portfolios for momentum (resp., reversal) signals with the same amount such that they are self-rebalancing. Every day, we close all positions at 16:00 to avoid overnight effects. Overall, the daily returns are the differences between the returns of the long and short imbalance-sorted portfolios.

To evaluate profitability, we compare the annualized returns of the portfolios, as well as the annualized Sharpe ratio (Sharpe Citation1994), defined as
$$\mathrm{SR}_p := \frac{\mathrm{mean}(R_{p,t}) - R_f}{\mathrm{std}(R_{p,t})} \times \sqrt{252}, \tag{8}$$
where $R_{p,t}$ are the daily returns of the portfolios and $R_f$ is the average daily risk-free rate, which equals 0.00625% during the period of interest.
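The long-short returns and the Sharpe ratio in equation (8) can be computed along the following lines. This sketch assumes the sample standard deviation and a 252-day year; the daily returns are toy numbers and the function names are illustrative.

```python
import math
import statistics

TRADING_DAYS = 252

def long_short_returns(long_leg, short_leg):
    """Daily long-short returns: long-leg return minus short-leg return."""
    return [l - s for l, s in zip(long_leg, short_leg)]

def annualized_sharpe(daily_returns, daily_rf=0.0000625):
    """Equation (8); daily_rf defaults to 0.00625% expressed as a fraction."""
    excess = statistics.mean(daily_returns) - daily_rf
    # Sample standard deviation is an assumption of this sketch.
    return excess / statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)

# Toy daily open-to-close returns of the two legs.
long_leg = [0.004, 0.001, -0.002, 0.003]
short_leg = [0.003, -0.002, -0.001, 0.001]
ls_returns = long_short_returns(long_leg, short_leg)
sharpe = annualized_sharpe(ls_returns)
```

Which quintile forms the long leg depends on whether the signal is treated as momentum or reversal, so the function takes the two legs explicitly rather than fixing a direction.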

7.2. Profitability analysis

We construct long-short portfolios and report their profitability measures in table . Panel A displays the annualized returns, with on- and off-diagonal values for single- and double-sort portfolios based on COIs indicated by row and column names. We find that incorporating multiple COIs improves the profit of portfolios, which is supporting evidence that the trade flow decomposition technique creates profitable COI signals. For example, the return of the long-short strategy corresponding to iso−nis double-sort is 23.33%, which is 16.57% and 18.93% higher than simply sorting on iso and nis COI separately. The highest annualized return hits 34.87% by double-sorting on iso and nis-c COIs. The Sharpe ratios in Panel B strengthen our findings on the economic value of COI signals. Adjusted for volatility, our trading strategies remain profitable, and double-sorting outperforms trading on signals individually. The portfolio sorted on iso and nis-c achieves the highest Sharpe ratio of 1.79, followed by 1.74 of the iso−nis-b sorted portfolio. Therefore, we find evidence that for investors, it is economically beneficial to incorporate multiple types of COIs when making trading decisions based on trade flow data.

Table 11. Profitability of long-short portfolios.

From the perspective of asset pricing, COIs are unique and significant sources of abnormal returns. To adjust for risk, we regress the excess returns of the long-short portfolios against the factors and show their alphas in table . The factors are MKT, SMB, HML, RMW, CMA and MON. Additionally, we construct a daily rebalanced zero-investment portfolio as an extra momentum factor (UMD), by sorting the returns of previous days in our universe of stocks and then going long the top half while shorting the bottom half. All the portfolios based on iso COI, except iso-nis-s, generate statistically significant abnormal returns, providing evidence that the profits cannot be explained by common risk factors. In addition, the single-sort portfolio based on nis-b COI achieves a significant abnormal return as well.

Table 12. Abnormal returns of long-short portfolios.

We also compare our strategies with five benchmark portfolios. First, we construct a long-short portfolio based on the unconditional order imbalance, denoted as ‘all’, to assess the economic value of the trade flow decomposition. Second, we build a return momentum benchmark portfolio, constructed in the same way as the COI-based long-short portfolios but with the previous day's market excess returns as signals. Because COIs and contemporaneous returns are significantly correlated, it is necessary to show that the profitability is not fully revealed by prices. Third, we build an equally weighted portfolio with the market excess returns of the 457 stocks in our sample universe. Finally, we choose SPY, considering both open-to-close and close-to-close returns, as tradable market portfolios to benchmark against overall market performance. From the COI-based strategies, we select the single-sort portfolio of iso COI, and the double-sort portfolios of iso/nis and iso/nis-c COIs, as representatives. Figure  visualizes the cumulative returns of the selected COI-based long-short portfolios and the benchmarks. Over the test period, we observe that using COIs of the decomposed trade flows attains conspicuous profits. In comparison, the long-short portfolio based on undecomposed order imbalance and SPY have lower annualized returns, 0.31% and 10.04%, and Sharpe ratios, −0.25 and 0.42, respectively. The iso single-sort portfolio has an annualized return similar to that of the SPY ETF, but with much lower volatility, attaining a Sharpe ratio of 1.29. In contrast, the other three benchmark portfolios, namely return momentum, equally weighted and SPY ETF (open-to-close), lose money over the backtest period. Clearly, the double-sort portfolios surpass all other portfolios, with superior returns and Sharpe ratios.

Figure 6. Cumulative returns of portfolios. This figure plots cumulative returns of eight portfolios from 2017-01-03 to 2020-12-31. The portfolios include (1) ‘iso’: the long-short portfolio single-sorted on iso COI; (2) ‘iso/nis’: the long-short portfolio double-sorted on iso and nis COIs; (3) ‘iso/nis-c’: the long-short portfolio double-sorted on iso and nis-c COIs; (4) ‘all’: the long-short portfolio single-sorted on COI of undecomposed trade flows; (5) ‘return momentum’: the long-short portfolio single-sorted on previous day's returns; (6) ‘equal weight’: equally-weighted portfolio of the selected 457 stocks; (7) ‘SPY ETF (open to close)’: cumulative open to close returns of the SPDR S&P 500 ETF Trust which tracks the S&P 500 Index; (8) ‘SPY ETF’: cumulative close to close returns of the SPDR S&P 500 ETF.

Furthermore, in table 13, we compare the selected portfolios' abnormal returns and their relationship with other risk factors (Hirshleifer and Jiang 2010, Chang et al. 2013). In contrast to the COI-based portfolios, none of the benchmarks exhibits significant and positive abnormal returns after adjusting for the factors. In terms of factor exposures, the iso and iso/nis-c portfolios have significant exposure to the SMB and UMD factors, while the iso/nis-c portfolio has significant loadings on the MKT, HML and CMA factors. However, a large proportion of the returns of the COI-based portfolios cannot be explained by these factors. Although the portfolios are regressed against the same set of factors, the variation explained (R2) of the COI-based portfolios ranges from 3.16% to 6.24%, attaining much lower values than the baseline portfolios. To be specific, the portfolio sorted on undecomposed order imbalance is significantly exposed to the Fama–French five factors and has an R2 of 13.35%. Moreover, the other baseline portfolios can all be significantly explained by risk factors, with R2 ranging from 55.64% to 99.52%.
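To make the notion of an abnormal return concrete, the following sketch regresses portfolio returns on a single factor by ordinary least squares and reports the intercept (alpha), the loading (beta) and R2. This is a deliberately simplified, one-factor stand-in: the analysis above uses several factors jointly, and the function name and toy numbers here are hypothetical.

```python
def one_factor_regression(portfolio_returns, factor_returns):
    """OLS of portfolio returns on a constant and one factor.

    Returns (alpha, beta, r_squared). Alpha is the intercept, i.e. the part
    of the average return that the factor exposure does not account for.
    """
    n = len(portfolio_returns)
    mean_y = sum(portfolio_returns) / n
    mean_x = sum(factor_returns) / n
    sxx = sum((x - mean_x) ** 2 for x in factor_returns)
    syy = sum((y - mean_y) ** 2 for y in portfolio_returns)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(factor_returns, portfolio_returns))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return alpha, beta, r_squared


# Toy data: returns built as 0.001 + 0.5 * factor, so the fit is exact
factor = [0.01, -0.02, 0.015, 0.0, -0.005]
portfolio = [0.001 + 0.5 * x for x in factor]
alpha, beta, r2 = one_factor_regression(portfolio, factor)
print(alpha, beta, r2)
```

In this reading, a low R2 together with a significantly positive intercept, as reported for the COI-based portfolios, means that most of the return variation is left in the residual rather than attributed to the factors.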

Table 13. Abnormal returns of long-short portfolios.

8. Robustness analysis

In this section, we briefly comment on the robustness of the identification of trade co-occurrences and the construction of conditional order imbalances. Further details are provided in the Appendix.

8.1. Neighbourhood size effect

We replicate our analysis for eight values of δ in Appendix 5. The patterns in contemporaneous impact and predictive power are robust for small neighbourhood sizes. Nevertheless, when δ reaches 50 ms, the performance of trade co-occurrence as a filter drops. In addition, different types of COIs achieve their best results at different δ values, hinting at the potential benefit of combining signals derived from multiple values of δ.

8.2. Representative of the market effect

The classification of trades may depend on the choice of the representatives of the market index M. As a thought experiment, assume that there is a trade of Apple Inc. (AAPL) which has only trades of Alphabet Inc. (GOOGL) in its δ-neighbourhood. Then if we replace S&P 500 stocks with constituents of the Dow Jones Industrial Average (DOW 30) index, which include AAPL but not GOOGL, as representatives of the market index, the category of this AAPL trade will change from nis-c to iso. Therefore, to assess the effect of market index size we test the robustness of the trade flow decomposition by using S&P 100 and DOW 30 respectively, and carry out a comparative study. We report the details in Appendix 6.
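The thought experiment above can be sketched in a few lines. The function below labels a single trade from the tickers of the trades found in its δ-neighbourhood, using the category names that appear in this paper (iso, nis-s, nis-c, nis-b). It is a minimal sketch under stated assumptions: trades of the same stock are always counted, trades of other stocks are counted only if the stock belongs to M, and the toy index sets are hypothetical stand-ins rather than real constituent lists.

```python
def classify_trade(stock, neighbour_stocks, market_index):
    """Label one trade given the tickers of trades in its delta-neighbourhood.

    Assumption: same-stock neighbours always count; cross-stock neighbours
    count only when their ticker is in `market_index` (the set M).
    """
    has_self = any(s == stock for s in neighbour_stocks)
    has_cross = any(s != stock and s in market_index for s in neighbour_stocks)
    if has_self and has_cross:
        return "nis-b"  # co-occurs with same-stock and other-stock trades
    if has_self:
        return "nis-s"  # co-occurs only with trades of the same stock
    if has_cross:
        return "nis-c"  # co-occurs only with trades of other stocks
    return "iso"        # no counted trade in the neighbourhood


# Toy index sets (hypothetical, not the real constituents)
broad_index = {"AAPL", "GOOGL", "MSFT"}
narrow_index = {"AAPL", "MSFT"}  # contains AAPL but not GOOGL

neighbours = ["GOOGL", "GOOGL"]  # only GOOGL trades near the AAPL trade
print(classify_trade("AAPL", neighbours, broad_index))
print(classify_trade("AAPL", neighbours, narrow_index))
```

With the broad set the AAPL trade is labelled nis-c, and with the narrow set it becomes iso, which mirrors the change of category described in the text.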

When the set of stocks adopted as the market index, M, varies, the fractions of each type of trade change slightly. For each type of decomposed trade flow, the COIs calculated based on different indices are highly correlated. We construct long-short portfolios corresponding to different types of COIs and universes of stocks. The portfolio double-sorted on iso and nis-c COIs based on S&P 500 achieves the highest Sharpe ratios. The general results for our universe also hold when using S&P 100 constituents for classification. However, when using Dow 30, a much smaller universe, the forecasting power deteriorates and some portfolios become unprofitable.

8.3. Time-of-day effect

Trading activities during different intraday periods have different impacts on prices. As we note, trading activities are more intensive during the first and last half hours of each trading day. Some recent works, such as Cont et al. (2021), exclude these volatile periods when they calculate imbalances for robustness, while others (Chu and Qiu 2021) pay special attention to imbalances during these half-hour intervals. Taking this time-of-day effect into account, we study COIs within three time intervals, namely 9:30–10:00, 10:00–15:30 and 15:30–16:00 separately, and document our findings in Appendix 7.
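One way to picture the split into three intraday intervals is to assign every trade to a bucket by its timestamp and compute a count-based imbalance per bucket. The sketch below does this for a handful of toy trades. The boundary handling (each interval includes its start time, and 16:00 is included in the last one), the +1/−1 sign encoding and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import time


def intraday_bucket(t):
    """Map a trade time to one of the three intervals studied in the text."""
    if time(9, 30) <= t < time(10, 0):
        return "09:30-10:00"
    if time(10, 0) <= t < time(15, 30):
        return "10:00-15:30"
    if time(15, 30) <= t <= time(16, 0):
        return "15:30-16:00"
    return None  # outside regular trading hours


def imbalance_by_bucket(trades):
    """trades: iterable of (time, sign); sign is +1 for buyer-initiated
    and -1 for seller-initiated trades.

    Returns {bucket: (buys - sells) / (buys + sells)} based on trade counts.
    """
    buys, sells = defaultdict(int), defaultdict(int)
    for t, sign in trades:
        bucket = intraday_bucket(t)
        if bucket is None:
            continue
        if sign > 0:
            buys[bucket] += 1
        else:
            sells[bucket] += 1
    buckets = set(buys) | set(sells)
    return {b: (buys[b] - sells[b]) / (buys[b] + sells[b]) for b in buckets}


toy_trades = [
    (time(9, 35), +1), (time(9, 40), +1), (time(9, 50), -1),
    (time(12, 0), -1),
    (time(15, 45), +1),
]
print(imbalance_by_bucket(toy_trades))
```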

Our findings on contemporaneous return–imbalance relations hold for every period. Additionally, we find that the predictive power of the decomposed trade flows originates from different time periods. The iso and nis-s COIs of the last hour contribute to forecasting future returns. On the other hand, the nis-c COI's forecasting power stems from periods other than the last half hour. Moreover, for the nis-b trades, only the COIs pertaining to 10:00–15:30 help anticipate the next-day open-to-close market excess returns.

8.4. COI measured by volumes

Apart from incorporating the number of transactions, it is also common to define order imbalance as the normalized difference between volumes of buyer- and seller-initiated trades. We study the relation between individual stock returns and volume order imbalances, and analyse the corresponding trading strategies. Further details are included in Appendix 8.

Our findings are robust under the volume measure. We observe the same patterns as for count COIs, but note that the R2 of contemporaneous regressions against volume imbalances and the Sharpe ratios of the corresponding long-short portfolios are generally lower than those of count imbalances, for all types of trades. This finding is in line with previous research (Chan and Lakonishok 1995, Chordia and Subrahmanyam 2004), which provided evidence that the number of transactions better captures the price pressure from institutions that intend to split their orders for optimal execution.

8.5. Further analysis on portfolio profitability

To supplement the portfolio analysis in Section 7.2, we further consider transaction costs for the selected and benchmark portfolios in figure . We apply flat rates of round trip transaction costs, ranging from 1 to 5 basis points (bps). Our findings hold under various scenarios of costs. With rigorous backtests, the double-sorted portfolios remain profitable and outperform the benchmarks. Details are reported in Appendix 9.

9. Conclusion and future directions

In this paper, we propose the idea of trade co-occurrence, which relates trades arriving close to each other in time and enables the study of interactions among stock transactions at a granular level. Conditional on co-occurrence with other trades, we classify every single trade into five groups. We calculate order imbalances for each type of decomposed trade flow (COI) and investigate their contemporaneous impacts and forecasting power on individual stock returns, as well as their economic value.

Our empirical results show that the decomposed trade flows have different price impacts. The COI of iso trade flow alone can explain a comparable amount of variation in same-day returns as using COI of all trades without the decomposition, while incorporating COIs of other trade flows further improves the explainability. For predictability, we observe that future returns, on average, are positively related with iso and nis-s COIs, while negatively related with nis, nis-c and nis-b COIs. Furthermore, the trade flow decomposition has significant economic value, and constructing long-short portfolios based on the directions of previous days' COIs leads to conspicuous enhancements in the profitability of trading strategies.

Finally, we suggest several future research directions, particularly motivated by our current limitations concerning data availability and computational power. First, we empirically show the significance of decomposing trades based on their co-occurrence with other trades, but we cannot identify who initiates certain types of trades. It would be an interesting research direction to distinguish different types of traders by leveraging private data sets (Tumminello et al. 2012, Cont et al. 2021) and discover the mechanics behind the interaction of trades. For example, it would be of interest to detect whether informed traders, such as institutions, may successfully hide their trading purpose, making their transactions most likely to be isolated from those of others. If high-frequency traders (HFTs) can be identified, it is worth applying the co-occurrence analysis to understand how HFTs react to the trading activities of other market participants. Second, we have shown that the choice of which universe of stocks is taken as the market index M has some influence on the decomposition. In future work, it would be worth investigating co-occurrence of trades within subgroups of stocks, for example industries and sectors, leading to a more fine-grained decomposition of the trade flows. Third, due to computational restrictions, we have used the simple rule that if the δ-neighbourhood of a trade has at least one trade, this trade is non-isolated. Instead, it could be interesting to consider a threshold hyperparameter when classifying trades. Furthermore, it would be interesting to investigate whether this parameter could be related to the liquidity, trading volume and volatility of each asset. For example, one could use the Poisson null model to find the expected number of trades in a δ-neighbourhood under complete randomness, and set a threshold value above this expectation, so that noise trades can be eliminated.
Fourth, co-occurrences of trades could be employed to construct a pairwise similarity between stocks, which could be further leveraged to address non-synchronous trading issues, and to improve robust covariance estimation (Lu et al. 2023). Fifth, for data reduction purposes, we only study trades (i.e. the execution of limit orders against market orders), rather than all limit order book events, such as adds or cancels. Past studies have found that submissions of new orders and cancellations of existing limit orders also lead to price impact. It would also be interesting to extend our idea to the co-occurrence of limit orders in the context of order flow imbalances (Eisler et al. 2012, Cont et al. 2014, Xu et al. 2018, Cont et al. 2021) and consider conditional order flow imbalances (Sitaru et al. 2023) analogous to our COIs.
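The Poisson null model mentioned under the third direction can be illustrated as follows. If trades arrive as a homogeneous Poisson process with a given rate, the expected number of other trades in a window of half-width δ around a trade is the rate times the window length, and a threshold can be chosen as a high quantile of the corresponding Poisson distribution. This is only one possible reading of the suggestion: the symmetric window of length 2δ, the tail-probability rule and all names below are assumptions, not a procedure taken from the paper.

```python
import math


def expected_neighbours(rate_per_second, delta_seconds):
    """Expected number of other trades within +/- delta of a trade,
    assuming a homogeneous Poisson arrival process (window length 2 * delta)."""
    return rate_per_second * 2.0 * delta_seconds


def poisson_threshold(mean, tail_prob=0.01):
    """Smallest k such that P(N >= k) <= tail_prob for N ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)  # P(N = 0)
    cdf = 0.0              # P(N <= k - 1), zero before the first step
    while 1.0 - cdf > tail_prob:
        cdf += pmf         # now cdf = P(N <= k)
        k += 1
        pmf *= mean / k    # now pmf = P(N = k) for the incremented k
    return k


# Toy numbers: 100 trades per second across the market, delta = 1 ms
mean = expected_neighbours(100.0, 0.001)
print(mean, poisson_threshold(mean, tail_prob=0.01))
```

Under such a rule, a trade would be treated as non-isolated only when its neighbourhood holds at least the threshold number of trades, so that co-occurrences that are plausible under pure randomness are filtered out.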

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported in part by the EPSRC under Grants EP/T018445/1, EP/W037211/1, EP/V056883/1, and EP/R018472/1.

Notes

1 We obtain the data of factors from Kenneth French's website. https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_factors.html

References

  • Aaron, J.S., Taylor, A.B. and Chew, T.L., Image co-localization - Co-occurrence versus correlation. J. Cell. Sci., 2018, 131, jcs211847.
  • Aït-Sahalia, Y., Fan, J., Xue, L. and Zhou, Y., How and when are high-frequency stock returns predictable? Technical report, National Bureau of Economic Research, 2022.
  • Aldridge, I., High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems, Vol. 604, 2013 (John Wiley & Sons).
  • Appel, A.E. and Holden, G.W., The co-occurrence of spouse and physical child abuse: A review and appraisal. J. Fam. Psychol., 1998, 12, 578.
  • Araújo, M.B., Rozenfeld, A., Rahbek, C. and Marquet, P.A., Using species co-occurrence networks to assess the impacts of climate change. Ecography, 2011, 34, 897–908.
  • Bailey, W., Cai, J., Cheung, Y.L. and Wang, F., Stock returns, order imbalances, and commonality: Evidence on individual, institutional, and proprietary investors in China. J. Bank. Financ., 2009, 33, 9–19.
  • Bechler, K. and Ludkovski, M., Optimal execution with dynamic order flow imbalance. SIAM J. Financ. Math., 2015, 6, 1123–1151.
  • Brunnermeier, M.K. and Pedersen, L.H., Predatory trading. J. Finance., 2005, 60, 1825–1863.
  • Carhart, M.M., On persistence in mutual fund performance. J. Finance., 1997, 52, 57–82.
  • Cartea, Á., Jaimungal, S. and Penalva, J., Algorithmic and High-Frequency Trading, 2015 (Cambridge University Press).
  • Cattaneo, M.D., Crump, R.K., Farrell, M.H. and Schaumburg, E., Characteristic-sorted portfolios: Estimation and inference. Rev. Econ. Stat., 2020, 102, 531–551.
  • Chakravarty, S., Jain, P., Upson, J. and Wood, R., Clean sweep: Informed trading through intermarket sweep orders. J. Financ. Quant. Anal., 2012, 47, 415–435.
  • Chan, L.K. and Lakonishok, J., The behavior of stock prices around institutional trades. J. Finance., 1995, 50, 1147–1174.
  • Chang, C.Y., Order imbalance and daily momentum investing: Evidence from Taiwan. Financ. Rev., 2012, 47, 697–718.
  • Chang, E.C., Luo, Y. and Ren, J., Pricing deviation, misvaluation comovement, and macroeconomic conditions. J. Bank. Financ., 2013, 37, 5285–5299.
  • Chordia, T. and Subrahmanyam, A., Order imbalance and individual stock returns: Theory and evidence. J. Financ. Econ., 2004, 72, 485–518.
  • Chordia, T., Roll, R. and Subrahmanyam, A., Order imbalance, liquidity, and market returns. J. Financ. Econ., 2002, 65, 111–130.
  • Chordia, T., Goyal, A. and Jegadeesh, N., Buyers versus sellers: Who initiates trades, and when? J. Financ. Quant. Anal., 2016, 51, 1467–1490.
  • Chu, X. and Qiu, J., Forecasting stock returns using first half an hour order imbalance. Int. J. Finance Econ., 2021, 26, 3236–3245.
  • Cont, R., Cucuringu, M., Glukhov, V. and Prenzel, F., Analysis and modeling of client order flow in limit order markets. Available at SSRN, 2021.
  • Cont, R., Cucuringu, M. and Zhang, C., Price impact of order flow imbalance: Multi-level, cross-sectional and forecasting. arXiv e-prints, pp. arXiv–2112, 2021.
  • Cont, R., Kukanov, A. and Stoikov, S., The price impact of order book events. J. Financ. Econ., 2014, 12, 47–88.
  • Cox, J., ISO order imbalances and individual stock returns. J. Financ. Res., 2021, 44, 5–23.
  • Dagan, I., Lee, L. and Pereira, F.C., Similarity-based models of word cooccurrence probabilities. Mach. Learn., 1999, 34, 43–69.
  • Donges, J.F., Schleussner, C.F., Siegmund, J.F. and Donner, R.V., Event coincidence analysis for quantifying statistical interrelationships between event time series. Eur. Phys. J. Spec. Top., 2016, 225, 471–487.
  • Eisler, Z., Bouchaud, J.P. and Kockelkoren, J., The price impact of order book events: Market orders, limit orders and cancellations. Quant. Finance, 2012, 12, 1395–1419.
  • Fama, E.F. and French, K.R., The cross-section of expected stock returns. J. Finance., 1992, 47, 427–465.
  • Fama, E.F. and French, K.R., Common risk factors in the returns on stocks and bonds. J. Financ. Econ., 1993, 33, 3–56.
  • Fama, E.F. and French, K.R., A five-factor asset pricing model. J. Financ. Econ., 2015, 116, 1–22.
  • Foster, F.D. and Viswanathan, S., Strategic trading when agents forecast the forecasts of others. J. Finance., 1996, 51, 1437–1478.
  • Galleguillos, C., Rabinovich, A. and Belongie, S., Object categorization using co-occurrence, location and appearance. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, 1–8, 2008.
  • Gotelli, N.J., Null model analysis of species co-occurrence patterns. Ecology, 2000, 81, 2606–2621.
  • Grossman, S.J. and Miller, M.H., Liquidity and market structure. J. Finance., 1988, 43, 617–633.
  • Guilbaud, F. and Pham, H., Optimal high-frequency trading with limit and market orders. Quant. Finance, 2013, 13, 79–94.
  • Guo, L., Peng, L., Tao, Y. and Tu, J., News co-occurrence, attention spillover, and return predictability. arXiv preprint arXiv:1703.02715, 2017.
  • Hagströmer, B. and Nordén, L., The diversity of high-frequency traders. J. Financ. Mark., 2013, 16, 741–770.
  • Hirschey, N., Do high-frequency traders anticipate buying and selling pressure? Manage. Sci., 2021, 67, 3321–3345.
  • Hirshleifer, D. and Jiang, D., A financing-based misvaluation factor and the cross-section of expected returns. Rev. Financ. Stud., 2010, 23, 3401–3436.
  • Huang, R. and Polak, T., Lobster: Limit order book reconstruction system. Available at SSRN 1977207, 2011.
  • Jegadeesh, N. and Titman, S., Returns to buying winners and selling losers: Implications for stock market efficiency. J. Finance., 1993, 48, 65–91.
  • Kolesnikova, O., Survey of word co-occurrence measures for collocation detection. Comput. Sistemas, 2016, 20, 327–344.
  • Kolm, P.N., Turiel, J. and Westray, N., Deep order flow imbalance: Extracting alpha at multiple horizons from the limit order book. Available at SSRN 3900141, 2021.
  • Kraus, A. and Stoll, H.R., Parallel trading by institutional investors. J. Financ. Quant. Anal., 1972, 7, 2107–2138.
  • Kyle, A.S., Continuous auctions and insider trading. Econometrica, 1985, 53, 1315–1335.
  • Kyle, A.S., Ou-Yang, H. and Wei, B., A model of portfolio delegation and strategic trading. Rev. Financ. Stud., 2011, 24, 3778–3812.
  • Lee, Y.T., Liu, Y.J., Roll, R. and Subrahmanyam, A., Order imbalances and market efficiency: Evidence from the Taiwan stock exchange. J. Financ. Quant. Anal., 2004, 39, 327–341.
  • Lu, Y., Reinert, G. and Cucuringu, M., Co-trading networks for modeling dynamic interdependency structures and estimating high-dimensional covariances in US equity markets. arXiv preprint arXiv:2302.09382, 2023.
  • Lucchese, L., Pakkanen, M. and Veraart, A., The short-term predictability of returns in order book markets: A deep learning perspective. arXiv preprint arXiv:2211.13777, 2022.
  • Ma, Z., Pant, G. and Sheng, O.R., Mining competitor relationships from online news: A network-based approach. Electron. Commer. Res. Appl., 2011, 10, 418–427.
  • MacKenzie, D.I., Bailey, L.L. and Nichols, J.D., Investigating species co-occurrence patterns when species are detected imperfectly. J. Anim. Ecol., 2004, 73, 546–555.
  • Newey, W.K. and West, K.D., Automatic lag selection in covariance matrix estimation. Rev. Econ. Stud., 1994, 61, 631–653.
  • O'Hara, M., High frequency market microstructure. J. Financ. Econ., 2015, 116, 257–270.
  • Scharfstein, D.S. and Stein, J.C., Herd behavior and investment. Am. Econ. Rev., 1990, 80, 465–479.
  • Sharpe, W.F., The Sharpe ratio. J. Portfolio Manage., 1994, 21, 49–58.
  • Shenoy, C. and Zhang, Y.J., Order imbalance and stock returns: Evidence from China. Q. Rev. Econ. Finance., 2007, 47, 637–650.
  • Sitaru, B., Calinescu, A. and Cucuringu, M., Order flow decomposition for price impact analysis in equity limit order books. To appear in Proceedings of the Fourth ACM International Conference on AI in Finance (ICAIF 2023); SSRN: 4572510, 2023.
  • Spiegel, M. and Subrahmanyam, A., On intraday risk premia. J. Finance., 1995, 50, 319–339.
  • Stoll, H.R., The supply of dealer services in securities markets. J. Finance., 1978, 33, 1133–1151.
  • Tang, Y., Zhou, Y. and Hong, M., News co-occurrences, stock return correlations, and portfolio construction implications. J. Risk Financ. Manage., 2019, 12, 45.
  • Tumminello, M., Lillo, F., Piilo, J. and Mantegna, R.N., Identification of clusters of investors from their real trading activity in a financial market. New. J. Phys., 2012, 14, 013041.
  • Van Kervel, V. and Menkveld, A.J., High-frequency trading around large institutional orders. J. Finance., 2019, 74, 1091–1137.
  • Wang, Q., Teng, B., Hao, Q. and Shi, Y., High-frequency statistical arbitrage strategy based on stationarized order flow imbalance. Procedia. Comput. Sci., 2021, 187, 518–523.
  • Wu, Q., Zhang, Z., Pizzoferroto, A., Cucuringu, M. and Liu, Z., A deep learning framework for pricing financial instruments. arXiv.org, 2019.
  • Xu, K., Gould, M.D. and Howison, S.D., Multi-level order-flow imbalance in a limit order book. Market Microstruct. Liquidity, 2018, 4, 1950011.
  • Yang, L. and Zhu, H., Back-running: Seeking and hiding fundamental information in order flows. Rev. Financ. Stud., 2020, 33, 1484–1533.
  • Ye, S., Zeng, G., Wu, H., Zhang, C., Liang, J., Dai, J., Liu, Z., Xiong, W., Wan, J., Xu, P. and Cheng, M., Co-occurrence and interactions of pollutants, and their impacts on soil remediation—A review. Crit. Rev. Environ. Sci. Technol., 2017, 47, 1528–1553.
  • Zhang, Z., Zohren, S. and Roberts, S., Deeplob: Deep convolutional neural networks for limit order books. IEEE. Trans. Signal. Process., 2019, 67, 3001–3012.
  • Zhang, T., Gu, G.F. and Zhou, W.X., Order imbalances and market efficiency: New evidence from the Chinese stock market. Emerg. Mark. Rev., 2019, 38, 458–467.

Appendices

Appendix 1.

Sample universe of stocks

Table B1 provides a brief summary of the number of stocks we use in this study.

Table B1. Description of the sample universe.

Appendix 2.

Regression analysis of subperiods

To supplement the results in sections 5 and 6, we perform the regression analysis in the same settings, on a yearly basis. Tables C1 and C2 show the results for contemporaneous regressions, and tables C3 and C4 document predictive regressions.

Table C1. Yearly contemporaneous regression against individual COIs.

Table C2. Yearly contemporaneous regression against multiple COIs.

Table C3. Yearly predictive regression against individual COIs.

Appendix 3.

Time series regression and distribution of β

In this section, we conduct contemporaneous (equation (6)) and predictive (equation (7)) regressions against each type of COI, on each stock individually, instead of the panel regressions reported in sections 5 and 6.

A.1. Contemporaneous time series regression

Table D1 summarizes the results of contemporaneous regressions. All types of COIs have a positive impact on prices on average, which aligns with our findings in Section 5. Figure A1 shows the distribution of regression coefficients. Furthermore, figure A2 shows that the distributions of adjusted R2 are right-skewed.

A.2. Predictive time series regression

Table D2 summarizes the results of predictive regressions. The signs of the coefficients of COIs are consistent with our findings in Section 6. Figure A3 shows the distribution of regression coefficients. Furthermore, figure A4 shows that the distributions of the adjusted R2 are right-skewed.

Appendix 4.

Additional evaluation for regression analysis

Table E1 provides additional evaluation for the regression analysis in Sections 5 and 6. The conclusions we derive are consistent under additional evaluation.

Appendix 5.

Neighbourhood size effect

To study the effect of neighbourhood size on conditional order imbalances, we repeat the regression and portfolio analysis for each δ ∈ {0.05 ms, 0.075 ms, 0.125 ms, 0.25 ms, 0.5 ms, 1 ms, 5 ms, 50 ms}, and display the results.
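To give a feel for how the neighbourhood size changes the decomposition, the sketch below computes, for each value of δ in the grid above, the share of trades that have no other trade within δ on either side. It is a simplified illustration: all trades are pooled without distinguishing stocks, the window is taken to be closed at both ends, and the timestamps are toy values in milliseconds. None of these choices is taken from the paper.

```python
from bisect import bisect_left, bisect_right

DELTAS_MS = [0.05, 0.075, 0.125, 0.25, 0.5, 1, 5, 50]  # grid from the text


def isolated_fraction(timestamps_ms, delta_ms):
    """Share of trades with no other trade within delta_ms on either side.

    `timestamps_ms` must be sorted in ascending order.
    """
    isolated = 0
    for t in timestamps_ms:
        lo = bisect_left(timestamps_ms, t - delta_ms)
        hi = bisect_right(timestamps_ms, t + delta_ms)
        if hi - lo == 1:  # only the trade itself falls inside the window
            isolated += 1
    return isolated / len(timestamps_ms)


toy_times_ms = [0.0, 0.04, 0.3, 2.0, 9.0, 80.0]  # sorted toy timestamps
for delta in DELTAS_MS:
    print(delta, isolated_fraction(toy_times_ms, delta))
```

Because a larger window can only capture more neighbours, the isolated share is non-increasing in δ, which is one reason the informativeness of the filter may weaken once δ becomes large.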

Figure A5 illustrates the average R2 of contemporaneous regressions. Isolated order imbalances achieve the highest R2 at δ=0.5 ms. In contrast, the histograms of R2 of the non-isolated imbalances have a U-shape with a minimum at δ=0.5 ms. For the three types of non-isolated order imbalances, the R2s for non-self-isolated and non-both-isolated imbalances trend downward as δ grows. Non-cross-isolated imbalances explain more variance in returns as δ increases.

Figure A6 details the Sharpe ratios of long-short portfolios of different COI types ordered by δ. We remark that the Sharpe ratios of each type of order imbalance peak at different values of δ.

Appendix 6.

Representative of the market effect

The classification of trades depends on the set of stocks we choose as the market index, M. In this section, we compare the fractions of trades, COIs and economic values of the same 457 stocks as described in Section 4, while using constituents of the S&P 500, S&P 100 and Dow 30 indices as M, respectively, for the trade flow decomposition. Our original universe contains 457 S&P 500 companies, and we decompose trade flows for all of them, based on the intersections with the other two, smaller, indices. Table G1 reports the results.

Appendix 7.

Time-of-day effect

We investigate the COIs of different intraday time intervals. First, we evaluate their influences on same-day price change by regressing contemporaneous open-to-close market excess returns against each COI individually. Panel A of table H1 presents the R2 of all such regressions. Excluding the first and last half hours of trades does not explicitly change the imbalance–return relations we discover. Regardless of periods, deriving COIs with only iso trades is enough to explain a comparable amount of variance as when using all trades. Note that, especially for the first hour, the price impact mainly stems from isolated trades.

Second, we trade on each COI by constructing single-sort long-short portfolios and present annualized Sharpe Ratios in Panel B. It is reasonable to expect that trading activities towards the end of the normal trading period contribute more to forecasting future returns. We observe that the signal corresponding to the iso and nis-s COIs of the last hour leads to a 0.41 and 0.59 increase in Sharpe Ratios, significantly enhancing the portfolio profits. Conversely, the last half-hour of non-cross-isolated COI is not a good signal for predicting future returns. For the nis-b trades, the future returns are only predicted by the COI during less volatile trading hours.

Appendix 8.

COI measured by volumes

Instead of considering the number of trades, in this section we analyse volume order imbalances defined as

$$\mathrm{COI}_{i,t}^{\mathrm{type}} = \frac{V_{i,t}^{\mathrm{type,buy}} - V_{i,t}^{\mathrm{type,sell}}}{V_{i,t}^{\mathrm{type,buy}} + V_{i,t}^{\mathrm{type,sell}}}, \tag{A1}$$

where $V_{i,t}^{\mathrm{type,buy}}$ and $V_{i,t}^{\mathrm{type,sell}}$ denote the total volume of market buy orders and market sell orders of stock i on day t. We repeat the analysis on volume imbalances and present the results in table I1.
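A small numerical example may help contrast the volume measure in (A1) with a count-based imbalance. The sketch below applies the same normalized difference to volumes and to trade counts for a toy set of trades of one type, for one stock and one day. The count-based form is assumed to be the analogous ratio with numbers of trades in place of volumes, and the data and names are hypothetical.

```python
def order_imbalance(buy, sell):
    """Normalized difference (buy - sell) / (buy + sell).

    Returns None when there is no activity, to avoid dividing by zero.
    """
    total = buy + sell
    if total == 0:
        return None
    return (buy - sell) / total


# Toy trades of one type for stock i on day t: (sign, volume),
# sign = +1 for buyer-initiated and -1 for seller-initiated trades.
trades = [(+1, 100), (+1, 100), (+1, 100), (-1, 900)]

buy_volume = sum(v for s, v in trades if s > 0)
sell_volume = sum(v for s, v in trades if s < 0)
buy_count = sum(1 for s, _ in trades if s > 0)
sell_count = sum(1 for s, _ in trades if s < 0)

volume_coi = order_imbalance(buy_volume, sell_volume)  # (300 - 900) / 1200 = -0.5
count_coi = order_imbalance(buy_count, sell_count)     # (3 - 1) / 4 = 0.5
print(volume_coi, count_coi)
```

Here three small buys against one large sell give a positive count imbalance but a negative volume imbalance, which shows how the two measures can disagree when orders are split into many small trades.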

Appendix 9.

Further analysis on portfolio profitability

To verify the robustness of the profitability of the proposed COI-sorted portfolios, we apply transaction costs in the backtests. Assuming flat round-trip transaction costs over all stocks, we test cost rates of 1, 2, 3, 4 and 5 bps. Recall that, for the sort-based long-short strategies, we open positions at market open and liquidate at market close, without holding overnight positions; the daily turnover is therefore always 100%. Consequently, we directly subtract the fixed transaction cost rate from daily portfolio returns during backtesting. Additionally, we ignore transaction costs for the equal weight and SPY ETF portfolios since daily rebalancing is not needed for them. Table J1 reports the annualized returns and Sharpe ratios for the selected portfolios and benchmarks. From the table, we observe that the portfolio single-sorted on iso becomes loss-making when the cost exceeds 2 bps. In contrast, the profitability of the portfolios double-sorted on iso/nis and iso/nis-c persists, and they consistently outperform the benchmarks. In particular, the iso/nis-c portfolio obtains an annualized return of 22.27% and a Sharpe ratio of 1.11 under the strictest scenario.
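The cost adjustment described above amounts to subtracting a constant from each daily return, which the following sketch makes explicit. It also shows the implied annual drag of a flat daily cost, assuming 252 trading days per year and arithmetic annualization; both conventions and the function names are assumptions for illustration.

```python
def net_daily_returns(gross_daily_returns, cost_bps):
    """Subtract a flat round-trip cost (in basis points) from every daily return.

    This matches a strategy with 100% daily turnover, where the full
    position is opened and closed each day.
    """
    cost = cost_bps / 10_000.0
    return [r - cost for r in gross_daily_returns]


def annual_cost_drag(cost_bps, trading_days=252):
    """Annualized return given up to a flat daily cost (arithmetic annualization)."""
    return cost_bps / 10_000.0 * trading_days


gross = [0.0010, -0.0020, 0.0015]  # toy gross daily returns
for bps in (1, 2, 3, 4, 5):
    net = net_daily_returns(gross, bps)
    print(bps, sum(net) / len(net), annual_cost_drag(bps))
```

Under these assumptions a 5 bps daily cost removes about 12.6 percentage points of annualized return, which helps explain why a portfolio with a modest gross return can turn unprofitable at low cost levels while portfolios with higher gross returns stay profitable.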

Table C4. Yearly predictive regression against multiple COIs.

Table D1. Contemporaneous time series regressions.

Table D2. Predictive time series regressions.

Figure A1. Distributions of coefficients of contemporaneous time series regressions. This figure displays the histogram and kernel density estimation of the coefficients of contemporaneous time series regressions. The orange line indicates the mean of the coefficients.

Figure A2. Distributions of adjusted R2 of contemporaneous time series regressions. This figure displays the histogram and kernel density estimation of the adjusted R2 of contemporaneous time series regressions. The orange line indicates the mean of the adjusted R2.

Figure A3. Distributions of coefficients of predictive time series regressions. This figure shows the histogram and kernel density estimation of the coefficients of predictive time series regressions. The orange line indicates the mean of the coefficients.

Figure A4. Distributions of adjusted R2 of predictive time series regressions. This figure displays the histogram and kernel density estimation of the adjusted R2 of predictive time series regressions. The orange line indicates the mean of the adjusted R2.

Table E1. Additional evaluation for regression analysis.

Figure A5. Average R2 of regressing contemporaneous returns against COIs for different δs.

Figure A6. Sharpe ratios of COI-based long-short portfolios for different δs.

Table G1. Trades and COIs by universe of stocks.

Table H1. COIs by time period.

Table I1. COIs measured by volume.

Table J1. Annualized returns and Sharpe ratios of selected and benchmark portfolios.