
Beyond point forecasts: Uncertainty quantification in tensor extrapolation for relational time series data

Abstract

Contemporary data sets are frequently relational in nature. In retail, for example, data sets are more granular than traditional data, often indexing individual products, outlets, or even customers, rather than aggregating them at the group level. Tensor extrapolation aims to forecast relational time series data; it combines tensor decompositions and time series extrapolation. However, previous approaches to tensor extrapolation are restricted to point forecasts and fail to quantify the uncertainty associated with the forecasts. This paper provides prediction intervals within which we expect the true future values to lie with a specified probability. Numerical experiments suggest the effectiveness of the proposed method in terms of forecast performance. Moreover, the approach proves to be up to 77 times faster than its univariate competitor. For this reason, it has a positive impact on both inclusiveness and the environmental footprint of forecasting.

1 Introduction

In general, variables can be absolute or relational (Wasserman and Faust Citation1994). The degree of environmental awareness of a person, their gender, education, and age are absolute variables. If, however, a variable focuses on the relationship among entities, it is called a relational variable. Examples of relational variables include the intensity of communications between members of a group, the extent of economic transactions between corporations, and the existence or nonexistence of trade or treaties between countries. Relational variables are often dynamic, i.e., they evolve over time.

Across numerous domains, contemporary data sets are relational by nature (Kitchin Citation2014; Müller et al. Citation2016). In retail, for instance, data sets are more granular than traditional data, often capturing information at the level of individual items, outlets, or even users, rather than aggregating them at the group level (George et al. Citation2016; Fildes, Ma, and Kolassa Citation2022). As a result, there are different types of dependencies between the variables of interest (e.g., dependencies among stores, dependencies within products). The relational character of the data is, however, often disregarded, particularly in prediction tasks. Instead, univariate extrapolation approaches are applied, which cannot address inter-series dependencies. In addition, available multivariate forecasting methods (e.g., vector autoregressions) are limited to low-dimensional settings and are, hence, not suitable for practical use in large-scale forecasting problems (Hyndman and Athanasopoulos Citation2018; De Stefani and Bontempi Citation2021).

Tensor extrapolation is an attempt to forecast relational time series data using multi-linear algebra. The process unfolds as follows: Multi-way data are organized in the form of tensors, i.e., multi-dimensional arrays. Tensor decompositions are then employed to identify periodic patterns in the data. Afterwards, these patterns serve as input for time series methods. Tensor extrapolation originated in the work of Dunlavy, Kolda, and Acar (Citation2011) and Spiegel et al. (Citation2012), initially limited to preselected time series approaches and binary data. Only recently, Schosser (Citation2020, Citation2022b) laid the groundwork for applications in large-scale forecasting problems.

The purpose of forecasting is to facilitate decision making in the face of uncertainty. To this end, forecasts should provide an unbiased guess at what is most likely to happen, the so-called point forecast. Increasingly, an estimate of the uncertainty associated with this forecast is called for (Makridakis, Hyndman, and Petropoulos Citation2020; Makridakis, Spiliotis, and Assimakopoulos Citation2022b; Makridakis et al. Citation2022c). So far, tensor extrapolation has been restricted to point forecasts and fails to quantify the respective uncertainty. The paper at hand intends to fill this gap and, hence, contributes to the literature on tensor extrapolation. It aims at prediction intervals within which we expect the true future values to lie with a specified probability. From a broader perspective, however, there is also a pressing need to ensure efficient implementation.

Until recently, the forecasting community placed strong (at times, exclusive) emphasis on achieving “state-of-the-art” results. Typically, accuracy (or similar) measures were reported, while any mention of cost or efficiency was omitted (Nikolopoulos and Petropoulos Citation2018; Petropoulos et al. Citation2022). Notwithstanding the benefits of improving model performance, this narrow focus ignores the economic, environmental, and social cost of reaching the reported results (Schwartz et al. Citation2020; Wu et al. Citation2022). Contemporary forecasting technology is already very expensive to train or execute, which restricts the ability of many researchers to study it, and of practitioners to adopt it. What is more, the computing landscape is undergoing a profound transition as improvements in central processing units (CPUs) slow down and an ever-increasing share of applications moves to specialized processors (Thompson and Spanuth Citation2021). Taken together, rising computational burdens and the switch toward customized, high-end hardware threaten to fragment the field. Another critical aspect is the environmental footprint left in deriving forecasts. Escalating computational experiments greatly increase the amount of energy consumed and, consequently, the extent of natural resources spent.

Meeting the challenges outlined above necessitates a reevaluation of our assessment standards. We advocate for the integration of efficiency as a prevalent criterion, complementing the traditional focus on forecast performance. This strategic shift aligns seamlessly with our commitment to fostering environmentally friendly and inclusive forecasting practices. The average runtime for generating a result serves as a natural proxy for efficiency. All else being equal, a faster method inherently implies a reduction in computational workload.

When expanding tensor extrapolation, we not only seek to improve forecast performance but also emphasize efficiency. The derivation of prediction intervals frequently emerges as one of the most computationally demanding elements in the forecasting workflow, particularly where simulations are conducted. In recognition of this, we take a step back, embracing a heuristic approach to articulate the uncertainty in forecasts. This decision proves fruitful: Our calculations reveal that tensor extrapolation consistently outperforms the univariate baseline while incurring far lower computational cost.

The remainder of the paper progresses as follows. Section 2 briefly reviews related work. Section 3 presents the proposed method. It provides background material on tensor analysis, introduces the state of the art of tensor extrapolation, and explains our extensions. Section 4 describes the experiments conducted. This includes the decision on the performance metrics, the technical realization, and the application to empirical and synthetic data. The results are discussed in Section 5. Finally, Section 6 concludes the paper.

2 Related work

For an introduction to time series analysis, we suggest Box and Jenkins (Citation1976). A more contemporary overview of the forecasting literature is given by Hyndman and Athanasopoulos (Citation2018). The following areas are highly significant to our problem: tensor autoregression, forecasting competitions, and meta-learning for time series forecasting.

2.1 Tensor autoregression

During the last few years, tensor-based techniques have received growing attention within the statistics community (Bi et al. Citation2021). For time series data, we can roughly distinguish three categories of approaches. In the first category, tensor analysis is used as a building block in deep learning models, specifically to accelerate and stabilize the training of neural networks (Yu et al. Citation2017; Deng et al. Citation2024). The second category involves transforming low-dimensional time series data into higher-dimensional data through a process known as tensorization (Shi et al. Citation2020; Barsbey and Cemgil Citation2023). The third category pertains to relational data, meaning data that can naturally be represented as a tensor (Schosser Citation2022b). A complete survey of the highly dynamic literature is beyond the scope of this paper. Instead, we give examples of work in the second and third categories. In comparison with traditional forecasting techniques, autoregressive methods are of particular interest. Hill, Li, and Schneider (Citation2021) recommend a regression model for univariate time series with known seasonal periodicity. Within their framework, past observations of the dependent variable are stacked in tensor format. In contrast, Yu, Rao, and Dhillon (Citation2016) develop a model for relational time series data. Unlike tensor extrapolation, where tensor factorizations are combined with time series extrapolation methods, Yu, Rao, and Dhillon (Citation2016) introduce an autoregressive regularizer and thus impose specific temporal structure on the factorization products. Both Hill, Li, and Schneider (Citation2021) and Yu, Rao, and Dhillon (Citation2016) focus on point forecasts. Tensor extrapolation employs a general class of state-space models, offering a more comprehensive setting compared to autoregressive approaches.

2.2 Forecasting competitions

Competitions (or “challenges”, “common tasks”) are omnipresent in machine learning, statistics, and related fields (Feuerverger, He, and Khatri Citation2012; Donoho Citation2017). They lower barriers to entry by providing clearly defined tasks and the resources needed to accomplish them. Moreover, they create research communities with shared goals and assumptions and, perhaps most important, offer proof of incremental scientific progress (Liberman Citation2010). The so-called M competitions are the most influential and frequently cited competitions in the field of time series forecasting (Hyndman Citation2020). The focus of the recent M5 competition is on a retail sales forecasting application, examining both accuracy and uncertainty estimation (Makridakis, Spiliotis, and Assimakopoulos Citation2022b; Makridakis et al. Citation2022c). The data provided comprise grouped, correlated time series, organized in a hierarchical structure. Depending on the level of aggregation, different forecasting methods stand out. For the top and middle parts of the hierarchy, machine learning-based approaches offer considerable gains in performance over pure statistical methods. At the lower end of the hierarchy, especially at the product-store level, capturing time series patterns becomes more challenging due to the presence of irregularities. This is where traditional methods come into play. We demonstrate the adequacy of tensor extrapolation at a comparable level of granularity.

2.3 Meta-learning for time series forecasting

In the relevant literature, there are several classes of forecasting models, such as exponential smoothing, Autoregressive Integrated Moving Average (ARIMA), or deep neural networks. Within individual model families, automation of model selection and tuning of forecasting algorithms has been studied in detail (Hyndman and Khandakar Citation2008; Salinas et al. Citation2020). Meta-learning for time series forecasting is a natural progression of these efforts. It aims to efficiently train, optimize, and choose the best-performing model among various classes of predictive models for a given data set. Meta-learning platforms have recently been proposed by Gastinger et al. (Citation2021), Ma and Fildes (Citation2021), and Shah et al. (Citation2021). However, these approaches do not feature routines specifically designed for relational data. We illustrate that accounting for the relational character of the data pays off, and suggest tensor extrapolation as an additional building block in meta-learning frameworks.

3 Proposed method

This section covers background information on tensor analysis. In addition, it shows the current state of tensor extrapolation and explains our extensions.

3.1 Tensor analysis

Multi-way arrays, or tensors, are multi-dimensional collections of numbers. The dimensions are known as ways, orders, or modes of a tensor. Using this terminology, scalars, vectors, and matrices can be interpreted as zero-order, first-order, and second-order tensors, respectively. Tensors of order three and higher are called higher-order tensors (Cichocki et al. Citation2009; Kolda and Bader Citation2009). In the simplest high-dimensional case, the tensor can be thought of as a “data cube” (Papalexakis, Faloutsos, and Sidiropoulos Citation2016; Rabanser, Shchur, and Günnemann Citation2017). We formalize this case as follows: Let $I, J, K \in \mathbb{N}$ denote index upper bounds, i.e., the numbers of entities in the modes of interest; a third-order tensor is denoted by $\underline{X} \in \mathbb{R}^{I \times J \times K}$. Here, $x_{ijk}$ describes the entry in the $i$th row, $j$th column, and $k$th tube of $\underline{X}$. The modes of $\underline{X}$ are referred to as mode A, mode B, and mode C, respectively. Throughout this section, we restrict ourselves to third-order tensors for reasons of simplicity. Nevertheless, the concepts introduced naturally extend to tensors of order four and higher. It should be noted that most of the notation and terminology used in our deliberations is borrowed from Kiers (Citation2000). A brief summary of the tensor notation is presented in Table 1.

Table 1 Tensor notation summary.
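For concreteness, the “data cube” view translates directly into an array. The following minimal sketch (our illustration, not part of the original method) instantiates a third-order tensor in NumPy and accesses a single entry:

```python
import numpy as np

# Index upper bounds of modes A, B, and C
I, J, K = 4, 3, 2

# A third-order tensor X of size (I x J x K), filled with random numbers
X = np.random.default_rng(0).normal(size=(I, J, K))

# x_ijk is the entry in the i-th row, j-th column, and k-th tube;
# NumPy indices are 0-based, so x_{1,2,1} becomes X[0, 1, 0]
x_121 = X[0, 1, 0]
```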

The so-called Candecomp/Parafac (CP) decomposition is one of the most common tensor factorizations. It decomposes a tensor into a set of factor matrices. The model was independently proposed by Hitchcock (Citation1927), Carroll and Chang (Citation1970), and Harshman (Citation1970). Optimization entails
$$\min_{a_{ir},\, b_{jr},\, c_{kr}} \; \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \Big( x_{ijk} - \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} \Big)^{2}.$$

Here, $a_{ir}$, $b_{jr}$, and $c_{kr}$ denote elements of the component matrices A (for mode A), B (for mode B), and C (for mode C) of sizes $(I \times R)$, $(J \times R)$, and $(K \times R)$, respectively. The number of components $R$ represents the only hyperparameter to be specified. The optimization problem stated above assumes that the random fluctuation in the tensor data follows a Gaussian distribution. Alternative models can be found in Chi and Kolda (Citation2012) and Papalexakis, Faloutsos, and Sidiropoulos (Citation2016).
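To illustrate, the following hedged sketch fits a rank-$R$ CP model with TensorLy's parafac, the same function used in our implementation (Section 4.2). The input tensor here is random and serves purely as a placeholder:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Placeholder data cube; in practice, X holds the observed relations
X = tl.tensor(np.random.default_rng(1).normal(size=(30, 30, 48)))

R = 4                                  # number of components, the only hyperparameter
weights, factors = parafac(X, rank=R)  # factors = [A (I x R), B (J x R), C (K x R)]

# Low-rank reconstruction and relative fit error under the Gaussian loss above
X_fit = tl.cp_to_tensor((weights, factors))
rel_error = tl.norm(X - X_fit) / tl.norm(X)
```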

3.2 Tensor extrapolation: literature and extensions

Tensor extrapolation seeks to forecast relational time series data. That is, relations for $T$ time steps are given as input, and the goal is to predict the relations at future times $T+1, T+2, \ldots, T+L$. Without loss of generality, the scope of this presentation is limited to the case where the snapshots of relations can be represented in the form of a matrix. Here, tensors provide a straightforward way to integrate the temporal dimension. Consequently, a third-order data array $\underline{X}$ of size $(I \times J \times T)$ is given, with time being modeled in mode C. Extrapolation entails computing the tensor $\underline{\hat{X}}$ of size $(I \times J \times L)$ that includes point estimates concerning future relations. The approach involves the use of tensor decomposition and exponential smoothing. It proceeds as follows: The multi-way data array is decomposed by applying the Candecomp/Parafac (CP) factorization. Each of the component matrices gives information about the respective mode, i.e., detects latent structure in the data. For further processing, the “time” component matrix C of size $(T \times R)$ is converted into a set of column vectors $c_r$. These vectors capture different periodic patterns, e.g., seasons and trends. The periodic patterns discovered are taken as input for exponential smoothing techniques. In doing so, forecasts are obtained and arranged as columns $\hat{c}_r$ of the new “time” component matrix $\hat{C}$ of size $(L \times R)$. Eventually, the tensor $\underline{\hat{X}}$ containing estimates of future relations can be calculated.
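The procedure can be condensed into a few lines of code. The sketch below is our illustration under simplifying assumptions: in particular, the exponential smoothing step is represented by a fixed additive-error ETS specification, while the automatic model selection discussed below is omitted for brevity.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

def tensor_extrapolate(X_est, R, L):
    """Forecast an (I x J x T) tensor L steps ahead via CP and exponential smoothing."""
    # Step 1: detect latent structure, including the temporal patterns in C
    weights, (A, B, C) = parafac(tl.tensor(X_est), rank=R)

    # Step 2: extrapolate each periodic pattern c_r with exponential smoothing
    C_hat = np.empty((L, R))
    for r in range(R):
        fit = ETSModel(C[:, r], error="add").fit(disp=False)
        C_hat[:, r] = fit.forecast(L)

    # Step 3: reassemble the forecast tensor from A, B, and the new matrix C_hat
    return tl.cp_to_tensor((weights, [A, B, C_hat]))

# Illustrative call on a random placeholder tensor
X_forecast = tensor_extrapolate(np.random.rand(30, 30, 48), R=4, L=12)
```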

Tensor extrapolation was originally developed for binary data, indicating the presence or absence of relations of interest. Dunlavy, Kolda, and Acar (Citation2011) use the temporal profiles computed by CP as a basis for the so-called Holt-Winters Method, i.e., exponential smoothing with additive trend and additive seasonality. Spiegel et al. (Citation2012) apply CP decomposition in connection with simple exponential smoothing, meaning exponential smoothing without trend and seasonality. In both articles, the model parameters are hard-coded. Later on, Schosser (Citation2022b) breaks ground for data-driven model selection and estimation in large-scale forecasting problems. First, he draws on an automatic forecasting procedure based on a general class of state-space models encompassing all standard exponential smoothing methods (Hyndman et al. Citation2002; Hyndman and Khandakar Citation2008). Model selection and optimization of both smoothing parameters and initial state variables are performed “individually” (Fildes Citation1989; Petropoulos et al. Citation2014). This means that they are done based on each individual periodic pattern contained in a column of the matrix C. Second, he investigates how to use the methodology with real-valued data. This highlights the importance of data preprocessing. Schosser (Citation2022a) transfers the framework into a setting characterized by missing values and thus increases the practical relevance of tensor extrapolation. Note that there are adaptations of tensor extrapolation that go beyond the scope of pure time series forecasting. Particular mention should be made of approaches that use additional information from distinct sources, for example in the form of regularized (Bi et al. Citation2022) or coupled decompositions (Araujo, Ribeiro, and Faloutsos Citation2017).

To date, tensor extrapolation is used to generate point forecasts. There is no quantification of the uncertainty associated with the predictions. The paper at hand provides an appropriate extension and, consequently, contributes to the literature on the forecasting of relational data. In line with the goal of computational efficiency, we take a heuristic approach. For a single time series, the lower and upper endpoints of the prediction interval are given by
$$\underline{\hat{x}}_{ijl} = \hat{x}_{ijl} - 0.5\, \lambda\, \sqrt{l}\, \sigma_{ij \cdot e}$$
and
$$\overline{\hat{x}}_{ijl} = \hat{x}_{ijl} + 0.5\, \lambda\, \sqrt{l}\, \sigma_{ij \cdot e},$$
respectively. Here, the point forecast $\hat{x}_{ijl}$ marks the middle of the range of possible values. The interval width is proportional to $\sigma_{ij \cdot e}$, i.e., the standard deviation of residual errors in the estimation sample. A characteristic common to prediction intervals is that they increase in width as the forecast horizon increases. The further ahead the forecast, the more uncertainty is associated with it, and, hence, the wider the prediction interval. Our framework ensures this effect; the interval width grows with the square root of the temporal distance $l$ ($l \in \{1, \ldots, L\}$). The multiplier $\lambda$ depends on the desired coverage probability. In this paper, we derive 80% and 95% prediction intervals, although any percentage may be used. Inspired by the quantiles of a standardized Gaussian distribution, we specify $\lambda$ as 1.28 and 1.96, respectively. Since the CP decomposition employed is based on the assumption of Gaussian errors, this imposes no further restrictions on the scope of application. Both the lower and upper interval bounds are organized in tensor form. Algorithm 1 displays our modifications to tensor extrapolation. Numerical experiments suggest the effectiveness of our proposed method. We use both real-world and synthetic data to demonstrate the superiority of the procedure over an established baseline in terms of forecast performance and computational burden.

Algorithm 1.

Tensor extrapolation for large-scale data including point and interval forecasts
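A minimal sketch of the interval heuristic (our code, not a verbatim rendering of Algorithm 1) may read as follows; X_forecast is an (I x J x L) tensor of point forecasts and sigma an (I x J) matrix of residual standard deviations from the estimation sample:

```python
import numpy as np

def prediction_intervals(X_forecast, sigma, level=0.95):
    """Heuristic bounds around point forecasts; widths grow with sqrt(horizon)."""
    lam = {0.80: 1.28, 0.95: 1.96}[level]      # Gaussian-inspired multiplier lambda
    L = X_forecast.shape[2]
    root_l = np.sqrt(np.arange(1, L + 1))      # sqrt of temporal distance l = 1..L
    half_width = 0.5 * lam * root_l[None, None, :] * sigma[:, :, None]
    return X_forecast - half_width, X_forecast + half_width

# Illustrative usage with placeholder forecasts and residual standard deviations
X_hat = np.random.rand(30, 30, 12)
lower, upper = prediction_intervals(X_hat, sigma=np.full((30, 30), 0.1), level=0.80)
```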

4 Experimental evaluation

Next, we detail the choice of performance measures, the implementation of our approach, and its application to empirical and synthetic data.

4.1 Performance measures

We rely on the following performance measures to compare our predictions with the observations retained: The Mean Absolute Percentage Error (MAPE) represents a scale-free performance measure for point forecasts. It is given by
$$\mathrm{MAPE} = \frac{1}{I \cdot J \cdot L} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{l=1}^{L} \frac{|x_{ijl} - \hat{x}_{ijl}|}{|x_{ijl}|} \cdot 100.$$

The literature on time series forecasting often favors the Symmetric Mean Absolute Percentage Error (sMAPE) and the Mean Absolute Scaled Error (MASE) as performance metrics (Hyndman and Koehler Citation2006). For the data sets used in the following, MAPE has proven to be a reliable proxy for sMAPE (Schosser Citation2020, Citation2022a, Citation2022b). With MASE, the denominator is given by the average absolute error of a naïve one-step prediction. Since a naïve forecast does not offer a suitable benchmark for data sets with heterogeneous periodic patterns, this may lead to distortions. Therefore, and in light of our focus on interval prediction, we concentrate on a single measure of the quality of point forecasts.

The Degree of Miscalibration (DM) is used to compare how frequently the methods' prediction intervals contain the true values of the series. We first compute the actual coverage, i.e., the percentage of true values included in the (nominally 80% or 95%) prediction intervals. From this, we derive DM, understood as the difference between the actual coverage of the forecasting method under study and the nominal probability level. Where the actual coverage falls short of the nominal probability level (negative difference), we denote the interval as overconfident. Where the reverse relationship holds (positive difference), we refer to the interval as underconfident. Miscalibration typically results from inappropriate widths or locations of the intervals.

The Mean Scaled Interval Score (MSIS) is a more elaborate attempt at quantifying the adequacy of prediction intervals. Formally,
$$\mathrm{MSIS} = \frac{1}{I \cdot J \cdot L} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{l=1}^{L} \frac{IS_{ijl}}{|x_{ijl}|}$$
and
$$IS_{ijl} = \big(\overline{\hat{x}}_{ijl} - \underline{\hat{x}}_{ijl}\big) + p\, \big(\underline{\hat{x}}_{ijl} - x_{ijl}\big)\, \mathbb{1}\{x_{ijl} < \underline{\hat{x}}_{ijl}\} + p\, \big(x_{ijl} - \overline{\hat{x}}_{ijl}\big)\, \mathbb{1}\{x_{ijl} > \overline{\hat{x}}_{ijl}\}$$
hold. Here, $\mathbb{1}\{\cdot\}$ describes an indicator function that returns the value one if the condition in curly brackets is fulfilled and zero otherwise. A forecast method is rewarded for narrow prediction intervals and incurs a penalty, the size of which depends on the constant $p$, if the observation misses the interval (Gneiting and Raftery Citation2007; Landon and Singpurwalla Citation2008). Typically, the parameter $p$ is chosen in an ad hoc manner (Makridakis et al. Citation2022c); this is why we include a range of values in our analysis. To allow for aggregation over the entire data set, scaling is required. Within the M4 and M5 competitions, the average absolute error of a naïve one-step forecast was used as a scaling factor. For reasons discussed above, we discard this approach. Instead, MSIS is scaled in the same way as MAPE.

The Mean Scaled Interval Width (MSIW) indicates the degree of uncertainty associated with the forecasts. Wider intervals reflect higher uncertainty. In our case, it is given by
$$\mathrm{MSIW} = \frac{1}{I \cdot J \cdot L} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{l=1}^{L} \frac{\overline{\hat{x}}_{ijl} - \underline{\hat{x}}_{ijl}}{|x_{ijl}|}.$$

For MSIW, we apply the same scaling as with MAPE.
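For reference, all four measures can be computed directly from the hold-out and forecast tensors. The following sketch is our vectorized NumPy rendering of the definitions above; X denotes the hold-out values, Xh the point forecasts, and lo/up the interval bounds, all of size (I x J x L):

```python
import numpy as np

def mape(X, Xh):
    return np.mean(np.abs(X - Xh) / np.abs(X)) * 100

def dm(X, lo, up, nominal):
    # Degree of Miscalibration: actual coverage minus nominal probability level
    coverage = np.mean((X >= lo) & (X <= up))
    return coverage - nominal              # negative values indicate overconfidence

def msis(X, lo, up, p):
    # Interval score: width plus a penalty p for observations missing the interval
    score = (up - lo) + p * (lo - X) * (X < lo) + p * (X - up) * (X > up)
    return np.mean(score / np.abs(X))

def msiw(X, lo, up):
    return np.mean((up - lo) / np.abs(X))
```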

4.2 Implementation

For our calculations, we use the programming language Python. In particular, tensor extrapolation employs the functions parafac (library TensorLy; Kossaifi et al. Citation2019) and ETSModel (library statsmodels; Seabold and Perktold Citation2010). When parafac is called, the number of components R must be declared. The function ETSModel provides a Python-based implementation of the automatic exponential smoothing algorithm designed by Hyndman et al. (Citation2002) and Hyndman and Khandakar (Citation2008). This algorithm offers a variety of possible models and has proven very powerful in comparisons with other forecasting methods (Januschowski et al. Citation2020; Gastinger et al. Citation2021; Karlsson Rosenblad Citation2021). Further, compared with more sophisticated forecasting techniques, its requirements on data availability and computational resources are fairly low (De Stefani and Bontempi Citation2021; Schosser Citation2022b). As our baseline, we employ univariate extrapolation through ETSModel, producing both point forecasts and prediction intervals. Diverging from tensor extrapolation, which explicitly models the entire data set in tensor shape, this univariate approach forecasts individual time series one by one, effectively partitioning the tensor into vectors. The choice of this baseline is driven by the outcomes of the M5 competition, where traditional statistical methods demonstrated superiority in handling disaggregated data (Makridakis, Spiliotis, and Assimakopoulos Citation2022b; Makridakis et al. Citation2022c). To quantify the computational burden associated with the methods in question, we resort to the Python library timeit. The measurements refer to a commodity notebook with an AMD Ryzen 5 4500U 6x2.40 GHz processor and 8 GB RAM.
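To give an impression of the measurement procedure, a runtime figure like those reported below can be obtained as in the following sketch (our code; tensor_extrapolate refers to the illustration in Section 3.2, and the estimation tensor is a placeholder):

```python
import timeit
import numpy as np

X_est = np.random.rand(30, 30, 48)   # placeholder for the estimation sample

runs = 100
total = timeit.timeit(lambda: tensor_extrapolate(X_est, R=4, L=12), number=runs)
print(f"Average runtime over {runs} executions: {total / runs:.2f} s")
```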

4.3 Application: empirical data

Following Schosser (Citation2022b), our empirical study relies on data provided by the United Nations. Trade data between countries of the world are accessible at the UN Comtrade website https://comtradeplus.un.org. We choose a set of 30 countries, usually large or developed countries with high gross domestic products and trade volumes. As a trade category, we select “export of goods.” We analyze monthly data collected from 2012 to 2016. Since the exports of a country to itself are not defined, we end up with a total of 870 time series, each comprising 60 observations. The data are partitioned into an estimation sample and a hold-out sample. The latter consists of the twelve most recent observations, which are used to evaluate the forecast accuracy. Estimation sample and hold-out sample are organized in tensor form. We thus obtain $\underline{X}_{\mathrm{est}}$ of size $(30 \times 30 \times 48)$ and $\underline{X}_{\mathrm{hold}}$ of size $(30 \times 30 \times 12)$, respectively. The methods under study are implemented, or trained, on the estimation sample. Point forecasts are produced for the whole of the hold-out sample and arranged as tensor $\underline{\hat{X}}$ of size $(30 \times 30 \times 12)$. In addition, we determine the interval boundaries and organize them into two tensors of corresponding size, one holding the lower and one the upper bounds. It is worth mentioning that Hoff (Citation2011) and Ma, Qin, and Xia (Citation2023) examine United Nations trade data within a tensor-based setting. Assuming corresponding multivariate Gaussian distributions, the authors detect correlations among the index sets of the data. However, the information obtained is not utilized for prediction purposes.
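In code, the partitioning along the time mode is a simple slicing operation; the sketch below uses a random placeholder for the assembled trade tensor:

```python
import numpy as np

X = np.random.rand(30, 30, 60)   # placeholder for the (30 x 30 x 60) trade tensor

# Estimation sample: first 48 months; hold-out sample: twelve most recent months
X_est, X_hold = X[:, :, :48], X[:, :, 48:]
```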

Table 2 displays our results. In accordance with Schosser (Citation2022b), tensor extrapolation consistently outperforms univariate exponential smoothing in terms of MAPE, i.e., the quality of point forecasts. The findings concerning the associated prediction intervals are also very encouraging. On the basis of MSIS, our proposed approach dominates the baseline. This holds true irrespective of the parameter p. DM remains within reasonable limits. With 80% prediction intervals, there are modest advantages for tensor extrapolation; with 95% prediction intervals, the inverse relationship applies. As measured by MSIW, the proposed method generates slightly wider prediction intervals. Obviously, this drawback is more than compensated for by the better location of the intervals.

Table 2 Forecasting performance based on empirical data in terms of MAPE, DM, MSIS, and MSIW.

As a next step, we delve into the computational effort associated with the methods employed. Averaged over 100 executions, the runtime for tensor extrapolation with R = 4 components is a mere 0.19 seconds. This figure includes 0.12 seconds attributed to the tensor decomposition process. With R = 4 components, the automatic exponential smoothing algorithm is applied to an equivalent number of time series. In conjunction with the heuristic derivation of prediction intervals, this operation consumes an additional 0.07 seconds. In stark contrast, the baseline approach exhibits a significantly higher average runtime of 13.40 seconds. This value stems from 870 calls to the function ETSModel, one for each time series within the data set. It is crucial to note that the application of the automatic exponential smoothing algorithm, encompassing the selection and estimation of an appropriate exponential smoothing method, results in highly volatile runtimes contingent on the characteristics of the respective time series. In essence, tensor extrapolation substantially reduces the dimensions of the forecasting problem, rendering it 70 times faster.

4.4 Application: synthetic data

Regardless of the promising results of the previous subsection, our approach should be applied to more than just one data set. An obvious use case would be the data set underlying the M5 competition. However, this data set exhibits two features that preclude the use of tensor extrapolation for the moment. First, apart from time series data, it provides exogenous (or explanatory) variables, such as holidays, product prices, promotions, and social events. Second, the competition involved series that display intermittency, i.e., sporadic demand including many zeros (Syntetos, Boylan, and Croston Citation2005). Appropriate extensions of our framework to these problems are the subject of future work.

For further testing, it is essential to base the assessment on time series with both diverse and controllable characteristics. The relevant literature increasingly calls for the use of synthetic data (Kang, Hyndman, and Li Citation2020; Makridakis et al. Citation2022a) or already resorts to such data (Hill, Li, and Schneider Citation2021; Shah et al. Citation2021; Das et al. Citation2024). With this in mind, we create a comprehensive data set closely related to the field of retailing. Relational data appear, for example, in companies that sustain relationships with a large number of business partners or users and offer a multitude of products or services. Therefore, our simulated data should be understood as sales volume by user, product, and time. We presume that the data can be modeled by R = 4 components and strive for a tensor of size (I×J×T)=(160×120×100). In this way, we arrive at a total of 19,200 time series with 100 observations each.

When constructing our synthetic data set, we adhere to the procedure introduced by Dunlavy, Kolda, and Acar (Citation2011) and adapted by Schosser (Citation2022a, Citation2022b). Apart from missing values, the resulting data correspond to those used by Schosser (Citation2022a). To start with, we generate the component matrices. In this regard, we do not encounter specific limitations. Notably, the CP model does not impose orthogonality constraints on the component matrices (Dunlavy, Kolda, and Acar Citation2011; Papalexakis, Faloutsos, and Sidiropoulos Citation2016). The matrices A and B of sizes $(I \times R)$ and $(J \times R)$, respectively, can be considered “entity participation” matrices. That is to say, column $a_r$ (resp. $b_r$) is the vector of participation levels of all the entities in component r. Matrix A demonstrates the participation of the users: Here, the users are expected to respond in a different way to a specific product-time combination. We measure the magnitude of this reaction according to the density function of a Gaussian distribution (cf. Figure 1(a)). Matrix B shows the product participation: Groups of products can be distinguished that respond similarly to the same time and user effect combination (cf. Figure 1(b)). The columns of matrix C of size $(T \times R)$ record a number of periodic patterns. We create a sinusoidal pattern, a trend, a structural break, and another break (cf. Figure 1(c)). Aggregation of the matrices A, B, and C yields a noise-free version of the tensor. Eventually, we add Gaussian noise to every entry. The standard deviation of the error term is taken to equal half the mean of the noise-free tensor entries. We split our simulated data into an estimation sample and a hold-out sample. The latter is made up of the 20 most recent observations. Once more, the forecasts are produced for the whole of the hold-out sample.

Fig. 1 Generation of synthetic data.
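The following sketch reproduces the spirit of this recipe. It is our illustration: the peak locations, group structure, and break points are assumptions for demonstration purposes, not the authors' exact settings.

```python
import numpy as np
import tensorly as tl

rng = np.random.default_rng(42)
I, J, T, R = 160, 120, 100, 4
t = np.arange(T)

# Mode A: user participation, measured by Gaussian density bumps (cf. Figure 1(a))
A = np.stack([np.exp(-0.5 * ((np.arange(I) - mu) / 20.0) ** 2)
              for mu in (30, 70, 110, 150)], axis=1)

# Mode B: groups of products sharing the same participation level (cf. Figure 1(b))
B = np.kron(np.eye(R), np.ones((J // R, 1)))

# Mode C: sinusoid, trend, and two structural breaks (cf. Figure 1(c))
C = np.stack([np.sin(2 * np.pi * t / 12),
              t / T,
              (t >= 40).astype(float),
              (t >= 70).astype(float)], axis=1)

# Noise-free tensor plus Gaussian noise with sd equal to half the mean entry
X_clean = tl.cp_to_tensor((np.ones(R), [A, B, C]))
X = X_clean + rng.normal(scale=0.5 * X_clean.mean(), size=X_clean.shape)

# Hold-out sample: the 20 most recent observations
X_est, X_hold = X[:, :, :80], X[:, :, 80:]
```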

Table 3 shows our results. With regard to the accuracy of point forecasts, tensor extrapolation easily beats the baseline. The related prediction intervals are, however, also highly convincing. In terms of miscalibration, our proposed method dominates. What is more, it generates slightly narrower prediction intervals. Finally, tensor extrapolation exhibits better interval scores in almost all parameter settings.

Table 3 Forecasting performance based on synthetic data in terms of MAPE, DM, MSIS, and MSIW.

Given the variety and volume of our synthetic data, the computational costs are more pronounced. Based on 100 executions, the average runtime of tensor extrapolation with R = 4 components equals 4.34 seconds. Specifically, 4.20 seconds are dedicated to tensor decomposition. Operating the automatic exponential smoothing algorithm on R = 4 time series, along with the calculation of prediction intervals, adds 0.14 seconds to the process. The competitor takes 334.34 seconds on average. This figure is derived from 19,200 calls to the function ETSModel, each corresponding to a time series within the data set. The method advocated is, thus, 77 times faster.

5 Discussion

The findings of our study highlight the superior performance of tensor extrapolation, excelling in both the accuracy of point forecasts and the estimation of associated uncertainty. Moreover, our method demonstrates a noteworthy speed advantage over the baseline, resulting in a significantly reduced computational burden. This advantage is expected to persist across various setups, indicating that the relative differences in computational time will endure regardless of the system used. Therefore, tensor extrapolation disrupts the conventional tradeoff between forecast performance and computational efficiency. Essentially, even when the objective is to quantify uncertainty, accounting for the relational nature of the data proves beneficial. This approach enables us to capture intrinsic structural information and various types of dependencies, thereby enhancing the overall effectiveness of the forecasting workflow.

In selecting our data records, we emphasized diverse and controllable features. Nevertheless, one aspect we did not address was the length of the time series to be predicted—the number of observations at our disposal for training the methods. Generally, the availability of time series data depends on the specifics of the domain and application. In retail, for instance, product life cycles are becoming shorter and shorter, limiting the access to historical sales figures to a specific period (Fildes, Ma, and Kolassa Citation2022). As demonstrated, tensor extrapolation is well-equipped for such scenarios (see also Schosser Citation2022b). However, there is no reason to question the suitability of our method for longer time series; the capacity of CP decomposition to identify relevant underlying patterns remains unaffected.

6 Conclusions

In spite of the opportunities emerging from today’s “big data,” the relational character of many time series is largely neglected in forecasting tasks. Recently, tensor extrapolation has been shown to be effective in forecasting large-scale relational data (Schosser Citation2022a, Citation2022b). However, the results so far are limited to point forecasts. The paper at hand gives an indication of the uncertainty in forecasts. It offers prediction intervals that prove to be highly accurate even though they lack a theoretical underpinning. The proposed approach is characterized by exceptionally low computational cost. The latter is increasingly important for the acceptance and successful implementation of a method, especially where thousands, or even millions of forecasts and estimates of uncertainty are needed on a regular basis. Taken together, the findings of this paper provide a compelling argument in favor of tensor extrapolation.


References

  • Araujo, M., P. Ribeiro, and C. Faloutsos. 2017. “TensorCast: Forecasting with Context Using Coupled Tensors.” In Proceedings of the IEEE International Conference on Data Mining (ICDM), 71–80. IEEE.
  • Barsbey, M., and T. Cemgil. 2023. “Modeling Hierarchical Seasonality through Low-Rank Tensor Decompositions in Time Series Analysis.” IEEE Access 11 (1):85770–84. https://doi.org/10.1109/ACCESS.2023.3298597.
  • Bi, X., G. Adomavicius, W. Li, and A. Qu. 2022. “Improving Sales Forecasting Accuracy: A Tensor Factorization Approach with Demand Awareness.” INFORMS Journal on Computing 34 (3):1644–60. https://doi.org/10.1287/ijoc.2021.1147.
  • Bi, X., X. Tang, Y. Yuan, Y. Zhang, and A. Qu. 2021. “Tensors in Statistics.” Annual Review of Statistics and Its Applications 8 (1):345–68. https://doi.org/10.1146/annurev-statistics-042720-020816.
  • Box, G. E. P., and G. M. Jenkins. 1976. Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day.
  • Carroll, J. D., and J.-J. Chang. 1970. “Analysis of Individual Preferences in Multidimensional Scaling via an N-way Generalization of ‘Eckart-Young’ Decomposition.” Psychometrika 35 (3):283–319. https://doi.org/10.1007/BF02310791.
  • Chi, E. C., and T. G. Kolda. 2012. “On Tensors, Sparsity, and Nonnegative Factorizations.” SIAM Journal of Matrix Analysis and Applications 33 (4):1272–99. https://doi.org/10.1137/110859063.
  • Cichocki, A., R. Zdunek, A. H. Phan, and S. Amari. 2009. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multiway Data Analysis and Blind Source Separation. Chichester: Wiley.
  • Das, A., W. Kong, R. Sen, and Y. Zhou. 2024. “A Decoder-Only Foundation Model for Time-Series Forecasting.” Technical Report, arXiv:2310.10688.
  • De Stefani, J., and G. Bontempi. 2021. “Factor-Based Framework for Multivariate and Multi-Step-Ahead Forecasting of Large Scale Time Series.” Frontiers in Big Data 4 (1):e690267. https://doi.org/10.3389/fdata.2021.690267.
  • Deng, J., J. Deng, D. Yin, R. Jiang, and X. Song. 2024. “TTS-Norm: Forecasting Tensor Time Series via Multi-Way Normalization.” ACM Transactions on Knowledge Discovery from Data 18 (1):e3. https://doi.org/10.1145/3605894.
  • Donoho, D. 2017. “50 Years of Data Science.” Journal of Computational and Graphical Statistics 26 (4):745–66. https://doi.org/10.1080/10618600.2017.1384734.
  • Dunlavy, D. M., T. G. Kolda, and E. Acar. 2011. “Temporal Link Prediction Using Matrix and Tensor Factorizations.” ACM Transactions on Knowledge Discovery from Data 5 (2):e10. https://doi.org/10.1145/1921632.1921636.
  • Feuerverger, A., Y. He, and S. Khatri. 2012. “Statistical Significance of the Netflix Challenge.” Statistical Science 27 (2):202–31. https://doi.org/10.1214/11-STS368.
  • Fildes, R. 1989. “Evaluation of Aggregate and Individual Forecast Method Selection Rules.” Management Science 35 (9):1056–65. https://doi.org/10.1287/mnsc.35.9.1056.
  • Fildes, R., S. Ma, and S. Kolassa. 2022. “Retail Forecasting: Research and Practice.” International Journal of Forecasting 38 (4):1283–318. https://doi.org/10.1016/j.ijforecast.2019.06.004.
  • Gastinger, J., S. Nicolas, D. Stepić, M. Schmidt, and A. Schülke. 2021. “A Study on Ensemble Learning for Time Series Forecasting and the Need for Meta-Learning.” Technical Report, arXiv:2104.11475.
  • George, G., E. C. Osinga, D. Lavie, and B. A. Scott. 2016. “Big Data and Data Science Methods for Management Research.” Academy of Management Journal 59 (5):1493–507. https://doi.org/10.5465/amj.2016.4005.
  • Gneiting, T., and A. E. Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association 102 (477):359–78. https://doi.org/10.1198/016214506000001437.
  • Harshman, R. A. 1970. “Foundations of the PARAFAC Procedure: Models and Conditions for an ‘Explanatory’ Multimodal Factor Analysis.” UCLA Working Papers in Phonetics 16:1–84.
  • Hill, C., J. Li, and M. Schneider. 2021. “The Tensor Auto-Regressive Model.” Journal of Forecasting 40 (4):636–52. https://doi.org/10.1002/for.2735.
  • Hitchcock, F. L. 1927. “The Expression of a Tensor or Polyadic as a Sum of Products.” Journal of Mathematics and Physics 6 (1):164–89. https://doi.org/10.1002/sapm192761164.
  • Hoff, P. D. 2011. “Separable Covariance Arrays via the Tucker Product, with Applications to Multivariate Relational Data.” Bayesian Analysis 6 (2):179–96. https://doi.org/10.1214/11-BA606.
  • Hyndman, R. J. 2020. “A Brief History of Forecasting Competitions.” International Journal of Forecasting 36 (1):7–14. https://doi.org/10.1016/j.ijforecast.2019.03.015.
  • Hyndman, R. J., and G. Athanasopoulos. 2018. Forecasting: Principles and Practice. Melbourne: OTexts.
  • Hyndman, R. J., and Y. Khandakar. 2008. “Automatic Time Series Forecasting: The forecast Package for R.” Journal of Statistical Software 27 (3):1–22. https://doi.org/10.18637/jss.v027.i03.
  • Hyndman, R. J., and A. B. Koehler. 2006. “Another Look at Measures of Forecast Accuracy.” International Journal of Forecasting 22 (4):679–88. https://doi.org/10.1016/j.ijforecast.2006.03.001.
  • Hyndman, R. J., A. B. Koehler, R. D. Snyder, and S. Grose. 2002. “A State Space Framework for Automatic Forecasting Using Exponential Smoothing.” International Journal of Forecasting 18 (3):439–54. https://doi.org/10.1016/S0169-2070(01)00110-8.
  • Januschowski, T., J. Gasthaus, Y. Wang, D. Salinas, V. Flunkert, M. Bohlke-Schneider, and L. Callon. 2020. “Criteria for Classifying Forecasting Methods.” International Journal of Forecasting 36 (1):167–77. https://doi.org/10.1016/j.ijforecast.2019.05.008.
  • Kang, Y., R. J. Hyndman, and F. Li. 2020. “GRATIS: GeneRAting TIme Series with Diverse and Controllable Characteristics.” Statistical Analysis and Data Mining 13 (4):354–76. https://doi.org/10.1002/sam.11461.
  • Karlsson Rosenblad, A. 2021. “Accuracy of Automatic Forecasting Methods for Univariate Time Series Data: A Case Study Predicting the Results of the 2018 Swedish General Election Using Decades-Long Series.” Communications in Statistics: Case Studies, Data Analysis and Applications 7 (3):475–93. https://doi.org/10.1080/23737484.2021.1964407.
  • Kiers, H. A. L. 2000. “Towards a Standardized Notation and Terminology in Multiway Analysis.” Journal of Chemometrics 14 (3):105–122. https://doi.org/10.1002/1099-128X(200005/06)14:3<105::AID-CEM582>3.0.CO;2-I.
  • Kitchin, R. 2014. “Big Data, New Epistemologies and Paradigm Shifts.” Big Data & Society 1 (1):1–12. https://doi.org/10.1177/2053951714528481.
  • Kolda, T. G., and B. W. Bader. 2009. “Tensor Decompositions and Applications.” SIAM Review 51 (3):455–500. https://doi.org/10.1137/07070111X.
  • Kossaifi, J., Y. Panagakis, A. Anandkumar, and M. Pantic. 2019. “TensorLy: Tensor Learning in Python.” Journal of Machine Learning Research 20 (26):1–6.
  • Landon, J., and N. D. Singpurwalla. 2008. “Choosing a Coverage Probability for Prediction Intervals.” The American Statistician 62 (2):120–24. https://doi.org/10.1198/000313008X304062.
  • Liberman, M. 2010. “Obituary: Fred Jelinek.” Computational Linguistics 36 (4):595–99. https://doi.org/10.1162/coli_a_00032.
  • Ma, L., S. Qin, and Y. Xia. 2023. “Alteration Detection of Tensor Dependence Structure via Sparsity-Exploited Reranking Algorithm.” Technical Report, arXiv:2310.08798.
  • Ma, S., and R. Fildes. 2021. “Retail Sales Forecasting with Meta-Learning.” European Journal of Operational Research 288 (1):111–128. https://doi.org/10.1016/j.ejor.2020.05.038.
  • Makridakis, S., C. Fry, F. Petropoulos, and E. Spiliotis. 2022a. “The Future of Forecasting Competitions: Design Attributes and Principles.” INFORMS Journal on Data Science 1 (1):96–113. https://doi.org/10.1287/ijds.2021.0003.
  • Makridakis, S., R. J. Hyndman, and F. Petropoulos. 2020. “Forecasting in Social Settings: The State of the Art.” International Journal of Forecasting 36 (1):15–28. https://doi.org/10.1016/j.ijforecast.2019.05.011.
  • Makridakis, S., E. Spiliotis, and V. Assimakopoulos. 2022b. “M5 Accuracy Competition: Results, Findings, and Conclusions.” International Journal of Forecasting 38 (4):1346–64. https://doi.org/10.1016/j.ijforecast.2021.11.013.
  • Makridakis, S., E. Spiliotis, V. Assimakopoulos, Z. Chen, A. Gaba, I. Tsetlin, and R. L. Winkler. 2022c. “The M5 Uncertainty Competition: Results, Findings and Conclusions.” International Journal of Forecasting 38 (4):1365–85. https://doi.org/10.1016/j.ijforecast.2021.10.009.
  • Müller, O., I. Junglas, J. vom Brocke, and S. Debortoli. 2016. “Utilizing Big Data Analytics for Information Systems Research: Challenges, Promises and Guidelines.” European Journal of Information Systems 25 (4):289–302. https://doi.org/10.1057/ejis.2016.2.
  • Nikolopoulos, K., and F. Petropoulos. 2018. “Forecasting Big Data: Does Suboptimality Matter?” Computers & Operations Research 98 (1):322–329. https://doi.org/10.1016/j.cor.2017.05.007.
  • Papalexakis, E. E., C. Faloutsos, and N. D. Sidiropoulos. 2016. “Tensors for Data Mining and Data Fusion: Models, Applications, and Scalable Algorithms.” ACM Transactions on Intelligent Systems and Technology 8 (2):e16. https://doi.org/10.1145/2915921.
  • Petropoulos, F., Y. Grushka-Cockayne, E. Siemsen, and E. Spiliotis. 2022. “Wielding Occam’s Razor: Fast and Frugal Retail Forecasting.” Technical Report, arXiv:2102.13209.
  • Petropoulos, F., S. Makridakis, V. Assimakopoulos, and K. Nikolopoulos. 2014. “‘Horses for Courses’ in Demand Forecasting.” European Journal of Operational Research 237 (1):152–163. https://doi.org/10.1016/j.ejor.2014.02.036.
  • Rabanser, S., O. Shchur, and S. Günnemann. 2017. “Introduction to Tensor Decompositions and their Applications in Machine Learning.” Technical Report, arXiv:1711.10781.
  • Salinas, D., V. Flunkert, J. Gasthaus, and T. Januschowski. 2020. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” International Journal of Forecasting 36 (3):1181–91. https://doi.org/10.1016/j.ijforecast.2019.07.001.
  • Schosser, J. 2020. “Multivariate Extrapolation: A Tensor-based Approach.” In Operations Research Proceedings 2019, edited by J. S. Neufeld, U. Buscher, R. Lasch, D. Möst, and J. Schönberger, 53–59. Cham: Springer.
  • Schosser, J. 2022a. “Tensor Extrapolation: An Adaptation to Data Sets with Missing Entries.” Journal of Big Data 9 (1):e26. https://doi.org/10.1186/s40537-022-00574-7.
  • Schosser, J. 2022b. “Tensor Extrapolation: Forecasting Large-Scale Relational Data.” Journal of the Operational Research Society 73 (5):969–78. https://doi.org/10.1080/01605682.2021.1892460.
  • Schwartz, R., J. Dodge, N. A. Smith, and O. Etzioni. 2020. “Green AI.” Communications of the ACM 63 (12):54–63. https://doi.org/10.1145/3381831.
  • Seabold, S., and J. Perktold. 2010. “Statsmodels: Econometric and Statistical Modeling with Python.” In Proceedings of the 9th Python in Science Conference (SCIPY 2010), 57–61.
  • Shah, S. Y., D. Patel, L. Vu, X.-H. Dang, B. Chen, P. Kirchner, H. Samulowitz, D. Wood, G. Bramble, W. M. Gifford, G. Ganapavarapu, R. Vaculin, and P. Zerfos. 2021. “AutoAI-TS: AutoAI for Time Series Forecasting.” In Proceedings of the 2021 International Conference on Management of Data (SIGMOD), 2584–2596. ACM.
  • Shi, Q., J. Yin, J. Cai, A. Cichocki, T. Yokota, L. Chen, M. Yuan, and J. Zeng. 2020. “Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting.” In Proceedings of the 34th Conference on Artificial Intelligence (AAAI), 5758–5766. AAAI. https://doi.org/10.1609/aaai.v34i04.6032.
  • Spiegel, S., J. Clausen, S. Albayrak, and J. Kunegis. 2012. “Link Prediction on Evolving Data Using Tensor Factorization.” In New Frontiers in Applied Data Mining: PAKDD 2011 International Workshops, 100–110. Springer.
  • Syntetos, A. A., J. E. Boylan, and J. D. Croston. 2005. “On the Categorization of Demand Patterns.” Journal of the Operational Research Society 56 (5):495–503. https://doi.org/10.1057/palgrave.jors.2601841.
  • Thompson, N. C., and S. Spanuth. 2021. “The Decline of Computers as a General Purpose Technology.” Communications of the ACM 64 (3):64–72. https://doi.org/10.1145/3430936.
  • Wasserman, S., and K. Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.
  • Wu, C.-J., R. Raghavendra, U. Gupta, B. Acun, N. Ardalani, K. Maeng, G. Chang, F. A. Behram, J. Huang, C. Bai, M. Gschwind, A. Gupta, M. Ott, A. Melnikov, S. Candido, D. Brooks, G. Chauhan, B. Lee, H.-H. S. Lee, B. Akyildiz, M. Balandat, J. Spisak, R. Jain, M. Rabbat, and K. Hazelwood. 2022. “Sustainable AI: Environmental Implications, Challenges and Opportunities.” In Proceedings of the 5th Conference on Machine Learning and Systems (MLSys), 795–813. Systems and Machine Learning Foundation.
  • Yu, H.-F., N. Rao, and I. S. Dhillon. 2016. “Temporal Regularized Matrix Factorization for High-Dimensional Time Series Prediction.” In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NeurIPS), 847–855. NeurIPS Foundation.
  • Yu, R., S. Zheng, A. Anandkumar, and Y. Yue. 2017. “Long-Term Forecasting Using Tensor-Train RNNs.” Technical report, arXiv:1711.00073.