Research Article

Low cost network traffic measurement and fast recovery via redundant row subspace-based matrix completion

Article: 2218069 | Received 19 Dec 2022, Accepted 20 May 2023, Published online: 13 Jun 2023

Abstract

Traffic matrices (TMs) are essential for managing networks. Obtaining complete TMs is difficult because of the high measurement cost. Several recent studies propose sparse measurement schemes to reduce this cost: they take measurements on only a subset of origin-destination (OD) pairs and infer the data for unmeasured OD pairs through matrix completion. However, existing sparse network measurement schemes suffer from high computation costs and low recovery quality. This paper investigates the coherence feature of real traffic flow traces (Abilene and GÉANT). Both data sets are highly coherent, with column coherence greater than row coherence. Based on this coherence feature, we propose Redundant Row Subspace-based Matrix Completion (RRS-MC), which involves several techniques. First, we design an algorithm to identify subspace rows (OD pairs) from historical data. Second, based on the identified subspace rows, we design a sampling scheduling algorithm that takes full measurement samples in the subspace rows and partial measurement samples in the remaining rows. Moreover, we propose a redundant sampling rule to prevent the loss of recovery accuracy caused by variation in the subspace rows. Finally, we design a completion algorithm to recover the partially measured rows. We conduct extensive experiments, and the results indicate that the proposed scheme is superior to state-of-the-art sampling and completion schemes in both computation cost and recovery accuracy.

1. Introduction

A traffic matrix (TM) records the network traffic volumes between OD pairs in a time slot; TMs observed over multiple time slots form traffic matrices (TMs). TMs are essential for many network tasks, such as capacity planning, traffic engineering, anomaly detection, and congestion control (Huang et al., Citation2022; Li et al., Citation2023a; Roughan et al., Citation2003; Tang et al., Citation2021; Xie et al., Citation2016, Citation2017, Citation2019).

The traffic flow exchanged between OD pairs can be measured using flow-based measurement tools such as NetFlow (Claise, Citation2004) and sFlow (Wang et al., Citation2004). However, because these tools frequently consume too many CPU and memory resources (Yu et al., Citation2013), obtaining the TMs by measuring the traffic volumes between all OD pairs at all time slots is impractical due to the high measurement overhead (Li, Liang et al., Citation2023; Xie et al., Citation2017, Citation2021, Citation2022). To reduce measurement overhead, matrix completion-based sparse network monitoring has attracted much recent interest (Li et al., Citation2023b; Xie, Wang et al., Citation2017). Sparse network monitoring aims to obtain the entire TM data by taking measurements on only a portion of OD pairs and inferring the unmeasured data through matrix completion. According to matrix completion theory, if the target matrix has a low-rank feature, it can be accurately recovered from a subset of samples. Some studies (Roughan et al., Citation2011; Xie et al., Citation2021) have shown that TMs exhibit temporal and spatial correlation, implying that TMs meet the low-rank prerequisite for matrix completion. Although promising, existing matrix completion-based sparse network monitoring suffers from two major problems:

  • Low recovery accuracy: In addition to the low-rank feature, matrix coherence is another essential feature in matrix completion theory. When the target matrix is highly coherent, not all entries reveal the same amount of information, so most of the information will be missed if we take samples randomly. Using such random samples to recover the target matrix leads to poor recovery accuracy. Some recent studies (Roughan et al., Citation2011) apply matrix completion to infer unmeasured data from random measurement samples; however, these samples may not carry enough information to recover the unmeasured data accurately.

  • Long processing time: To achieve better recovery accuracy, some studies (Liu & Wang, Citation2020; Xie et al., Citation2015) aim to select better measurement sampling locations within the matrix completion framework. However, selecting better sampling locations usually requires multi-round iteration, which results in high computational costs. Besides, existing matrix completion algorithms (e.g. SRMF Roughan et al., Citation2011, SVT Cai et al., Citation2010, ALM Lin et al., Citation2010) involve an iterative parameter-training process, which also incurs high computational costs.

Among matrix completion techniques, subspace-based matrix completion (Balcan & Zhang, Citation2016; Jin et al., Citation2022; Krishnamurthy & Singh, Citation2013, Citation2014; Wan et al., Citation2018) has a completion speed advantage: it recovers the entire matrix using only simple matrix calculations, without multi-round iteration. However, using the subspace-based method for sparse network monitoring raises two problems: (1) for subspace-based matrix completion, we can select some columns to form a column subspace or some rows to form a row subspace; which is better for a sparse network monitoring scenario? (2) Even though the completion algorithm itself is fast, identifying the subspace through adaptive sampling is time-consuming.

Unlike the above sparse network monitoring schemes (Liu & Wang, Citation2020; Xie et al., Citation2015), we model the network traffic traces as a matrix in which rows denote OD pairs and columns denote time slots, to take advantage of both spatial and temporal information. We study the coherence feature of real network traffic traces and, based on this feature, propose a matrix completion scheme built on the row subspace. Our contributions in this paper can be summarised as follows:

  • We study the coherence feature of two real traffic flow traces (Abilene Zhang et al., Citation2003, and GÉANT Uhlig et al., Citation2006). We show the relationship between matrix numerical values and high coherence, and illustrate the negative impact of high column coherence on column subspace-based sampling and completion algorithms.

  • Based on the coherence feature of network traffic data, we propose Redundant Row Subspace-based Matrix Completion (RRS-MC). RRS-MC constructs a subspace by fully sampling the subspace rows; each remaining row is partially sampled and recovered via the subspace. We identify subspace rows by training on a small amount of historical data. Moreover, we add some redundant samples to obtain the subspace robustly, preventing the loss of recovery accuracy caused by data variation.

  • We conduct comprehensive experiments on two real traffic flow traces (Abilene Zhang et al., Citation2003, and GÉANT Uhlig et al., Citation2006). The experimental results show that our method obtains the whole monitoring data with high recovery quality at a lower measurement cost and has a clear advantage in processing speed.

The rest of this paper is organised as follows. We present notations and basic definitions in Section 2. In Section 3, we introduce the related work. We study the features of real traffic traces in Section 4. We formulate the problem and give a solution overview in Section 5. In Section 6, we propose our sampling scheduling algorithm and the corresponding completion algorithm. We evaluate the performance of the proposed algorithms through extensive experiments in Section 7. Finally, we conclude the work in Section 8.

2. Preliminaries

In this section, we give the notations and definitions. Table 1 shows a list of the notations used in this paper.

Table 1. List of notations.

Definition 2.1

Full SVD and Skinny SVD, Liu et al., Citation2012

For an $n_1 \times n_2$ matrix $M$, its Singular Value Decomposition (SVD) is defined as $M = U\Sigma V^T$, where $U \in \mathbb{R}^{n_1 \times n_1}$ and $V \in \mathbb{R}^{n_2 \times n_2}$ are unitary matrices, and $\Sigma \in \mathbb{R}^{n_1 \times n_2}$ is a diagonal matrix with elements in decreasing order, i.e. $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0)$, where $r$ is the rank of the matrix and $\sigma_i$ is the $i$th singular value. The diagonal elements are also known as singular values. The SVD defined in this way is called the full SVD.

If we keep only the positive $\sigma_i$, the reduced form is called the skinny SVD. For a matrix $M$ of rank $r$, its skinny SVD is computed as $M = U_r \Sigma_r V_r^T$, where $\Sigma_r = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ with $\{\sigma_i\}_{i=1}^r$ being the positive singular values. More precisely, $U_r$ and $V_r$ are formed by taking the first $r$ columns of $U$ and $V$, respectively.

Definition 2.2

Leverage Scores, Chen et al., Citation2015

For an $n_1 \times n_2$ real-valued matrix $M$ of rank $r$ with skinny SVD $M = U\Sigma V^T$, its (normalized) leverage scores are defined as:
(1) $\mu_i(M) := \frac{n_1}{r}\,\|U^T e_i\|_2^2, \quad i = 1, 2, \ldots, n_1$
(2) $\nu_j(M) := \frac{n_2}{r}\,\|V^T e_j\|_2^2, \quad j = 1, 2, \ldots, n_2$
where $\mu_i(M)$ corresponds to row $i$, $\nu_j(M)$ corresponds to column $j$, and $e_i$ (resp. $e_j$) denotes the $i$th (resp. $j$th) standard basis vector of appropriate dimension.

Definition 2.3

Coherence Parameter, Candès & Recht, Citation2009; Liu & Li, Citation2016

For an $n_1 \times n_2$ matrix $M$ of rank $r$ whose skinny SVD is given by $M = U_r \Sigma_r V_r^T$, where $U \in \mathbb{R}^{n_1 \times r}$ and $V \in \mathbb{R}^{n_2 \times r}$ are orthogonal matrices, let $U$ and $V$ span the column subspace and row subspace of $M$. The column-coherence is defined as:
(3) $\mu(U) := \max_{i=1,\ldots,n_1} \mu_i(M) = \frac{n_1}{r}\max_{i=1,\ldots,n_1}\|U^T e_i\|_2^2$
The row-coherence is defined as:
(4) $\mu(V) := \max_{j=1,\ldots,n_2} \nu_j(M) = \frac{n_2}{r}\max_{j=1,\ldots,n_2}\|V^T e_j\|_2^2$
where $e_i$ (resp. $e_j$) denotes the $i$th (resp. $j$th) standard basis vector of appropriate dimension. Column-coherence and row-coherence are bounded by $1 \le \mu(U) \le n_1/r$ and $1 \le \mu(V) \le n_2/r$. The coherence parameter of $M$ is defined as $\mu(r) := \max(\mu(U), \mu(V))$. The leverage scores can be viewed as a local version of the coherence parameter.
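To make Definitions 2.1–2.3 concrete, the following minimal numpy sketch (ours, for illustration; the rank $r$ is assumed to be given, e.g. estimated as in Section 4) computes the leverage scores and the two coherence parameters:

```python
import numpy as np

def leverage_and_coherence(M, r):
    """Leverage scores (Definition 2.2) and coherence parameters (Definition 2.3)."""
    n1, n2 = M.shape
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # skinny SVD (Definition 2.1)
    Ur, Vr = U[:, :r], Vt[:r, :].T                    # first r left/right singular vectors
    mu = (n1 / r) * np.sum(Ur ** 2, axis=1)           # Eq. (1): mu_i(M) for each row i
    nu = (n2 / r) * np.sum(Vr ** 2, axis=1)           # Eq. (2): nu_j(M) for each column j
    return mu, nu, mu.max(), nu.max()                 # mu(U) = max_i mu_i, mu(V) = max_j nu_j

# Toy usage on a random rank-5 matrix; for traffic traces, M would be an N x T TM.
rng = np.random.default_rng(0)
M = rng.random((100, 5)) @ rng.random((5, 80))
mu, nu, mu_U, mu_V = leverage_and_coherence(M, r=5)
print(mu_U, mu_V)  # each lies within its bound: [1, n1/r] and [1, n2/r]
```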

3. Related work

In this section, we review related work. First, we introduce the existing traffic trace recovery techniques and matrix completion-based sparse network monitoring methods. Then, we describe the subspace-based matrix completion and the corresponding adaptive sampling schemes.

3.1. Traffic data recovery techniques and sparse network monitoring schemes

A series of studies has been proposed to deal with missing traffic data, utilizing spatial (Lakhina et al., Citation2004; Zhang et al., Citation2005) or temporal (Barford et al., Citation2002) features of the traffic data to infer the missing data. Most of these methods have poor data recovery performance. Utilizing both spatial and temporal features of traffic matrices, SRMF (Roughan et al., Citation2011) proposes the first spatio-temporal model to recover missing data by matrix factorization (MF). Following SRMF, several studies (Y. C. Chen et al., Citation2014; Diao et al., Citation2022; Hu et al., Citation2022; Hu, Hsieh et al., Citation2022; Li et al., Citation2023b; Liang et al., Citation2022) have been proposed to recover data. These methods infer the missing data from partial measurement samples; however, they do not consider the quality of the measurement samples. When the measured data do not carry enough information, these methods may suffer from low recovery accuracy.

To ensure recovery performance, i.e. to make the measurement samples carry enough information to recover the unmeasured data, several studies adaptively select measurement samples within the matrix completion framework. MC-E2E (Xie et al., Citation2015) proposes a dynamic sampling strategy that determines which OD pairs need to be measured based on some initial random samples. Modelling the traffic matrix with rows representing the origin nodes and columns representing the destination nodes, the main steps of MC-E2E are as follows: first, apply a random sampling strategy to obtain some initial measurement samples; then use the previous measurement samples to recover the traffic matrix and adaptively select the measurement locations with larger recovery error; the sampling process iterates until a stop condition is met. The stop condition determines whether the current measurement samples contain enough information to recover the traffic matrix. After obtaining enough measurement samples, MC-E2E uses SVT (Cai et al., Citation2010) to infer the unmeasured data. MC-E2E has an advantage in recovery accuracy; however, to achieve high recovery quality, it requires multi-round iteration to find informative sampling locations, resulting in high computational costs.

Liu and Wang (Citation2020) use an adaptive sampling algorithm (Y. Chen et al., Citation2014) (called Local Coherence in this paper) for sparse network monitoring. Local Coherence (LC) addresses highly coherent matrix completion with a two-stage adaptive sampling strategy: first, randomly sample some entries and perform SVD on these samples to estimate the leverage scores (the local version of coherence) of the remaining un-sampled data; second, use the estimated leverage scores to determine further sampling locations. Liu and Wang (Citation2020) also use the stop condition proposed in Xie et al. (Citation2015) to determine whether the current measurement samples are sufficient to recover the matrix. Before the stop condition is satisfied, LC must repeatedly perform SVD to obtain the leverage scores and determine further sampling locations, resulting in high computational costs. In addition, LC uses partially measured data to calculate the leverage scores, which may lead to inaccurate sampling locations and thus affect the recovery quality.

3.2. Subspace-based matrix completion and corresponding adaptive sampling scheme

A novel type of matrix completion algorithm has been proposed in recent years (Balcan & Zhang, Citation2016; Jin et al., Citation2022; Krishnamurthy & Singh, Citation2013, Citation2014; Wan et al., Citation2018). These methods (called subspace-based matrix completion in this paper) use a subspace to fill in partially sampled vectors. The rationale is that if a vector belongs to a subspace U, then it can be recovered from U and sub-samples of the vector.

In detail, the subspace-based methods (Balcan & Zhang, Citation2016; Jin et al., Citation2022; Krishnamurthy & Singh, Citation2013, Citation2014; Wan et al., Citation2018) identify some columns to compose the column subspace $U$, then use $U$ to fill in the other partially sampled columns. Given a subspace $U$, for any column $M^t$ we sample the entries in a set $\Omega$ and denote the samples of $M^t$ as $M_\Omega^t$. If $M^t \in U$, we can represent $M^t = U\alpha^t$, where $\alpha^t$ is an $r$-dimensional coefficient vector obtained by solving the following optimization problem:
(5) $\min_{\alpha^t \in \mathbb{R}^r} \frac{1}{2}\|M_\Omega^t - U_\Omega \alpha^t\|_2^2$
We then recover $M^t$ as $\hat{M}^t = U\alpha^t$, where $\alpha^t$ is the optimal solution of Equation (5); since Equation (5) has the closed-form solution $\alpha^t = (U_\Omega^T U_\Omega)^{-1} U_\Omega^T M_\Omega^t$, we obtain the recovery equation:
(6) $\hat{M}^t = U(U_\Omega^T U_\Omega)^{-1} U_\Omega^T M_\Omega^t$
An adaptive sampling strategy is usually used in these schemes to identify the columns that compose the subspace (called subspace columns). Specifically, given a subspace $U$ (initially it can be $\emptyset$), for any column $M^t$ we randomly sample $|\Omega|$ entries; when $|\Omega|$ is large enough, we can determine whether the sub-samples of $M^t$ belong to the current subspace $U$ via the estimator $res = \|M_\Omega^t - P_{U_\Omega} M_\Omega^t\|$, where $P_{U_\Omega} = U_\Omega(U_\Omega^T U_\Omega)^{-1}U_\Omega^T$. In the absence of noise, $res > 0$ implies that $M^t$ does not belong to $U$; we then adaptively sample all entries of $M^t$ and add it to $U$. When res = 0, we can recover $M^t$ by Equation (6).
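As an illustration of Equations (5) and (6) and the residual estimator, the following numpy sketch (ours, under the noise-free setting above) recovers a partially sampled column; we solve Equation (5) by least squares, which is numerically equivalent to the closed-form pseudo-inverse:

```python
import numpy as np

def recover_column(U, omega, m_omega):
    """Eq. (6): recover a full column from samples m_omega taken at row indices omega."""
    U_omega = U[omega, :]                                     # subspace rows at sampled entries
    alpha = np.linalg.lstsq(U_omega, m_omega, rcond=None)[0]  # solves Eq. (5)
    return U @ alpha                                          # M_hat^t = U alpha^t

def residual(U, omega, m_omega):
    """res = ||m_omega - P_{U_omega} m_omega||; > 0 (noise-free) means M^t is outside U."""
    U_omega = U[omega, :]
    proj = U_omega @ np.linalg.lstsq(U_omega, m_omega, rcond=None)[0]
    return np.linalg.norm(m_omega - proj)

# Toy usage: a rank-2 matrix whose first two columns span the column subspace U.
rng = np.random.default_rng(1)
M = rng.random((10, 2)) @ rng.random((2, 6))
U = np.linalg.qr(M[:, :2])[0]                 # orthonormal basis of the column subspace
omega = np.array([0, 2, 5, 7])                # |Omega| = 4 sampled entries of column 3
print(residual(U, omega, M[omega, 3]))        # ~0: column 3 lies in U
print(np.allclose(recover_column(U, omega, M[omega, 3]), M[:, 3]))  # True
```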

Finding subspace columns via adaptive sampling usually requires the assumption that $\mu(U)$ is incoherent (Balcan & Zhang, Citation2016), i.e. that $\mu(U)$ has a low coherence value. When $\mu(U)$ is incoherent, each row of the matrix reveals the same amount of information, so we do not need to consider the sampling locations when taking samples from each column. When $|\Omega|$ is large enough, we have sufficient information to judge whether the partially sampled column belongs to the current subspace; if so, we can use the subspace to recover the partially sampled column.

To provide a sparse monitoring scheme with high recovery accuracy and low measurement cost, this paper studies the coherence feature of two real network traffic traces. Based on this feature, we propose RRS-MC, comprising a measurement algorithm and a corresponding completion algorithm. Our measurement algorithm takes full measurement samples on some rows and partial measurement samples on the remaining rows; the proposed completion algorithm then recovers the unmeasured data. We also demonstrate that the proposed completion algorithm is feasible for sparse network monitoring due to its low computation cost.

4. Empirical study with real traces, and common cases of highly coherent matrices

According to matrix completion theory, a low-rank feature is necessary for low cost sparse network monitoring. In addition to low rank, matrix coherence is another feature that affects the accurate recovery of network monitoring data. In this section, we conduct experimental studies to establish the low-rank and high-coherence features of the traffic monitoring data (Abilene Zhang et al., Citation2003 and GÉANT Uhlig et al., Citation2006). We then investigate the data structures that lead to high matrix coherence.

4.1. Low-rank feature

According to Markovsky (Citation2018) and Xie et al. (Citation2020), if a matrix has a good low-rank structure, the top $k$ singular values occupy nearly the total energy, i.e. $\sum_{i=1}^{k}\sigma_i^2 \approx \sum_{i=1}^{r}\sigma_i^2$, where $\sigma_i$ represents the $i$th singular value of the matrix. In this study, we apply SVD to the real traffic traces and check their low-rank feature using the total variance captured by the top $k$ singular values, i.e. $g(k) = \sum_{i=1}^{k}\sigma_i^2 / \sum_{i=1}^{r}\sigma_i^2$.
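A minimal numpy sketch of this check (ours, not the authors' code) is:

```python
import numpy as np

def g(M, k):
    """Fraction of total variance captured by the top-k singular values (Section 4.1)."""
    s = np.linalg.svd(M, compute_uv=False)   # singular values in decreasing order
    return np.sum(s[:k] ** 2) / np.sum(s ** 2)

# For an N x T traffic matrix M loaded from a trace, g(M, 10) > 0.95
# would reproduce the low-rank observation reported in Figure 1.
```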

Figure 1 shows the fraction $g(k)$ for both traffic data sets, Abilene (Zhang et al., Citation2003) and GÉANT (Uhlig et al., Citation2006). The top ten singular values of both network traffic data sets capture more than 95% of the variance, indicating that both data sets have a good low-rank approximation and meet the prerequisite for using matrix completion-based methods.

Figure 1. Good low-rank structure of real traffic traces.


4.2. High-coherence feature

In addition to the low-rank feature, another essential feature in matrix completion theory is matrix coherence. The incoherence assumption (i.e. low coherence) is required because of uniform sampling (Y. Chen et al., Citation2014). When the target matrix is incoherent, informally speaking, each entry reveals the same amount of information; since each entry carries the same information load, we can accurately recover the matrix as long as the smallest required number of samples is taken. However, this incoherence assumption is unlikely to hold for a real network traffic matrix, which is coherent (i.e. has high coherence). When the matrix is coherent, adaptive sampling schemes are usually used instead of random sampling to ensure that the samples carry enough information: in the high-coherence case, we prefer to select the elements that carry more information. In this subsection, we study the coherence feature of both data sets.

According to Definition 2.3, the matrix coherence is the maximum value of the leverage scores, so we first study the leverage scores of both data sets. For the monitoring matrix $M \in \mathbb{R}^{N \times T}$, based on Definition 2.2, the leverage scores are related to $N$, $T$, and the rank $r$. We take T = N, and $r$ is calculated as the least $k$ that makes $g(k)$ larger than 99%, where $g(k)$ is given in Section 4.1. Figure 2 shows the leverage scores of both data sets, where the X-axis is the index of rows and columns, and the Y-axis is the leverage score. As shown in Figure 2, the maximum leverage score, and hence the coherence value, is 35.292 for Abilene and 102.2354 for GÉANT. A black dashed line indicates the upper bound of the leverage scores for the two data sets, i.e. N/r.

Figure 2. Leverage score of two real network traffic flow traces.


In addition to the coherence feature of the first $M \in \mathbb{R}^{N \times T}$, which collects the monitoring data from time slot 1 to time slot $T$, we also study the coherence feature of the subsequent data as it varies over time; the results are shown in Figure 3. In the figure, the X-axis indicates the starting time slot, and the Y-axis indicates the coherence parameters of each $M \in \mathbb{R}^{N \times T}$ beginning at that time slot. Combining Figures 2 and 3, we find that: (1) both data sets are highly coherent; (2) the column coherence is very high, while the row coherence is relatively lower.

Figure 3. Coherence feature of two real network traffic flow traces.


4.3. Causes of the high coherence of traffic traces

The low-rank feature indicates that the columns (or rows) lie on a low-dimensional subspace. When the columns (or rows) lie uniformly on the low-dimensional subspace, the coherence parameter is proved to be very low (Candès & Recht, Citation2009). However, real-world data rarely lie uniformly on a low-dimensional subspace, which causes high matrix coherence. To illustrate the reason for the high coherence of the traffic traces, in this subsection we show the relationship between matrix numerical values and high coherence.

  • Case 1: The underlying subspace is constituted of several tiny subspaces. Figure 4(a) shows a small example of this case: a rank-2 matrix in which the first three rows belong to the same row subspace, while the last row belongs to another row subspace. This structure is called a mixture of subspaces in Balcan and Zhang (Citation2016) and Liu and Li (Citation2016).

  • Case 2: The maximum and minimum values of the data in the matrix differ by several orders of magnitude. Figure 4(b) shows an example of this case.

  • Case 3: A matrix simultaneously satisfies the data features of both Case 1 and Case 2. Figure 4(c) shows a small example of this case.

Figure 4. Examples of highly coherent matrices.


The high column coherence feature of network traffic data implies that either one or more OD pairs have much more traffic volume than other OD pairs or that some OD pairs belong to an independent subspace.

5. Problem and solution overview

5.1. Problem formulation

The aim of this paper is to lower network measurement costs while obtaining accurate network monitoring data for all OD pairs. Matrix completion provides a way to recover a low-rank matrix from a subset of sampled entries; the prerequisite is that the target matrix satisfies the low-rank constraint. In Section 4.1, we verified the low-rank feature of the traffic traces, which makes it feasible to reduce the measurement cost via matrix completion.

As shown in Figure 5, for low cost network monitoring, our scheme includes two major tasks: (1) a sampling scheduling algorithm that determines the locations of measurement samples, and (2) a matrix completion algorithm that, after the samples are collected, recovers (infers) the un-measured (un-sampled) data.

Figure 5. Problem.


Subspace-based matrix completion schemes use adaptive sampling to obtain the subspace and then recover partially sampled columns (or rows) through the subspace. Applying subspace-based matrix completion for low cost traffic monitoring faces the following problems:

  • For subspace-based matrix completion, we can use adaptive sampling to find columns that compose a column subspace or rows that compose a row subspace. In Section 4.2, we verified the coherence feature of the traffic traces: the column coherence is very high, while the row coherence is relatively low. In this case, using a column subspace-based adaptive sampling approach may lead to recovery failure or poor recovery accuracy.

    Figure 6 shows an example of the above case. The given rank-2 matrix with noise is shown in Figure 6(a); the coherence parameters of the matrix are $\mu(U) = 2$ and $\mu(V) = 1.582$, so $\mu(U) > \mu(V)$. Assuming we take random samples as shown in Figure 6(b), we need an adaptive sampling strategy to select another column to form the column subspace of the given matrix. As our estimator, we calculate the residual between each partially sampled column and the fully sampled column (the second column). Wang and Singh (Citation2015) and Wang and Singh (Citation2017) use the magnitude of the residuals as a probability to identify subspace columns. In this case, the first column has the largest residual and is the most likely to be selected as a subspace column. After collecting the samples, we use Equation (6) to recover the given matrix; the recovery result is shown in Figure 6(c). As shown in Figure 6, one entry whose value is 9.001 in the original matrix becomes 12 in the recovered matrix, leading to poor recovery accuracy.

  • High computational cost to identify the subspace. In the high-coherence case, the recovery performance of an adaptive sampling scheme for identifying the subspace is better than that of a naive random sampling scheme. However, the high computational cost of adaptive sampling makes it difficult to meet the speed requirements of subsequent network tasks.

Figure 6. Large recovery error when using column subspace-based methods in the high column-coherence case.


5.2. Solution overview

Since both real traces have high column coherence while their row coherence is relatively low, we use row subspace-based matrix completion in this paper. Whether to use a column or row subspace-based method depends on the data feature. Specifically, the adaptive sampling scheme first takes random samples and then obtains the subspace by adaptive sampling. To obtain higher recovery accuracy, we prefer the random sampling process on the low-coherence side and the adaptive sampling process on the side with higher coherence. When the column coherence is higher than the row coherence, we should therefore use the row subspace-based method.

The overview of our sampling scheduling and recovery scheme is shown in Figure 7. Our scheme consists of two stages. Figure 7(a) shows the training stage, whose main goal is to find the rows, i.e. OD pairs, that compose the row subspace of the target network; these rows are named subspace rows in this paper. We use adaptive sampling on historical data to obtain the subspace rows and record their index set.

Figure 7. Solution overview.


Figure 7(b) shows the second stage of our scheme, named the measurement and recovery stage. Because of the spatio-temporal feature of the real traces, the subspace rows do not change much, so we can use the rows marked in the first stage to form the estimated row subspace. In this stage, we take full measurement samples for all subspace rows. Furthermore, we randomly select some redundant rows and add them to the subspace row set to guard against subspace variation, which would otherwise reduce recovery accuracy. For the remaining rows, we take only a few measurement samples. After collecting the data, our completion algorithm recovers the remaining partially measured rows.

6. Proposed method

6.1. Training stage

The main goal of this stage is to identify the rows that compose the row subspace. We use adaptive sampling to identify subspace rows from the historical data. Adaptive sampling usually requires active data acquisition (Krishnamurthy & Singh, Citation2014), which means that if a row is identified as a subspace row, we need to acquire all the data for this row. Since we cannot determine in advance which rows will be identified as subspace rows, we need access to the entire historical data matrix.

The main process of the training stage is shown in Algorithm 1. Assume the number of subspace rows is $k$; denote the set of subspace rows as $S$ and the index set of subspace rows as $I$. Initially, we randomly select a row as a subspace row, add it to $S$, add its index to $I$, and obtain the current row subspace by performing skinny SVD on $S^T$. For each remaining row, we randomly take $|\Omega|$ samples. Then, we find the other $k-1$ subspace rows through adaptive sampling, as shown in step 5. Among all other rows in the historical data $M$, we select the row with the largest residual from the current row subspace $\hat{V}$ as a subspace row, where the residual is calculated as $res = \|M_\Omega^t - P_{\hat{V}_\Omega} M_\Omega^t\|_2^2$, with $P_{\hat{V}_\Omega} = \hat{V}_\Omega (\hat{V}_\Omega^T \hat{V}_\Omega)^{-1} \hat{V}_\Omega^T$.

The rationale behind selecting the largest-residual row as a subspace row is that rows with large residuals against the current row subspace $\hat{V}$ can hardly be represented by it. As shown in Figure 8, assume we take the initial samples as in Figure 8(b) and select the third row as a subspace row. We then calculate the residual between each remaining row and the current subspace. The largest residual is 0.2015, so we adaptively sample the fourth row, add it to $S$, and add its index 4 to $I$. In this example, I = [3, 4].

Figure 8. An example of selecting the row with the largest residual from the current row subspace.

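For concreteness, a minimal numpy sketch of this training stage follows (our own reading of Algorithm 1; the fixed sample set Omega shared by all rows is a simplifying assumption):

```python
import numpy as np

def train_subspace_rows(M_hist, k, num_samples, seed=0):
    """Greedily pick k subspace row indices I from historical data M_hist (N x T)."""
    rng = np.random.default_rng(seed)
    N, T = M_hist.shape
    I = [int(rng.integers(N))]                              # start from one random row
    omega = rng.choice(T, size=num_samples, replace=False)  # |Omega| sampled slots per row
    for _ in range(k - 1):
        S = M_hist[I, :]                                    # current subspace rows (fully known)
        V_hat = np.linalg.svd(S.T, full_matrices=False)[0]  # row subspace: skinny SVD of S^T
        V_om = V_hat[omega, :]
        res = np.full(N, -1.0)
        for t in set(range(N)) - set(I):
            m_om = M_hist[t, omega]
            proj = V_om @ np.linalg.lstsq(V_om, m_om, rcond=None)[0]  # P_{V_om} m_om
            res[t] = np.sum((m_om - proj) ** 2)             # residual from current subspace
        I.append(int(np.argmax(res)))                       # adaptively take the worst-fit row
    return I
```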

6.2. Measurement and completion stage

We now describe our measurement and completion algorithm; the pseudo-code is shown in Algorithm 2. After we obtain the index set $I$ of the trained subspace rows, we know which OD pairs compose the row subspace. Since the data changes over time, the subspace rows may also change slightly. To estimate the subspace more accurately, we design a redundant sampling rule that keeps the recovery accuracy robust: as shown in step 2, we randomly add $\alpha$ row indices to $I$. Given the index set $I$, we perform the sampling scheduling accordingly: we take full measurement samples for the indexed subspace rows (OD pairs), while for the remaining rows, i.e. rows not indexed in $I$, we take only $|\Omega|$ measurement samples. After gathering the measurement data, we recover the unmeasured data as follows:

  • Denote the subspace row set as $S$. We calculate the rank $r$ of $S$ and obtain the estimated row subspace $\hat{V}$ by performing skinny SVD on $S^T$.

  • For the rows not indexed in $I$, the unmeasured data can be recovered by $\hat{M}^t = (\hat{V}(\hat{V}_\Omega^T \hat{V}_\Omega)^{-1} \hat{V}_\Omega^T M_\Omega^t)^T$, as sketched in the code below.
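The following numpy sketch (our illustration, not the authors' code) shows this completion step; `S`, `I`, and `omega` are assumed to come from the sampling schedule above:

```python
import numpy as np

def complete(M_partial, S, I, omega):
    """Recover the partially measured rows of an N x T matrix via the row subspace of S."""
    N, T = M_partial.shape
    r = np.linalg.matrix_rank(S)
    V_hat = np.linalg.svd(S.T, full_matrices=False)[0][:, :r]  # estimated row subspace V_hat
    V_om = V_hat[omega, :]
    M_hat = M_partial.copy()
    for t in set(range(N)) - set(I):                # rows not in I are only partially measured
        m_om = M_partial[t, omega]
        alpha_t = np.linalg.lstsq(V_om, m_om, rcond=None)[0]  # (V_om^T V_om)^{-1} V_om^T m_om
        M_hat[t, :] = V_hat @ alpha_t               # M_hat^t = (V_hat alpha_t)^T as a row
    return M_hat
```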

There are three parameters, $k$, $\alpha$, and $|\Omega|$, in our algorithms, and they affect both the number of samples and the recovery accuracy. $k$ and $\alpha$ determine the number of subspace rows: if they are too small, the selected subspace cannot represent the matrix well, which compromises the recovery accuracy; if they are too large, many rows are selected as subspace rows, leading to a high sampling cost. $|\Omega|$ determines the number of samples in each recovered row: if $|\Omega|$ is too small, we do not have enough information to recover each row, and if it is too large, the sampling cost is again high. In Section 7, we conduct experiments to set proper parameters.

6.3. Complexity analysis

This subsection investigates our scheme's sample number and computational complexity. Assume that the size of the recovered matrix is $M \in \mathbb{R}^{N \times T_1}$. After collecting the measurement samples, we have a set of subspace rows $S \in \mathbb{R}^{(k+\alpha) \times T_1}$. To recover each of the remaining partially measured rows of $M$, we take $|\Omega|$ measurement samples. Thus, the total sample number of our scheme is $(k+\alpha)T_1 + (N-k-\alpha)|\Omega|$.
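As a rough worked instance of this count (our arithmetic, using the Abilene parameters chosen later in Section 7.3.1: $N = T_1 = 144$, $k = 7$, $\alpha \approx 0.03N \approx 4$, $|\Omega| = 0.25T_1 = 36$):
$$(k+\alpha)T_1 + (N-k-\alpha)|\Omega| = 11 \times 144 + 133 \times 36 = 6372 \approx 0.31\,NT_1,$$
i.e. on the order of the roughly 35% sample ratio reported in Section 7.3.2; the exact ratio depends on how $\alpha$ and $|\Omega|$ are rounded.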

Next, we study the computational complexity of our scheme. To recover the matrix $M$, we first calculate the rank $r$ of $S$, which has a time complexity of $O(T_1(k+\alpha)^2)$. We then obtain the estimated row subspace $\hat{V}$ of $M$ by performing skinny SVD on $S^T$, which has a complexity of $O(T_1(k+\alpha)r)$. Using $(\hat{V}(\hat{V}_\Omega^T \hat{V}_\Omega)^{-1} \hat{V}_\Omega^T M_\Omega^t)^T$, we recover the remaining partially measured rows. Since $\hat{V} \in \mathbb{R}^{T_1 \times r}$, $\hat{V}_\Omega^T \in \mathbb{R}^{r \times |\Omega|}$, and $M_\Omega^t \in \mathbb{R}^{|\Omega| \times 1}$, and we typically set $|\Omega| > r$, the complexity of recovering one partially measured row is $O(T_1|\Omega|r)$. The total number of recovered rows is $N-(k+\alpha)$, so the total complexity of recovering the entire matrix is $O(T_1(k+\alpha)^2 + T_1(k+\alpha)r + (N-(k+\alpha))T_1|\Omega|r)$, which can be rewritten as $O(NT_1|\Omega|r)$ given that $N \gg (k+\alpha)$.

7. Performance evaluations

We conduct extensive experiments on two real-world network traffic data sets to evaluate the performance of our RRS-MC scheme. Moreover, to verify the processing time advantage of RRS-MC, we also conduct experiments on a relatively large-scale synthetic data set.

7.1. Data description and preprocessing

Table 2 shows the data sets used in this study. Data normalization (Aksoy & Haralick, Citation2001) is often applied in data preprocessing to scale the data features into the range [0, 1]. We normalise Abilene and GÉANT in the following manner:
(7) $M_{i,j} = \frac{M_{i,j} - \min\{M_{i,j}\}}{\max\{M_{i,j}\} - \min\{M_{i,j}\}}$
where $\min\{M_{i,j}\}$ and $\max\{M_{i,j}\}$ are the minimum and maximum values of the data set. This normalization does not affect our recovery results: after obtaining the recovery results, we first apply the inverse of the normalization to them and then calculate the recovered error ratio.

Table 2. Data description.
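A minimal sketch of this preprocessing step and its inverse (ours, for illustration):

```python
import numpy as np

def normalize(M):
    """Eq. (7): min-max scaling into [0, 1]; returns (lo, hi) for later inversion."""
    lo, hi = M.min(), M.max()
    return (M - lo) / (hi - lo), (lo, hi)

def denormalize(M_scaled, lo, hi):
    """Inverse of Eq. (7), applied to recovered data before computing the error ratio."""
    return M_scaled * (hi - lo) + lo
```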

7.2. Evaluation metrics and experimental setup

We use the proposed and compared schemes in the experiment to recover the network traffic data from partial measurement samples. To evaluate the performance of different schemes, we use three evaluation metrics as follows:

Error ratio(inferred): We calculate the error ratio of the recovered data by comparing it with the raw data, as follows:
(8) $\text{Error ratio(inferred)} = \frac{\sum_{(i,j) \in \bar{\Omega}} (M_{i,j} - \hat{M}_{i,j})^2}{\sum_{(i,j) \in \bar{\Omega}} (M_{i,j})^2}$
where $\hat{M}_{i,j}$ and $M_{i,j}$ denote the recovered data and the raw data at location $(i,j)$ of $M$, respectively, and $\bar{\Omega}$ denotes the set of un-sampled element indices. Error ratio(inferred) measures the relative error between the values recovered by the completion algorithm and the raw data.
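A direct numpy transcription of Equation (8) (ours; `sampled_mask` is a boolean matrix marking the measured entries) might read:

```python
import numpy as np

def error_ratio_inferred(M, M_hat, sampled_mask):
    """Eq. (8): relative recovery error over the un-sampled entries (the set Omega-bar)."""
    ub = ~sampled_mask                # indices in Omega-bar
    return np.sum((M[ub] - M_hat[ub]) ** 2) / np.sum(M[ub] ** 2)
```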

The processing time of a scheme consists of two parts: the sampling scheduling time and the recovery time of the completion algorithm. We therefore use the following metrics to evaluate the processing time of all schemes:

Sampling time: to evaluate the speed of the different sampling algorithms, we use the metric Sampling time, the average number of seconds taken to perform the sampling scheduling task.

Recovery time: to evaluate the speed of the different completion algorithms, we use the metric Recovery time, the average number of seconds taken to perform the data recovery task.

We implement our scheme and the comparison schemes in MATLAB. The experiments are run on a laptop equipped with an Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz and 24GB of memory.

7.3. Experiment results

7.3.1. Parameter impacts

According to Algorithm 2, the parameter $|\Omega|$ indicates how many elements need to be measured in each recovered row. The index set of subspace rows $I$ consists of two parts: a training part that identifies $k$ subspace rows by Algorithm 1, and a redundancy part that randomly selects $\alpha$ redundant subspace rows. These parameters affect the sample ratio and the error ratio, and increasing either $k$ or $\alpha$ increases the number of subspace rows. Therefore, for visualization purposes, we fix the redundancy parameter $\alpha = 0.03N$, i.e. an approximately 3% redundant sample ratio for each data set.

We use T = N time slots of data as the training data to obtain the indices of the subspace rows, and use the T time slots following the training data to investigate how $|\Omega|$ and $k$ impact the sample ratio and recovery accuracy. The scale of the training data is as follows: T = N = 144 for Abilene, T = N = 529 for GÉANT, and T = N = 3000 for Synthetic.

Figure 9 plots the sample ratio and error ratio as $k$ and $|\Omega|$ vary. From Figure 9 we can see that: (1) for the parameter $k$, the recovery performance initially improves as $k$ increases; after $k$ reaches a certain value, the recovery performance becomes stable, because we have already obtained enough subspace information to recover the matrix, and further increasing $k$ affects the recovery accuracy only slightly. (2) The impact of $|\Omega|$ on recovery performance is similar: once $|\Omega|$ reaches a certain value, the recovery performance becomes stable. For example, on the GÉANT data set, the recovery accuracy with $|\Omega| = 0.1T$ is similar to that with $|\Omega| = 0.15T$, but the former has a 4% lower sample ratio.

Figure 9. Parameter impacts.


As shown in Figure 9, we select the parameters with the lowest sample ratio for each data set such that the error ratio is close to 0.2. Hence, we set k = 7 and $|\Omega| = 0.25T$ for Abilene, k = 7 and $|\Omega| = 0.1T$ for GÉANT, and k = 10 and $|\Omega| = 0.03T$ for Synthetic. We use the corresponding sample ratios and error ratios in the following experiments.

7.3.2. Performance comparison with baseline

RRS-MC includes a sampling algorithm and a completion algorithm. For a comprehensive comparison, the compared schemes should also include both sampling and completion processes. We compare our sampling and completion algorithms with the following baselines:

  • SVT-E2E: performs the E2E (Xie et al., Citation2015) sampling algorithm, then uses the SVT (Cai et al., Citation2010) matrix completion algorithm to recover the matrix.

  • ALM-Local Coherence: performs the Local Coherence (Y. Chen et al., Citation2014) sampling algorithm, then uses the ALM (Augmented Lagrangian Method) (Lin et al., Citation2010) matrix completion algorithm to recover the matrix.

  • SRMF-Random: performs random sampling, then uses SRMF (Roughan et al., Citation2011) as the completion algorithm to recover the matrix.

Figure 10 shows the recovery accuracy of the different algorithms, where the X-axis is the sample ratio and the Y-axis is the error ratio. To investigate the recovery performance of the compared schemes, we raise their sample ratios until their error ratios are similar to ours. As shown in Figure 10, among all implemented algorithms, RRS-MC achieves the low error ratio (0.2) with the smallest sample ratio, needing only a 35% sample ratio for Abilene, 13% for GÉANT, and 6% for Synthetic.

Figure 10. Recover error for all schemes.


We also draw Figure 11 to show the number of samples needed to reach the same recovered error ratio (0.2). As a reference, we also show the total number of measurements in T time slots of data, denoted as “All” in Figure 11.

Figure 11. Sample number to achieve the same recovery accuracy.


We draw Figures 12 and 13 to show the sampling time and recovery time needed to achieve the same recovery error ratio over T time slots of data. Both the E2E and Local Coherence sampling algorithms require multiple rounds of sample selection: in each round, E2E needs the previous round's results to select the current round's samples, and Local Coherence needs SVD to determine the sampling locations; both involve complex matrix calculations. For a fair comparison, we take the sampling time of E2E and Local Coherence as the time of the last iteration round. For RRS-MC, after training the index set of subspace rows $I$, we take full samples for the indexed rows and randomly take $|\Omega|$ samples for the remaining rows. Since $I$ can be trained in advance, the sampling time of RRS-MC is close to that of the random sampling algorithm. Our sampling algorithm takes only 0.018s on Abilene, 0.085s on GÉANT, and 0.202s on Synthetic.

Figure 12. Sampling time to achieve same recovery accuracy.


Figure 13. Recovery time to achieve same recovery accuracy.


After gathering the measurement samples, we use the different matrix completion algorithms to recover the un-sampled data. RRS-MC performs skinny SVD once to obtain the row subspace; for each recovered row, it performs only a simple matrix calculation. Compared with the other completion algorithms, which involve iteration, RRS-MC has an obvious advantage in recovery speed: it takes only 0.021s on Abilene, 0.1s on GÉANT, and 4.01s on Synthetic.

7.3.3. Performance comparison with column-based scheme

Besides comparing with the above dynamic sampling and recovery schemes, we also compare RRS-MC with an existing column-based scheme and a random scheme to show the negative impact of high column coherence and the effectiveness of RRS-MC. We draw the CDF (cumulative distribution function) of the error ratio in Figure 14, where each point represents one run of the sampling and recovery process. In this experiment, the sample ratio of the three schemes is the same, and each scheme is run one hundred times.

Figure 14. Comparison with the column-based scheme.


As expected, compared with the randomly selected row subspace algorithm (termed random row space), RRS-MC has advantages in both accuracy and stability. For example, on the GÉANT data set (Figure 14(b)), the recovery error of RRS-MC stays around 0.2, while that of random row space spans a wide range between 0.2 and 0.6, which indicates that the subspace identified by our algorithm is precise and can recover the data accurately. RRS-MC also has an advantage in recovery accuracy over the existing column subspace-based algorithm.

7.3.4. Performance comparison with non-redundant scheme

In this subsection, we verify the effectiveness of the redundant sampling rule by comparing RRS-MC with a non-redundant scheme (termed RS-MC), which uses only Algorithm 1 to identify the subspace. In this experiment, we set the parameters of both schemes as in Section 7.3.1; for each data set, we set the redundancy parameter $\alpha = 0.03N$, i.e. a redundant sample ratio of approximately 3%. We use the 5T time slots of data following the training data to verify the recovery accuracy of both schemes.

Figure 15 shows the results, where the X-axis is the starting time slot after the training data and the Y-axis is the error ratio. From Figure 15, we can see that: (1) the recovery error of RS-MC tends to increase with time on the real data sets, which indicates that the subspace does change over time; (2) the recovery error of RRS-MC stays at approximately 0.2 for every T time slots, which indicates that the subspace estimated from the combination of training on historical data and redundant sampling can robustly recover the data continuously; (3) since the subspace of the synthetic data set does not change, the trained subspace rows recover the matrix well, and redundant sampling does not affect the recovery accuracy.

Figure 15. Comparison with the non-redundant scheme.


8. Conclusion

In this work, we focus on low cost network traffic monitoring. By studying real network traffic traces, we find that the column coherence is larger than the row coherence in most cases. Based on this coherence feature, we propose RRS-MC, which constructs the subspace by taking full measurements in the subspace rows; the remaining rows receive only partial measurements, and the unmeasured data are recovered through the subspace. Accurately monitoring network traffic at low cost can benefit network providers: for example, service providers can use our scheme to monitor network traffic and optimise their network infrastructure to serve their customers better. Extensive experiments on two real data sets and a synthetic data set show that our scheme ensures high recovery quality of network measurement data with low measurement costs.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

The work was supported by the National Natural Science Foundation of China [grant numbers 61972144 and 62025201].

References

  • Aksoy, S., & Haralick, R. M. (2001). Feature normalization and likelihood-based similarity measures for image retrieval. Pattern Recognition Letters, 22(5), 563–582. https://doi.org/10.1016/S0167-8655(00)00112-4
  • Balcan, M. F. F., & Zhang, H. (2016). Noise-tolerant life-long matrix completion via adaptive sampling. In Advances in neural information processing systems (pp. 2955–2963).
  • Barford, P., Kline, J., Plonka, D., & Ron, A. (2002). A signal analysis of network traffic anomalies. In Proceedings of the 2nd acm sigcomm workshop on internet measurment (pp. 71–82). ACM Press.
  • Cai, J. F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982. https://doi.org/10.1137/080738970
  • Candès, E. J., & Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772. https://doi.org/10.1007/s10208-009-9045-5
  • Chen, Y., Bhojanapalli, S., Sanghavi, S., & Ward, R. (2014). Coherent matrix completion. In International conference on machine learning (pp. 674–682). JMLR.org.
  • Chen, Y., Bhojanapalli, S., Sanghavi, S., & Ward, R. (2015). Completing any low-rank matrix, provably. The Journal of Machine Learning Research, 16(1), 2999–3034. https://dl.acm.org/doi/10.5555/2789272.2912096
  • Chen, Y. C., Qiu, L., Zhang, Y., Xue, G., & Hu, Z. (2014). Robust network compressive sensing. In Proceedings of the 20th annual international conference on mobile computing and networking (pp. 545–556). ACM.
  • Claise, B. (2004). Cisco systems netflow services export version 9. RFC 3954. Internet Engineering Task Force.
  • Diao, C., Zhang, D., Liang, W., Li, K. C., Hong, Y., & Gaudiot, J. L. (2022). A novel spatial-temporal multi-scale alignment graph neural network security model for vehicles prediction. IEEE Transactions on Intelligent Transportation Systems, 24(1), 904–914. https://doi.org/10.1109/TITS.2022.3140229
  • Hu, N., Zhang, D., Xie, K., Liang, W., Diao, C., & Li, K. C. (2022). Multi-range bidirectional mask graph convolution based GRU networks for traffic prediction. Journal of Systems Architecture, 133, 102775. https://doi.org/10.1016/j.sysarc.2022.102775
  • Hu, N., Zhang, D., Xie, K., Liang, W., & Hsieh, M. Y. (2022). Graph learning-based spatial-temporal graph convolutional neural networks for traffic forecasting. Connection Science, 34(1), 429–448. https://doi.org/10.1080/09540091.2021.2006607
  • Huang, W., Xie, K., & Li, J. (2022). A novel sequence tensor recovery algorithm for quick and accurate anomaly detection. IEEE Transactions on Network Science and Engineering, 9(5), 3531–3545. https://doi.org/10.1109/TNSE.2022.3189365
  • Jin, K., Xie, K., Wang, X., Tian, J., Xie, G., Wen, J., & Li, K. (2022). Low cost online network traffic measurement with subspace-based matrix completion. IEEE Transactions on Network Science and Engineering, 10(1), 53–67. https://doi.org/10.1109/TNSE.2022.3201624
  • Krishnamurthy, A., & Singh, A. (2013). Low-rank matrix and tensor completion via adaptive sampling. In Advances in neural information processing systems (pp. 836–844).
  • Krishnamurthy, A., & Singh, A. (2014). On the power of adaptivity in matrix completion and approximation. preprint arXiv:1407.3619.
  • Lakhina, A., Crovella, M., & Diot, C. (2004). Diagnosing network-wide traffic anomalies. ACM SIGCOMM Computer Communication Review, 34(4), 219–230. https://doi.org/10.1145/1030194.1015492
  • Li, X., Xie, K., Wang, X., Xie, G., Li, K., Cao, J., & Wen, J. (2023a). Neighbor graph based tensor recovery for accurate internet anomaly detection. IEEE Transactions on Parallel and Distributed Systems, 34(2), 655–674. https://doi.org/10.1109/TPDS.2022.3227570
  • Li, X., Xie, K., Wang, X., Xie, G., Li, K., Cao, J., & Wen, J. (2023b). Tripartite graph aided tensor completion for sparse network measurement. IEEE Transactions on Parallel and Distributed Systems, 34(1), 48–62. https://doi.org/10.1109/TPDS.2022.3213259
  • Li, Y., Liang, W., Xie, K., Zhang, D., Xie, S., & Li, K. C. (2023). LightNestle: quick and accurate neural sequential tensor completion via meta learning. In IEEE Infocom 2023-ieee conference on computer communications.
  • Liang, W., Li, Y., Xie, K., Zhang, D., Li, K. C., Souri, A., & Li, K. (2022). Spatial-temporal aware inductive graph neural network for C-ITS data recovery. IEEE Transactions on Intelligent Transportation Systems. https://doi.org/10.1109/tits.2022.3156266
  • Lin, Z., Chen, M., & Ma, Y. (2010). The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. preprint arXiv:1009.5055.
  • Liu, G., & Li, P. (2016). Low-rank matrix completion in the presence of high coherence. IEEE Transactions on Signal Processing, 64(21), 5623–5633. https://doi.org/10.1109/TSP.2016.2586753
  • Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., & Ma, Y. (2012). Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 171–184. https://doi.org/10.1109/TPAMI.2012.88
  • Liu, S., & Wang, Q. (2020). Adaptive coherent sampling for network delay measurement. In ICC 2020-2020 ieee international conference on communications (icc) (pp. 1–6). IEEE.
  • Markovsky, I. (2018). Low-rank approximation: algorithms, implementation, applications. Springer.
  • Roughan, M., Thorup, M., & Zhang, Y. (2003). Traffic engineering with estimated traffic matrices. In Proceedings of the 3rd ACM sigcomm conference on internet measurement (pp. 248–258). ACM Press.
  • Roughan, M., Zhang, Y., Willinger, W., & Qiu, L. (2011). Spatio-temporal compressive sensing and internet traffic matrices (extended version). IEEE/ACM Transactions on Networking, 20(3), 662–676. https://doi.org/10.1109/TNET.2011.2169424
  • Tang, B., Tang, M., Xia, Y., & Hsieh, M. Y. (2021). Composition pattern-aware web service recommendation based on depth factorisation machine. Connection Science, 33(4), 870–890. https://doi.org/10.1080/09540091.2021.1911933
  • Uhlig, S., Quoitin, B., Lepropre, J., & Balon, S. (2006). Providing public intradomain traffic matrices to the research community. ACM SIGCOMM Computer Communication Review, 36(1), 83–86. https://doi.org/10.1145/1111322.1111341
  • Wan, Y., Yi, J., & Zhang, L. (2018). Matrix completion from non-uniformly sampled entries. preprint arXiv:1806.10308.
  • Wang, M., Li, B., & Li, Z. (2004). sFlow: towards resource-efficient and agile service federation in service overlay networks. In 24th International conference on distributed computing systems, 2004. proceedings (pp. 628–635). IEEE.
  • Wang, Y., & Singh, A. (2015). Column subset selection with missing data via active sampling. In Artificial intelligence and statistics (pp. 1033–1041). JMLR.org.
  • Wang, Y., & Singh, A. (2017). Provably correct algorithms for matrix column subset selection with selectively sampled data. The Journal of Machine Learning Research, 18(1), 5699–5740.
  • Xie, K., Chen, Y., Wang, X., Xie, G., Cao, J., Wen, J., & Sun, J. (2020). Accurate and fast recovery of network monitoring data with GPU-Accelerated tensor completion. IEEE/ACM Transactions on Networking, 28(4), 1601–1614. https://doi.org/10.1109/TNET.90
  • Xie, K., Li, X., Wang, X., Xie, G., Wen, J., Cao, J., & Zhang, D. (2017). Fast tensor factorization for accurate internet anomaly detection. IEEE/ACM Transactions on Networking, 25(6), 3794–3807. https://doi.org/10.1109/TNET.2017.2761704
  • Xie, K., Li, X., Wang, X., Xie, G., Wen, J., & Zhang, D. (2019). Active sparse mobile crowd sensing based on matrix completion. In Proceedings of the 2019 international conference on management of data (pp. 195–210). ACM.
  • Xie, K., Ning, X., Wang, X., Xie, D., Cao, J., Xie, G., & Wen, J. (2016). Recover corrupted data in sensor networks: a matrix completion solution. IEEE Transactions on Mobile Computing, 16(5), 1434–1448. https://doi.org/10.1109/TMC.2016.2595569
  • Xie, K., Peng, C., Wang, X., Xie, G., & Wen, J. (2017). Accurate recovery of internet traffic data under dynamic measurements. In IEEE infocom 2017-ieee conference on computer communications (pp. 1–9). IEEE.
  • Xie, K., Tian, J., Xie, G., Zhang, G., & Zhang, D. (2021). Low cost sparse network monitoring based on block matrix completion. In IEEE infocom 2021-ieee conference on computer communications (pp. 1–10). IEEE.
  • Xie, K., Wang, L., Wang, X., Xie, G., & Wen, J. (2017). Low cost and high accuracy data gathering in WSNs with matrix completion. IEEE Transactions on Mobile Computing, 17(7), 1595–1608. https://doi.org/10.1109/TMC.7755
  • Xie, K., Wang, L., Wang, X., Xie, G., Zhang, G., Xie, D., & Wen, J. (2015). Sequential and adaptive sampling for matrix completion in network monitoring systems. In 2015 IEEE conference on computer communications (infocom) (pp. 2443–2451). IEEE.
  • Xie, K., Xie, R., Wang, X., Xie, G., Zhang, D., & Wen, J. (2022). NMMF-Stream: a fast and accurate stream-processing scheme for network monitoring data recovery. In IEEE infocom 2022-ieee conference on computer communications (pp. 2218–2227). IEEE.
  • Yu, M., Jose, L., & Miao, R. (2013). Software defined traffic measurement with opensketch. In 10th USENIX symposium on networked systems design and implementation (NSDI 13) (pp. 29–42). USENIX Association.
  • Zhang, Y., Roughan, M., Duffield, N., & Greenberg, A. (2003). Fast accurate computation of large-scale IP traffic matrices from link loads. ACM SIGMETRICS Performance Evaluation Review, 31(1), 206–217. https://doi.org/10.1145/885651.781053
  • Zhang, Y., Roughan, M., Lund, C., & Donoho, D. L. (2005). Estimating point-to-point and point-to-multipoint traffic matrices: an information-theoretic approach. IEEE/ACM Transactions on networking, 13(5), 947–960. https://doi.org/10.1109/TNET.2005.857115