Abstract
With the advancement of sensing technologies, sensor data collected over time have become increasingly useful for detecting anomalies in underlying processes and systems. However, sensor data are often affected by contextual variables, such as equipment settings, and can exhibit different patterns even in normal states, depending on those contextual variables. Motivated by this problem, we propose a contextual anomaly detection method for multivariate time series data. We first build a prediction model using training data consisting only of normal observations and then detect anomalies based on the prediction errors for future observations. The prediction model is based on a long short-term memory (LSTM) network, which uses the high expressive power of deep recurrent neural networks to flexibly model both complex relationships among variables and temporal correlations between successive time points. In particular, to incorporate the contextual information while ensuring that it does not propagate over time but affects the response data only at specific target time points, we extend the standard LSTM by adding a separate layer for the contextual variables at each time step. The performance of the proposed method was evaluated on several open-source datasets and a real dataset from a global tire company.
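The two ingredients described in the abstract, a predictor in which the context enters only at the target time step and detection based on prediction errors, can be illustrated with a toy example. This is a minimal pure-Python sketch under stated assumptions, not the authors' architecture: the class `ContextualLSTMSketch`, the helpers `anomaly_scores` and `threshold_from_normal`, the additive combination of the LSTM output with a context layer, the Euclidean error score, and the nearest-rank quantile threshold are all hypothetical choices made for illustration. The weights are random and untrained, so only the data flow is meaningful.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(vals) for vals in zip(*vs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ContextualLSTMSketch:
    """Hypothetical toy model: an LSTM summarizes past responses, and a separate
    context layer is applied only at the target time step. The additive
    combination is an assumption, not the paper's exact architecture."""

    def __init__(self, d_resp, d_ctx, d_hidden, seed=0):
        rng = random.Random(seed)
        self.H = d_hidden
        # One input-to-hidden and one hidden-to-hidden matrix per gate (i, f, o, g).
        self.Wx = {k: rand_matrix(d_hidden, d_resp, rng) for k in "ifog"}
        self.Wh = {k: rand_matrix(d_hidden, d_hidden, rng) for k in "ifog"}
        self.W_out = rand_matrix(d_resp, d_hidden, rng)  # hidden state -> response
        self.W_ctx = rand_matrix(d_resp, d_ctx, rng)     # context -> response

    def step(self, x, h, c):
        """Standard LSTM cell update (biases omitted for brevity)."""
        pre = {k: vadd(matvec(self.Wx[k], x), matvec(self.Wh[k], h)) for k in "ifog"}
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        g = [math.tanh(z) for z in pre["g"]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def predict(self, past_responses, target_context):
        """Predict the response at time t from y_{t-w}, ..., y_{t-1} and the
        context at t. The context never enters the recurrent state (h, c),
        so it cannot propagate to later time steps."""
        h = [0.0] * self.H
        c = [0.0] * self.H
        for x in past_responses:
            h, c = self.step(x, h, c)
        return vadd(matvec(self.W_out, h), matvec(self.W_ctx, target_context))


def anomaly_scores(model, series, contexts, window):
    """Euclidean norm of the prediction error for each t >= window."""
    scores = []
    for t in range(window, len(series)):
        pred = model.predict(series[t - window:t], contexts[t])
        scores.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(series[t], pred))))
    return scores


def threshold_from_normal(scores, q=0.99):
    """Nearest-rank empirical quantile of errors on normal data (assumed rule)."""
    s = sorted(scores)
    return s[max(0, math.ceil(q * len(s)) - 1)]
```

In practice, the weights would be trained on normal observations only, the threshold would be fitted to the scores of held-out normal data, and a time point would be flagged when its score exceeds that threshold, for example `[s > thr for s in anomaly_scores(model, series, contexts, window)]`. Keeping the context layer outside the recurrence is what confines its effect to the target time point.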
About the authors
Hyojoong Kim received a B.S. degree in industrial engineering from Hanyang University and M.S. and Ph.D. degrees in industrial and systems engineering from KAIST. His research interests include machine learning and applied statistics.
Heeyoung Kim received a B.S. degree in industrial engineering from KAIST, M.S. degrees in industrial engineering and statistics from KAIST and the Georgia Institute of Technology, respectively, and a Ph.D. degree in industrial engineering from the Georgia Institute of Technology. She is an associate professor with the Department of Industrial and Systems Engineering, KAIST. She was previously a Senior Member of Technical Staff at AT&T Laboratories. Her research interests include applied statistics and machine learning.
Data availability statement
The datasets used in Section 5.1 are available at https://www.kaggle.com/kyanyoga/sample-sales-data and https://archive.ics.uci.edu/ml/datasets/CalIt2+Building+People+Counts. The dataset used in Section 5.2 is not publicly available due to confidentiality restrictions.