Book Review

Time Series Clustering and Classification

by Elizabeth Ann Maharaj, Pierpaolo D'Urso, and Jorge Caiado. Boca Raton, FL: Chapman & Hall/CRC Press, 2019, xv+228 pp., $174.95(H), ISBN: 978-1-49-877321-8.

As we enter the big data era, the amount of time series data is growing especially fast, originating from sources such as web traffic and healthcare sensors. To understand how these time series relate to one another or to identify anomalies in datasets, a fundamental question is: How do we cluster or classify these time series? Good clustering or classification results should provide useful insights into the dataset and help people make critical decisions. The authors of this book have more than 20 years of experience in time series clustering and classification. They consolidate many important methods and algorithms commonly used in practice and published in various scientific journals. In addition, they provide Matlab and R code and corresponding datasets to reproduce the examples in the book.

The book starts with a brief introduction and a short literature survey of the field. A rich set of examples is provided to illustrate abstract concepts. Besides covering canonical clustering methods such as hierarchical clustering and nonhierarchical methods (c-means, c-medoids) (Chapter 3), the authors also introduce fuzzy clustering, which allows objects to belong to more than one cluster with different degrees of membership (Chapter 4). This method provides a natural view of clusters, so that more granular decisions can be made from the clustering results. For example, a fuzzy clustering of European countries based on climate time series data implies that Poland is similar to both Germany and Finland, but more similar to Germany. While Germany and Finland fall in separate clusters with different climate conditions, it would be misleading to conclude that Poland has the same climate as either Germany or Finland. The authors also provide validity criteria for different clustering methods, which are important in real-world applications (Chapters 3 and 4).
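To make the idea of membership degrees concrete, here is a minimal Python sketch of the standard fuzzy c-means membership formula for a single object. It is not taken from the book's Matlab or R code; the function name `fuzzy_memberships`, the fuzziness parameter value `m=2`, and the two distances used in the example are illustrative assumptions.

```python
def fuzzy_memberships(distances, m=2.0):
    """Fuzzy c-means membership degrees of one object in each cluster.

    distances: distance from the object to each cluster centroid.
    m: fuzziness parameter (> 1); larger values give softer memberships.
    """
    if m <= 1:
        raise ValueError("m must be greater than 1")
    # An object lying exactly on one or more centroids belongs only to them.
    zeros = [i for i, d in enumerate(distances) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in distances]
    total = sum(inverse)
    # Memberships are proportional to inverse distances and sum to 1.
    return [v / total for v in inverse]


# Hypothetical distances (made-up numbers, not from the book) from one
# country's series to two cluster centroids.
memberships = fuzzy_memberships([1.0, 2.0])
print([round(u, 3) for u in memberships])  # [0.8, 0.2]
```

With these made-up distances, the object belongs mostly to the first cluster but keeps a nonzero membership in the second, which is the kind of graded statement the Poland example describes.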

Chapter 5 introduces several distance measures between time series observations, including dynamic time warping, which can also be used for clustering. Another idea put forward in this book is to cluster time series by extracting features from them (Chapter 6). Such features should, however, be robust to noise and anomalies. The authors introduce several time series features, such as autocorrelations, partial autocorrelations, quantile autocovariances (QAC), and variance ratios. Among these features, QAC is shown to be more robust against heavy tails, dependence in the extremes, and changes in conditional shapes. Data science practitioners could use QAC to cluster time series with similar fluctuation patterns, even if these time series contain outliers. The authors also introduce frequency domain features, such as spectral ordinates, which provide information about cyclic patterns and periodicity. To handle time series with unequal lengths and frequencies, the book introduces three different statistics. In practice, time series with unequal lengths and frequencies are common, so these approaches give data scientists useful tools for detecting cyclic patterns in time series.
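The two families of dissimilarity mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: `dtw_distance` is the textbook dynamic time warping recursion with an absolute-difference local cost, and `acf_distance` is one simple feature-based choice (Euclidean distance between sample autocorrelations). The function names and the `max_lag` default are assumptions made for this example.

```python
import math


def dtw_distance(x, y):
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat y[j-1]
                                     cost[i][j - 1],      # repeat x[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    if c0 == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def acf_distance(x, y, max_lag=2):
    """Euclidean distance between the autocorrelation features of x and y."""
    return math.dist(acf(x, max_lag), acf(y, max_lag))


# Series of unequal length that differ only by a repeated value.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Either function can fill a pairwise dissimilarity matrix that is then passed to a hierarchical or medoid-based clustering routine. Note that both accept series of different lengths, although this simple autocorrelation distance is not one of the three statistics the book proposes for that case.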

While Chapter 6 provides many useful features for quantifying the properties of time series, it lacks a discussion of recent advances in deep learning, such as variational autoencoders (VAEs). Deep learning methods have recently gained popularity in large-scale and high-dimensional time series clustering (Fawaz et al. 2019). Chapter 7 discusses how to cluster time series using model metrics or model parameters obtained by fitting different time series models. The authors provide an insightful discussion of clustering time series based on forecasting output. This method is applicable to many unsupervised time series anomaly detection tasks, because data scientists could flag data points as anomalies if they are grouped into a rare cluster. In Chapter 8, the authors discuss other clustering methods, including hidden Markov models, support vector clustering, and self-organizing maps. Chapters 9 and 10 cover the classification of time series with and without features (classification trees, Gaussian mixture models, Bayesian approaches, etc.).
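The rare-cluster idea for anomaly detection can be sketched as follows, assuming cluster labels have already been produced by some clustering method. The function name `flag_rare_clusters` and the 10% threshold `min_share` are illustrative assumptions, not something prescribed by the book.

```python
from collections import Counter


def flag_rare_clusters(labels, min_share=0.1):
    """Mark items whose cluster holds less than min_share of all items."""
    counts = Counter(labels)
    n = len(labels)
    rare = {cluster for cluster, size in counts.items() if size / n < min_share}
    return [label in rare for label in labels]


# Made-up labels: two large clusters and one singleton cluster.
labels = [0] * 9 + [1] * 10 + [2]
flags = flag_rare_clusters(labels)
print(sum(flags))  # 1
```

In practice the threshold would need to be chosen with domain knowledge, since a small cluster can also reflect a legitimate but uncommon pattern.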

This book covers most of the classical and common techniques for time series clustering and classification, and it consolidates different methods into an extensive, coherent framework. This makes the book a good reference for students and researchers. Nevertheless, the book lacks coverage of recent machine learning research on time series feature engineering and time series classification. It would be great if a future edition could include some emerging research topics, such as deep learning for time series classification.

Ming Chen
Amazon.com

Reference

  • Fawaz, H. I., Forestier, G., Weber, J., Idoumghar, L., and Muller, P. A. (2019), “Deep Learning for Time Series Classification: A Review,” Data Mining and Knowledge Discovery, 33, 917–963. DOI: 10.1007/s10618-019-00619-1.
