Abstract
The hidden Markov model (HMM) with a Gaussian process (GP) emission model has been widely used to model complex sequential data. This study introduces a hybrid Bayesian HMM with GP emission using a spectral mixture (SM) kernel (HMM-GPSM) to estimate the hidden state of each time-series observation that is sequentially observed from a single channel. We then propose a scalable inference method for training the HMM-GPSM on large-scale time-series datasets that have (1) a large number of sequences for state transitions and (2) a large number of data points in the time-series observation for each hidden state. For (1), we employ stochastic variational inference (SVI) to update the parameters of the HMM-GPSM efficiently. For (2), we propose an approximate GP emission using random Fourier features (RFF), constructed from spectral points sampled from the spectral density of the SM kernel. We also propose an efficient inference procedure for the kernel hyperparameters of the approximate GP emission and the corresponding HMM-GPSM. Specifically, we derive the training loss, that is, the evidence lower bound of the HMM-GPSM, which can be computed scalably for a large number of time-series observations by employing a regularized lower bound of the GP emission likelihood with a KL divergence term. The two methods can be used together to train the HMM-GPSM on sequential time-series datasets that exhibit both (1) and (2). We validate the proposed method on synthetic and real datasets using clustering accuracy, marginal likelihood, and training time as performance metrics.
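As an illustration of the RFF construction mentioned above, the following is a minimal sketch and not the implementation from this work. It assumes a one-dimensional SM kernel in its standard form, k(tau) = sum_q w_q exp(-2 pi^2 sigma_q^2 tau^2) cos(2 pi mu_q tau), whose spectral density is a mixture of Gaussians centered at the frequencies mu_q. All function and variable names (sm_kernel, sample_spectral_points, rff_features, and the hyperparameter values) are hypothetical and chosen only for this example.

```python
import numpy as np


def sm_kernel(x1, x2, weights, means, scales):
    """Exact 1-D spectral mixture kernel matrix (assumed standard form)."""
    tau = x1[:, None] - x2[None, :]
    k = np.zeros_like(tau)
    for w, mu, sigma in zip(weights, means, scales):
        envelope = np.exp(-2.0 * np.pi**2 * sigma**2 * tau**2)
        k += w * envelope * np.cos(2.0 * np.pi * mu * tau)
    return k


def sample_spectral_points(means, scales, num_points, rng):
    """Draw num_points frequencies per mixture component.

    Component q of the SM spectral density is Gaussian with mean means[q]
    and standard deviation scales[q]; the mirrored component at -means[q]
    can be skipped because cosine is an even function.
    """
    means = np.asarray(means, dtype=float)[:, None]
    scales = np.asarray(scales, dtype=float)[:, None]
    return rng.normal(means, scales, size=(means.shape[0], num_points))


def rff_features(x, weights, spectral_points):
    """Map inputs x of shape (N,) to features of shape (N, 2 * Q * M)."""
    num_comp, num_points = spectral_points.shape
    phase = 2.0 * np.pi * x[:, None, None] * spectral_points[None, :, :]
    amp = np.sqrt(np.asarray(weights, dtype=float) / num_points)[None, :, None]
    feats = np.concatenate([amp * np.cos(phase), amp * np.sin(phase)], axis=2)
    return feats.reshape(x.shape[0], -1)


# Hypothetical hyperparameters for a two-component SM kernel.
weights = np.array([1.0, 0.5])
means = np.array([0.5, 2.0])
scales = np.array([0.1, 0.3])

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
spectral_points = sample_spectral_points(means, scales, 5000, rng)

phi = rff_features(x, weights, spectral_points)
k_approx = phi @ phi.T  # low-rank approximation of the kernel matrix
k_exact = sm_kernel(x, x, weights, means, scales)
max_abs_error = np.max(np.abs(k_approx - k_exact))
```

In this sketch the inner product of two feature vectors is a Monte Carlo estimate of the SM kernel, so the approximation error shrinks as the number of spectral points grows. Working with the feature matrix phi rather than the full N-by-N kernel matrix is what makes a GP emission with many data points per observation tractable.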
Supplemental Materials
Derivation: The attached PDF file provides the full derivations of the equations used in the approximation procedure.
Code and dataset: The attached zip file includes the implementation for this work and the datasets used in the experiments. Both are also available at https://github.com/becre2021/abinferhmmgp, which provides some examples (.jupyter) for running the algorithms.
Acknowledgments
This research was supported by the Korea Agency for Infrastructure Technology Advancement and funded by the Ministry of Land, Infrastructure, and Transport (No. 21PIYR-B153277-03). We also thank the editor and anonymous reviewers for their valuable comments that improved the presentation of the article.