Bayesian Model Search for Nonstationary Periodic Time Series

Abstract

We propose a novel Bayesian methodology for analyzing nonstationary time series that exhibit oscillatory behavior. We approximate the time series using a piecewise oscillatory model with unknown periodicities, where our goal is to estimate the change-points while simultaneously identifying the potentially changing periodicities in the data. Our proposed methodology is based on a trans-dimensional Markov chain Monte Carlo algorithm that simultaneously updates the change-points and the periodicities relevant to any segment between them. We show that the proposed methodology successfully identifies time changing oscillatory behavior in two applications which are relevant to e-Health and sleep research, namely the occurrence of ultradian oscillations in human skin temperature during the time of night rest, and the detection of instances of sleep apnea in plethysmographic respiratory traces. Supplementary materials for this article are available online.

1 Introduction

Identifying the periodicities present in a cyclical phenomenon allows us to gain insight into the sources of variability that drive the phenomenon. For example, respiratory traces obtained from a plethysmograph used on rodents in experimental sleep apnea research exhibit many abrupt changes in their periodic components as the rat naturally changes its breathing pattern in the course of its sleep-wake activities (Han et al. 2002; Nakamura, Fukuda, and Kuwaki 2003). Similarly, human temperature, as measured by a wearable sensing device over several days at relatively high temporal resolution, may be subject to a different periodic behavior during the night when the individual transitions between ultradian sleep stages (Carskadon and Dement 2005; Komarzynski et al. 2018). While the theory and methods for analyzing the periodicities of stationary time series are relatively well developed, the task of modeling time series that show regime shifts in periodicity, amplitude, and phase remains challenging because the timing of changes and the relevant periodicities are usually unknown.

There have been many developments in modeling stationary oscillatory time series data. Rife and Boorstyn (1976) and Stoica et al. (1989) addressed the problem of estimating the frequencies, phases, and amplitudes of sinusoidal signals under the assumption of a known number of sinusoids, where inference is based on maximum likelihood frequency estimators. These models, however, require very long time series and a large separation in the frequencies that drive the process, which will not always be the case in practice (Djuric 1996; Andrieu and Doucet 1999). Quinn (1989), Yau and Bresler (1993), and Zhang and Wong (1993) tackled the problem of model selection on the number of sinusoidal signals by employing the Akaike information criterion (Akaike 1974) and the minimum description length principle (Rissanen 1978). Djuric (1996) showed that these procedures tend to estimate the wrong number of components when the sample size is small and the signal-to-noise ratio is low.

Bayesian approaches to modeling stationary oscillatory signals were explored for the first time by Bretthorst (1988, 1990) with applications to nuclear magnetic resonance spectroscopy. Dou and Hodgson (1995, 1996) presented a Bayesian approach that uses a Gibbs sampler to identify multiple frequencies that drive the signal. Their method required the number of frequencies to be fixed in advance, and model selection was achieved by choosing the most probable model based on the estimation of the parameters for all possible models. Bayesian model selection for stationary oscillatory signals based on posterior model probabilities was also investigated by Djuric (1996). Andrieu and Doucet (1999) introduced a more efficient reversible-jump Markov chain Monte Carlo (MCMC) method (Green 1995) that jointly tackles model selection and parameter estimation for an unknown number of stationary sinusoidal signals and avoids the computationally expensive numerical optimization of Dou and Hodgson (1995, 1996) by sampling the frequencies one at a time via Metropolis–Hastings (M-H) steps. To the best of our knowledge, there is currently no extension of this methodology to nonstationary oscillatory signals.

A formal statistical modeling framework for a specific class of nonstationary time series data, called locally stationary time series, was developed by Dahlhaus (1997). Extending this framework to a Bayesian setting, Rosen, Stoffer, and Wood (2009) proposed an approach to model the log of the time-varying spectral density using a mixture of smoothing splines. Rosen, Wood, and Stoffer (2012) improved on this by splitting the time series into an unknown but finite number of segments of variable lengths, thereby avoiding the need to preselect partitions, and by estimating the time-varying spectral density using a fixed number of smoothing splines. For a given partition of the time series, the likelihood function is approximated via a product of local Whittle likelihoods (Whittle 1957). The methodology was developed using a Bayesian framework and is based on the assumption that, conditional on the position and number of partitions, the time series are piecewise stationary, and the underlying spectral density for each partition is smooth over frequencies. However, exploratory analyses of the time series in both of our case studies revealed spectral densities with very sharp peaks, often at several nearby frequencies, thus invalidating the assumption that the spectral density is smooth over frequencies. In addition, the frequency location of these sharp peaks changed over time.

In this article, we propose a novel Bayesian methodology for modeling oscillatory data that show regime shifts in periodicity, amplitude, and phase. In contrast to previous work, our approach does not require prespecifying the number of regimes or the order of the model within a regime. We assume that, conditional on the position and number of change-points, the time series can be approximated by a piecewise changing sinusoidal regression model. The timing and number of changes are unknown, along with the number and values of relevant periodicities in each segment. We develop a reversible-jump MCMC technique that jointly explores the parameter space of the change-points and sub-models for all segments.

The article is organized as follows. Sections 2 and 3 present the model, the prior specifications, and the general structure of our Bayesian approach. Sections 4 and 5 provide a detailed explanation of our sampling scheme and simulation studies to demonstrate the performance of our approach. In Section 6, we illustrate the use of our methodology in two data-rich scenarios related to sleep, circadian rhythm, and e-Health research, namely the identification of the spectral properties of experimental breathing traces arising in sleep apnea research, and the analysis of human temperature data measured over several days by a wearable sensor. We conclude and discuss our current findings in Section 7.

2 The Model

Consider a time series realization $y_{1}, \dots, y_{n}$ whose periodic behavior may change at $k$ unknown time-points $s_{(k)} = (s_{1}, \dots, s_{k})'$, where $k$ is also unknown. Assume that in each sub-interval $I_{j} = [s_{j - 1}, s_{j})$ there are $m_{j}$ relevant frequencies $ω_{j} = (ω_{j, 1}, \dots, ω_{j, m_{j}})'$, for $j = 1, 2, \dots, k + 1$. Setting $s_{0} = 1$ and $s_{k + 1} = n$, we can write the following sinusoidal model (Andrieu and Doucet 1999)

$y_{t} = \sum_{j = 1}^{k + 1} f (t, β_{j}, ω_{j}) 1_{[t \in I_{j}]} + ε_{t},$ (1)

where

$f (t, β_{j}, ω_{j}) = α_{j} + μ_{j} t + \sum_{l = 1}^{m_{j}} (β_{j, l}^{(1)} \cos (2 π ω_{j, l} t) + β_{j, l}^{(2)} \sin (2 π ω_{j, l} t)),$ (2)

$β_{j} = (α_{j}, μ_{j}, β'_{j, 1}, \dots, β'_{j, m_{j}})'$, $β_{j, l} = (β_{j, l}^{(1)}, β_{j, l}^{(2)})'$, $1_{[\cdot]}$ denotes the indicator function, and $μ_{j}$ and $α_{j}$ may, if needed, account for a linear trend within each segment. For simplicity we assume independent zero-mean Gaussian errors with regime-specific variances

$ε_{t} \sim N (0, σ_{j}^{2})$ for $t \in I_{j}$ and $j = 1, \dots, k + 1,$ (3)

noting that the methodology can in principle be extended to the non-Gaussian case.

The dimension of the model is given by the number of change-points $k$ and the number of frequency components in each regime, denoted by $m_{(k)} = (m_{1}, \dots, m_{k + 1})'$. Furthermore, let $β_{(k)} = (β'_{1}, \dots, β'_{k + 1})'$, $ω_{(k)} = (ω'_{1}, \dots, ω'_{k + 1})'$, $σ_{(k)}^{2} = (σ_{1}^{2}, \dots, σ_{k + 1}^{2})'$, and $θ_{(k)} = (β'_{(k)}, ω'_{(k)}, σ_{(k)}^{2'})'$. Using Equation (1), the likelihood of $(k, m_{(k)}, s_{(k)}, θ_{(k)})$ given the data $y = (y_{1}, \dots, y_{n})'$ is

$L (k, m_{(k)}, s_{(k)}, θ_{(k)} | y) = \prod_{j = 1}^{k + 1} L (m_{j}, θ_{j} | y_{j}), \quad y_{j} = (y_{t} : t \in I_{j}),$ (4)

where

$L (m_{j}, θ_{j} | y_{j}) = (2 π σ_{j}^{2})^{- n_{j} / 2} \exp [- \frac{1}{2 σ_{j}^{2}} \sum_{t \in I_{j}} \{y_{t} - x_{t} (ω_{j})' β_{j}\}^{2}],$ (5)

$θ_{j} = (β'_{j}, ω'_{j}, σ_{j}^{2})'$ is the vector of parameters, $n_{j}$ is the number of observations in the $j$th segment, and the vector of basis functions $x_{t} (ω_{j})$ is defined as

$x_{t} (ω_{j}) = (1, t, \cos (2 π ω_{j, 1} t), \sin (2 π ω_{j, 1} t), \dots, \cos (2 π ω_{j, m_{j}} t), \sin (2 π ω_{j, m_{j}} t))'.$
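As an illustration of Equations (4) and (5), the following minimal Python sketch builds the basis vectors $x_{t} (ω_{j})$ and evaluates the segment-wise Gaussian log-likelihood. The function names (`design_matrix`, `segment_loglik`, `total_loglik`) are hypothetical and not taken from the article, and the sketch assumes that the final segment is closed at $t = n$ so that every observation belongs to exactly one segment.

```python
import numpy as np

def design_matrix(t, omega):
    """Stack the basis vectors x_t(omega) = (1, t, cos, sin, ...) row by row."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t]
    for w in omega:
        cols.append(np.cos(2 * np.pi * w * t))
        cols.append(np.sin(2 * np.pi * w * t))
    return np.column_stack(cols)

def segment_loglik(y, t, beta, omega, sigma2):
    """Logarithm of Equation (5) for a single segment."""
    resid = np.asarray(y, dtype=float) - design_matrix(t, omega) @ np.asarray(beta)
    n_j = resid.size
    return -0.5 * n_j * np.log(2 * np.pi * sigma2) - 0.5 * (resid @ resid) / sigma2

def total_loglik(y, s, betas, omegas, sigma2s):
    """Logarithm of Equation (4): sum of segment log-likelihoods.

    Assumption: segments are [s_{j-1}, s_j) with s_0 = 1, and the last
    segment also contains t = n.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    t_all = np.arange(1, n + 1)
    edges = np.concatenate(([1.0], np.asarray(s, dtype=float), [n + 1.0]))
    total = 0.0
    for j in range(edges.size - 1):
        mask = (t_all >= edges[j]) & (t_all < edges[j + 1])
        total += segment_loglik(y[mask], t_all[mask], betas[j], omegas[j], sigma2s[j])
    return total
```

Working on the log scale avoids numerical underflow when the segments are long, which is why the sketch returns log-likelihoods rather than the products in (4).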

3 Bayesian Inference

Given some prefixed maximal numbers of change-points, $k_{max}$ , and frequencies per regime, $m_{max}$ , inference is achieved by assuming that the true model is unknown but comes from a finite class of models where each model $M_{k}$ , with k change-points, is parameterized by the vector $(m_{(k)}, s_{(k)}, θ_{(k)}) \in Π_{k}, Π_{k} \in Π .$

Let $S_{k} = \{s_{(k)} \in [1, n]^{k} : 1 < s_{1} < \dots < s_{k} < n\}$ and $Ω_{m_{j}} = (0, 0.5)^{m_{j}}$ denote, respectively, the sample space for the locations of change-points and the frequencies of the $j$th segment. The overall parameter space can be written as a finite union of subspaces

$Π = \bigcup_{k = 0}^{k_{max}} \{k\} \times Π_{k}$, and $Π_{k} = S_{k} \times \prod_{j = 1}^{k + 1} \{m_{j}\} \times \bigcup_{m_{j} = 1}^{m_{max}} \{\mathbb{R}^{2 m_{j} + 2} \times Ω_{m_{j}} \times \mathbb{R}^{+}\}.$

Bayesian inference on $k, m_{(k)}, s_{(k)}$ and $θ_{(k)}$ is based on the following factorization of the joint posterior distribution

$π (k, m_{(k)}, s_{(k)}, θ_{(k)} | y) = π (k | y) π (m_{(k)} | k, y) π (s_{(k)} | m_{(k)}, k, y) π (θ_{(k)} | s_{(k)}, k, m_{(k)}, y),$

where we use $π (\cdot)$ as generic notation for a probability density or mass function, whichever is appropriate. Sampling from it poses a multiple model selection problem, namely of the number of change-points and the number of frequencies in each regime, which can be addressed by constructing a reversible-jump MCMC algorithm (Green 1995). The algorithm in its basic structure iterates between the following two moves:

Segment model move: Given a partition of the data at k locations $s_{(k)}$ , inference on the parameters $m_{(k)}$ and $θ_{(k)}$ is based on the conditional posterior $π (m_{(k)}, θ_{(k)} | k, s_{(k)}, y) = \prod_{j = 1}^{k + 1} π (m_{j}, θ_{j} | k, s_{(k)}, y_{j}) .$ A reversible-jump MCMC algorithm is performed in parallel on each of the k + 1 segments, where at each iteration the number of sinusoids m_j, the linear coefficients $β_{j}$ , the frequencies $ω_{j}$ , and the residual variances $σ_{j}^{2}$ are sampled independently in each segment, for $j = 1, \dots, k + 1$ . Notice that at this stage the algorithm will explore subspaces of variable dimensionality regarding the number of frequencies per segment, while the change-point model remains fixed.
Change-point model move: This step performs a reversible-jump MCMC algorithm for change-point model search where the number k and locations of change-points $s_{(k)}$ are sampled, along with the linear coefficients, number of frequencies and their values as well as the residual variances for any segments affected by the move.

Our prior specifications assume independent Poisson distributions for the number of change-points $k$ and the number of frequencies in each segment $m_{j}$, conditioned on $k \leq k_{max}$ and $1 \leq m_{j} \leq m_{max}$, respectively. Given $k$, a prior distribution for the positions of the change-points $s_{(k)}$ can be chosen as in Green (1995)

$π (s_{(k)} | k) = \frac{(2 k + 1)!}{(n - 1)^{2 k + 1}} \prod_{j = 0}^{k} (s_{j + 1} - s_{j}) 1_{[s_{0} < s_{1} < \dots < s_{k} < n]}, \quad s_{0} = 1, s_{k + 1} = n.$ (6)

Conditional on $k$ and $m_{(k)}$, we choose a uniform prior for the frequencies, $ω_{j, l} \sim Uniform (0, 0.5)$, for $l = 1, \dots, m_{j}$ and $j = 1, \dots, k + 1$. Analogous to a Bayesian regression (Bishop 2006), a zero-mean isotropic Gaussian prior is assumed for the coefficients of the $j$th segment, $β_{j} \sim N_{2 m_{j} + 2} (0, σ_{β}^{2} I)$, $j = 1, \dots, k + 1$, where $σ_{β}^{2}$ is a prespecified large value, and the prior on the residual variance $σ_{j}^{2}$ of the $j$th partition is $Inverse-Gamma (\frac{ν_{0}}{2}, \frac{γ_{0}}{2})$, where $ν_{0}$ and $γ_{0}$ are fixed at small values.
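To make the change-point prior (6) concrete, the following Python sketch evaluates its logarithm and draws from it. The draw relies on the interpretation given by Green (1995), namely that (6) is the joint density of the even-numbered order statistics of $2k + 1$ points distributed uniformly on $[1, n]$. The function names are hypothetical, and the sketch is an illustration rather than the implementation used for the results reported here.

```python
import math
import numpy as np

def log_prior_changepoints(s, n):
    """Logarithm of Equation (6) for change-point locations s on [1, n]."""
    k = len(s)
    edges = np.concatenate(([1.0], np.asarray(s, dtype=float), [float(n)]))
    gaps = np.diff(edges)  # s_{j+1} - s_j for j = 0, ..., k
    if np.any(gaps <= 0):
        return -np.inf  # ordering constraint violated
    log_const = math.lgamma(2 * k + 2) - (2 * k + 1) * math.log(n - 1)
    return log_const + float(np.sum(np.log(gaps)))

def sample_changepoints(k, n, rng):
    """Draw s_(k) as the even-numbered order statistics of 2k + 1 uniforms."""
    u = np.sort(rng.uniform(1.0, float(n), size=2 * k + 1))
    return u[1::2]  # 2nd, 4th, ..., (2k)th smallest values
```

Because the density is proportional to the product of the segment lengths, configurations with very short segments receive little prior mass.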

4 Sampling Scheme for Nonstationary Periodic Processes

Here we provide the sampling scheme associated with the nonstationary periodic processes that we wish to model. An outline of the overall procedure is as follows. Start with an initial configuration of the number of change-points $k$, along with their locations $s_{(k)}$; this yields a partition of the data $y = (y_{1}, \dots, y_{k + 1})$. Initialize the number of frequencies in each regime $m_{(k)}$ and their values $ω_{(k)}$, along with the coefficients $β_{(k)}$ and residual variances $σ_{(k)}^{2}$. At each iteration of the algorithm, a segment model move and a change-point model move are performed. A random choice with probabilities (7), based on the current number of parameters, determines whether to attempt a birth, death, or within-model move. In particular, let $z$ denote the current number of parameters, that is, change-points $k$ in the change-point model or frequencies $m_{j}$ in the $j$th segment model; then, the dimension may increase by one (birth step) with probability $b_{z}$, decrease by one (death step) with probability $d_{z}$, or remain unchanged (within step) with probability $μ_{z} = 1 - b_{z} - d_{z}$, where

$b_{z} = c \min \{1, \frac{π (z + 1)}{π (z)}\}, \quad d_{z + 1} = c \min \{1, \frac{π (z)}{π (z + 1)}\},$ (7)

for some constant $c \in [0, \frac{1}{2}]$, and $π (z)$ is the prior probability of the model including $z$. Reversibility of the Markov chain is guaranteed for move types that involve a change in dimensionality as $b_{z} π (z) = d_{z + 1} π (z + 1)$. Here we chose $c = 0.4$, but other values are legitimate as long as $c$ is not larger than 0.5, which ensures that the sum of the probabilities does not exceed 1 for any value of $z$. Naturally, $b_{k = k_{max}} = b_{m = m_{max}} = 0$ and $d_{k = 0} = d_{m = 1} = 0$. The pseudocode of the overall algorithm that describes an iteration of the sampler is given in Algorithm 1. We next describe in more detail the specific procedures needed to update the moves.
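The move probabilities in (7) can be tabulated directly from the truncated Poisson prior. The Python sketch below does so and can be used to check the balance condition $b_{z} π (z) = d_{z + 1} π (z + 1)$ numerically. The helper names are hypothetical; the range $0, \dots, 15$ and the prior mean 2 used in the usage note are the values reported for the illustrative example in Section 5.1.

```python
import math

def truncated_poisson_pmf(lam, lo, hi):
    """Poisson(lam) probabilities restricted to lo, ..., hi and renormalized."""
    weights = {z: math.exp(-lam) * lam ** z / math.factorial(z)
               for z in range(lo, hi + 1)}
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}

def move_probabilities(prior, c=0.4):
    """Birth and death probabilities b_z and d_z of Equation (7).

    `prior` maps each admissible z to its prior probability pi(z).
    Births are impossible at the largest z, deaths at the smallest z.
    """
    lo, hi = min(prior), max(prior)
    b, d = {}, {}
    for z in prior:
        b[z] = c * min(1.0, prior[z + 1] / prior[z]) if z < hi else 0.0
        d[z] = c * min(1.0, prior[z - 1] / prior[z]) if z > lo else 0.0
    return b, d
```

For instance, `move_probabilities(truncated_poisson_pmf(2.0, 0, 15))` returns the birth and death probabilities for the number of change-points under a prior mean of 2; the within-model probability is then $1 - b_{z} - d_{z}$.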

Algorithm 1:

1. For each segment $j = 1, \dots, k + 1$ , perform a segment model move (Section 4.1):

Draw $U \sim Uniform (0, 1)$

if $U \leq b_{m_{j}} \to$ birth-step

else if $b_{m_{j}} < U \leq b_{m_{j}} + d_{m_{j}} \to$ death-step

else $\to$ within-step

2. Perform a change-point model move (Section 4.2):

Draw $U \sim Uniform (0, 1)$

if $U \leq b_{k} \to$ birth-step

else if $b_{k} < U \leq b_{k} + d_{k} \to$ death-step

else $\to$ within-step
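The move selection in Algorithm 1 can be sketched in a few lines of Python. The sketch assumes that a death step is selected when the uniform draw falls between $b_{z}$ and $b_{z} + d_{z}$, so that the three move types occur with probabilities $b_{z}$, $d_{z}$, and $1 - b_{z} - d_{z}$, as stated in Section 4. The function name `choose_move` and the optional argument `u` are hypothetical.

```python
import random

def choose_move(b, d, u=None):
    """Select 'birth', 'death', or 'within' with probabilities b, d, 1 - b - d.

    `u` may be supplied for reproducibility; otherwise it is drawn from
    Uniform[0, 1). Strict inequalities ensure that a move with probability
    zero is never selected.
    """
    if u is None:
        u = random.random()
    if u < b:
        return "birth"
    if u < b + d:
        return "death"
    return "within"
```

One iteration of the sampler would call `choose_move` once for each of the $k + 1$ segments, with the probabilities for the current $m_{j}$, and once for the change-point model, with the probabilities for the current $k$.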

4.1 Updating a Segment Model

Given the number of change-points k and their locations $s_{(k)}$ , a segment model move is performed independently and in parallel on each of the k + 1 partitions. Hence, throughout this subsection the subscript relating to the jth segment may be dropped and a segment of interest is denoted by $y = (y_{a}, \dots, y_{b})'$ , which contains n observations. Assume that the current number of frequencies is set at m; then, an independent random choice is made between attempting a birth, death or within-model step, with probabilities given in (7). An outline of these moves is as follows (further details are provided in the Appendix).

4.1.1 Within-Model Move

Conditioned on the number of frequencies $m$, we sample the vector of frequencies $ω$ following Andrieu and Doucet (1999), that is, by sampling the frequencies one at a time using a mixture of M-H steps, with target distribution

$π (ω | β, σ^{2}, m, y) \propto \exp [- \frac{1}{2 σ^{2}} \sum_{t = a}^{b} \{y_{t} - x_{t} (ω)' β\}^{2}] 1_{[ω \in Ω_{m}]}.$ (8)

In particular, the proposal distribution is a combination of a normal random walk centered around the current frequency and a sample from values of the discrete Fourier transform of $y$. The corresponding vector of linear parameters $β$ is then updated in an M-H step, from the target posterior conditional distribution

$π (β | ω, σ^{2}, m, y) \propto \exp [- \frac{1}{2 σ^{2}} \sum_{t = a}^{b} \{y_{t} - x_{t} (ω)' β\}^{2} - \frac{1}{2 σ_{β}^{2}} β' β],$ (9)

where the proposed values are drawn from normal approximations to their posterior conditional distribution. Finally, the residual variance $σ^{2}$ is updated in a Gibbs step from

$σ^{2} | ω, β \sim Inverse-Gamma (\frac{n + ν_{0}}{2}, \frac{γ_{0} + \sum_{t = a}^{b} \{y_{t} - x_{t} (ω)' β\}^{2}}{2}).$ (10)
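The updates in (9) and (10) can be illustrated with a short Python sketch. Since the exponent in (9) is quadratic in $β$, completing the square gives a Gaussian with precision matrix $X' X / σ^{2} + I / σ_{β}^{2}$ and mean equal to the covariance times $X' y / σ^{2}$, where $X$ stacks the basis vectors $x_{t} (ω)'$; the sketch computes this Gaussian, which could serve as the normal proposal. The function names are hypothetical, and the code is a sketch under these assumptions rather than the sampler used for the reported results.

```python
import numpy as np

def beta_conditional(y, X, sigma2, sigma2_beta):
    """Mean and covariance of the Gaussian proportional to Equation (9)."""
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / sigma2_beta
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / sigma2
    return mean, cov

def sample_beta(y, X, sigma2, sigma2_beta, rng):
    """Draw beta from the Gaussian returned by beta_conditional."""
    mean, cov = beta_conditional(y, X, sigma2, sigma2_beta)
    return rng.multivariate_normal(mean, cov)

def sample_sigma2(y, X, beta, nu0, gamma0, rng):
    """Gibbs draw from the Inverse-Gamma distribution in Equation (10)."""
    resid = y - X @ beta
    shape = 0.5 * (len(y) + nu0)
    scale = 0.5 * (gamma0 + resid @ resid)
    # If G ~ Gamma(shape, 1), then scale / G ~ Inverse-Gamma(shape, scale).
    return scale / rng.gamma(shape)
```

A generator such as `np.random.default_rng(0)` can be passed as `rng` to make the draws reproducible.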

4.1.2 Between-Model Moves

For this type of move, the number of frequencies is either proposed to increase by one (birth) or decrease by one (death). If a birth move is proposed, we have that $m^{p} = m^{c} + 1$ , where current and proposed values are denoted by the superscripts c and p, respectively. The proposed vector of frequencies is constructed by proposing an additional frequency to include in the current vector. Conditional on the frequencies, the corresponding vector of linear coefficients and the residual variance are sampled as in the within-model move. If a death move is proposed, we have that $m^{p} = m^{c} - 1$ . Hence, one of the current frequencies is randomly chosen to be removed. The proposed corresponding vector of linear coefficients is drawn, along with the residual variance. For both moves, the updates are jointly accepted or rejected in an M-H step.

4.2 Updating the Change-Point Model

This part of the algorithm identifies the number and locations of change-points. Suppose the number of change-points is currently set to some value k, then according to the probabilities given in (7) a random decision is made between adding, removing, or moving a change-point. The rules for updating these types of moves are described below and more details are given in the Appendix.

4.2.1 Within-Model Move

One of the existing change-points, say $s_{j}^{c}$, is selected for relocation with probability $\frac{1}{k}$. The update for the selected change-point is proposed from a mixture of a normal random walk centered on the current change-point $s_{j}^{c}$ and a sample from a uniform distribution on the interval $[s_{j - 1}^{c} + ψ_{s}, s_{j + 1}^{c} - ψ_{s}]$. Here, we introduce $ψ_{s}$ as a fixed minimum time between change-points, which prevents change-points from being too close to each other. Rosen, Wood, and Stoffer (2012) used a similar scheme, but on a discrete scale. The number of frequencies and their values are kept fixed, and, conditional on the relocation, the linear coefficients for the segments affected by the relocation are sampled. These updates are jointly accepted or rejected in an M-H step, and the residual variances are updated in a Gibbs step.

4.2.2 Between-Model Moves

For this type of move, the number of change-points may either increase (birth) or decrease (death) by one. If a birth move is proposed, we have that $k^{p} = k^{c} + 1$. The new proposed change-point is drawn uniformly on $f (s_{(k^{c})}^{c}, ψ_{s})$, the support of $s_{(k^{c})}^{c}$ given the constraints imposed by $ψ_{s}$, that is, $f (s_{(k^{c})}^{c}, ψ_{s}) = [1 + ψ_{s}, s_{1}^{c} - ψ_{s}] \cup [s_{1}^{c} + ψ_{s}, s_{2}^{c} - ψ_{s}] \cup \dots \cup [s_{k^{c}}^{c} + ψ_{s}, n - ψ_{s}]$. This move splits an existing segment. The number of frequencies and their values in the proposed segments are selected from the current states. Two residual variances for the new proposed segments are then constructed from the current single residual variance. Finally, two new vectors of linear parameters are sampled. If a death move is proposed, we have that $k^{p} = k^{c} - 1$. Hence, a candidate change-point to be removed is selected from the vector of existing change-points, with probability $\frac{1}{k^{c}}$. This move merges two existing segments. The number of frequencies and their values in the proposed segment are selected from the current states. A single residual variance is constructed from the current variances of the segments affected by the removal. Finally, a new vector of linear coefficients is drawn. For both types of moves, these updates are jointly accepted or rejected.
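The support $f (s_{(k^{c})}^{c}, ψ_{s})$ used in the birth move is a union of intervals, and a uniform draw on it can be obtained by first choosing an interval with probability proportional to its length and then drawing uniformly within it. The following Python sketch illustrates this step only; the function names are hypothetical, and segments shorter than $2 ψ_{s}$ are assumed to contribute an empty interval.

```python
import numpy as np

def birth_support(s, n, psi):
    """Intervals in which a new change-point may be proposed.

    Each current segment (left, right) contributes [left + psi, right - psi],
    provided that this interval has positive length.
    """
    edges = [1.0] + [float(x) for x in s] + [float(n)]
    intervals = []
    for left, right in zip(edges[:-1], edges[1:]):
        lo, hi = left + psi, right - psi
        if hi > lo:
            intervals.append((lo, hi))
    return intervals

def propose_birth(s, n, psi, rng):
    """Draw a new change-point uniformly on the union of admissible intervals."""
    intervals = birth_support(s, n, psi)
    if not intervals:
        return None  # no room for an additional change-point
    lengths = np.array([hi - lo for lo, hi in intervals])
    i = rng.choice(len(intervals), p=lengths / lengths.sum())
    lo, hi = intervals[i]
    return rng.uniform(lo, hi)
```

With the settings of Section 5.1, `birth_support([300, 650], 900, 20)` gives the three intervals $[21, 280]$, $[320, 630]$, and $[670, 880]$.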

5 Simulation Studies

We carry out simulation studies to explore the performance of our method, which will be referred to as Automatic Nonstationary Oscillatory Modeling (AutoNOM). In Section 5.1, we illustrate the performance of our methodology when the simulated data are generated from the proposed model. Section 5.2 deals with scenarios when the model is misspecified relative to the generating process. Our results are compared with two state-of-the-art existing methods.

5.1 Illustrative Example

In this simulation example, we generate a time series consisting of n = 900 data points from model (1) with k = 2 change-points located at positions $s_{(2)} = (300, 650)$ and a fixed number of frequencies per regime $m_{(2)} = (3, 1, 2)$. (Further details of the parameterization are available in supplementary materials, Section 1.1.) Figure 1 (top panel) shows a realization from this model. The prior means $λ_{ω}$ and $λ_{s}$, say, on the number of frequencies and change-points, respectively, were set to 2, to reflect a fair degree of prior information on their numbers. We discuss in Section 1.2 of the supplementary materials that AutoNOM was relatively insensitive to these prior specifications for this example. The maximum number of change-points $k_{max}$ was set to 15, and the maximum number of frequencies per regime $m_{max}$ was set to 10. Furthermore, we fixed $ψ_{s} = 20$ and $ϕ_{ω} = 0.25$ (Appendix A.1.2) for the uniform distribution for sampling the frequencies. The full estimation algorithm was run for 20,000 updates, 5000 of which were discarded as a burn-in period. The estimation took 390 sec with a (serial) program written in Julia 0.62 on an Intel® Core™ i7-4790S processor with 16 GB RAM. The results, summarized in Table 1, clearly show that a model with two change-points has the highest estimated posterior probability (left panel) and that AutoNOM correctly identifies the number of significant frequencies in each regime (right panel).

Fig. 1 Illustrative example. (Top) Simulated time series. (Middle) Estimated posterior distribution for the location of the change-points, conditioned on k = 2. The dotted vertical lines represent true location of change-points. (Bottom) Estimated posterior distribution of the frequencies for each different segment, conditioned on k = 2, $m_{1} = 3, m_{2} = 1$ , and $m_{3} = 2$ . The dotted vertical lines represent true values of the frequencies.

Table 1 Illustrative example.

Figure 1 (middle panel) shows the estimated posterior distribution for the location of the change-points, conditioned on three segments. The posterior means of the change-point locations are $\hat{E} (s_{1} | k = 2, y) = 298.7$ and $\hat{E} (s_{2} | k = 2, y) = 650.1$. Figure 1 (bottom panel) shows that the estimated posterior distributions are an excellent match to the true frequencies. In addition, we provide details about acceptance rates in supplementary materials, Section 2.

5.1.1 Detecting Spectral Peaks

We simulate a time series from the same simulation model as above, with the only difference that the residual variances were set equal to one for all segments and thus are smaller than above. The performance of AutoNOM is compared with two existing methods, namely the Bayesian adaptive spectral estimation for nonstationary time series proposed by Rosen, Wood, and Stoffer (2012), referred to as AdaptSPEC, and the frequentist piecewise vector autoregressive method of Davis, Lee, and Rodriguez-Yam (2006), referred to as AutoPARM. Specifically, we explore the performance of these methodologies in identifying the number and location of change-points, and the number and location of frequency peaks in each estimated segment. AdaptSPEC requires the user to specify in advance the number of basis functions J used for smoothing the periodogram in the segments. We run AdaptSPEC for two different specifications, namely J = 7 and J = 15 basis functions. The model is fitted with a total of 15,000 iterations, 5,000 of which are discarded as burn-in, by using the R package provided by the authors. Posterior samples of peak frequencies are obtained by considering the modes of the spectrum per MCMC iteration. AutoPARM is performed with default tuning parameters. We note that Davis, Lee, and Rodriguez-Yam (2006) do not discuss computation of confidence intervals for frequencies.

The modal number of change-points for AdaptSPEC is 2 for both J = 7 and J = 15, with posterior probability $\hat{π} (k = 2 | y)$ of 76% and 88%, respectively; the modal number of change-points for AutoNOM is 2, and AutoPARM identifies 2 change-points as well. Conditioned on the modal number of change-points, Table 2 displays the estimated location of changes (left panel) and frequency peaks (right panel) for the different compared methods, where we report the standard deviation for the estimate obtained from the empirical distribution of the posterior samples. Similarly, we show in Figure 2 the estimated location of the frequency peaks and their 95% credible intervals for each of the three identified segments; dotted vertical lines represent the true location of the frequency peaks. Results for AutoNOM are conditioned on the modal number of frequencies per regime.

Fig. 2 Illustrative example with unitary residual variances. Estimated frequency peaks for AutoNOM (AN), AdaptSPEC (AS, $J = 7, 15$ ), and AutoPARM (AP); 95% credible intervals (horizontal lines) are also reported for Bayesian methods. Dotted vertical lines are true locations of the frequency peaks.

Table 2 Illustrative example with unitary residual variances.

It becomes clear that the detection of periodicities by AdaptSPEC is affected by the number of spline basis functions used for the smoothing, where increasing the number of basis functions yields a better performance for AdaptSPEC. The example also shows that smoothing by splines may cause peaks in the periodogram to be over-smoothed and closely neighboring peaks to be merged. AutoPARM also seems to suffer from the latter problem.

When we increased the residual variance to the high levels set originally, AdaptSPEC failed to detect any change-points for both J = 7 and J = 15, with posterior probability $\hat{π} (k = 0 | y)$ of 69% and 93%, respectively, while AutoPARM found 7 change-points and thus severely overestimated their number. Our conclusion from this comparison is that although AdaptSPEC and AutoPARM may be well suited for time series processes with smooth time-varying spectra with few or no peaks, both methods are severely challenged in detecting changes in spectra that exhibit pronounced peakedness, possibly at nearby frequencies, as can be expected to occur in reality for the type of time series that we wish to analyze.

5.2 Misspecified Model

We investigate the performance of our proposed method for identifying spectral peaks when the model is misspecified relative to the generating process. In particular, we explored simulation studies under three different settings. In the first two scenarios we generated data from two types of autoregressive (AR) processes, namely a piecewise AR process and a slowly varying AR process. We compare the performance of our procedure with AutoPARM and AdaptSPEC. In the third setting, we assumed that the innovations are t-distributed, and therefore violate the Gaussianity assumption of $ε_{t}$ in Equation (3). For all models, our estimation algorithm was run for 20,000 iterations, 5000 of which were used as burn-in, and the hyperparameters were chosen as $ϕ_{s} = 40, λ_{ω} = 0.05$ and $λ_{s} = 0.01$ .

5.2.1 Piecewise Autoregressive Process

As pointed out by a referee, although modeling a time series as a linear combination of a finite number of sinusoids plus noise is common in the signal processing literature, such line-spectrum based models are rare in the statistics literature. In fact, it is commonly assumed that the power spectrum is continuous across frequencies. We investigate the performance of the proposed procedure when analyzing data generated from a piecewise AR process whose local spectral density functions show sharp peaks. Specifically, a realization is simulated from

$y_{t} = \begin{cases} 1.9 y_{t - 1} - 0.975 y_{t - 2} + ε_{t}^{(1)} & \text{for } 1 \leq t \leq 250 \\ 1.9 y_{t - 1} - 0.991 y_{t - 2} + ε_{t}^{(2)} & \text{for } 251 \leq t \leq 400 \\ - 1.35 y_{t - 1} - 0.37 y_{t - 2} + 0.36 y_{t - 3} + ε_{t}^{(3)} & \text{for } 401 \leq t \leq 550, \end{cases}$ (11)

where $ε_{t}^{(1)} \overset{iid}{\sim} N (0, 0.25)$ and $ε_{t}^{(i)} \overset{iid}{\sim} N (0, 1)$ for i = 2, 3. Figure 3 (top panel) shows a realization from model (11). After applying our methodology AutoNOM, the posterior probability of two change-points is 97.93% and the posterior means of the change-point locations are $\hat{E} (s_{1} | k = 2, y) = 251.19$ and $\hat{E} (s_{2} | k = 2, y) = 401.56$. The estimated locations of the frequency peaks for our proposed procedure, in comparison to AdaptSPEC and AutoPARM, and the true values are shown in Figure 3 (bottom panels). It is evident that the proposed and existing methodologies successfully identify the true location of the frequency peaks in each segment, with AdaptSPEC showing less precision.

Fig. 3 Piecewise AR process. (Top) A realization from model (11). Vertical dotted lines are the estimated locations of the change-points. (Bottom) Estimated frequency peaks for AutoNOM (AN), AdaptSPEC (AS J = 10), and AutoPARM (AP); 95% credible intervals (horizontal lines) are also reported for Bayesian methods. Dotted vertical lines are true locations of the frequency peaks.

5.2.2 Slowly Varying Autoregressive Process

In this section, we analyze an AR process whose continuous spectral density is changing slowly over time. We note, though, that this scenario is a large departure from the assumptions of our model. In particular, we consider the same slowly varying AR(2) process investigated by Ombao et al. (2001) and Davis, Lee, and Rodriguez-Yam (2006), namely

$y_{t} = a_{t} y_{t - 1} - 0.81 y_{t - 2} + ε_{t}, \quad t = 1, \dots, 1031,$ (12)

where $a_{t} = 0.8 [1 - 0.5 \cos (π t / 1031)]$ and $ε_{t} \overset{iid}{\sim} N (0, 1)$. Notice that the parameter $a_{t}$ is changing gradually over time whereas the coefficient associated with the second lag remains constant. A realization from model (12) is shown in Figure 4(a) and the corresponding time varying frequency peak is displayed in Figure 4(b) as a solid line. Figure 4(b) also shows the estimated time varying frequency peak for AutoNOM, AdaptSPEC, and AutoPARM. For AutoNOM and AdaptSPEC, the time changing frequency peak has been averaged across the MCMC samples, giving a smoother estimate (especially for AdaptSPEC) than the one obtained by AutoPARM. For each method, we compute the residual sum of squares $RSS = \sum_{t = 1}^{1031} (ω_{t} - \hat{ω}_{t})^{2}$ between the true time changing frequency peak $ω_{t}$ and its estimate $\hat{ω}_{t}$. The RSS in this example was 0.111, 0.174, and 0.085 for AutoNOM, AdaptSPEC, and AutoPARM, respectively. Even in this scenario, where the data generating model was very different from the underlying assumptions of our model, our approach outperforms AdaptSPEC in terms of RSS and remains competitive with AutoPARM in estimating the time varying frequency peak.
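The true time varying frequency peak of model (12) can be computed in closed form. For an AR(2) process $y_{t} = φ_{1} y_{t - 1} + φ_{2} y_{t - 2} + ε_{t}$ with $φ_{2} < 0$, setting the derivative of the spectral density to zero gives a peak at the frequency $ω$ satisfying $\cos (2 π ω) = φ_{1} (φ_{2} - 1) / (4 φ_{2})$, provided the right-hand side lies in $(-1, 1)$. The Python sketch below evaluates this expression with $φ_{1} = a_{t}$ and $φ_{2} = -0.81$ and defines the RSS criterion; the function names are hypothetical, and treating the process as locally stationary at each $t$ is an assumption of the sketch.

```python
import numpy as np

def ar2_peak_frequency(phi1, phi2):
    """Peak frequency (cycles per observation) of an AR(2) spectral density.

    Assumes phi2 < 0 and |phi1 * (phi2 - 1) / (4 * phi2)| < 1, so that the
    spectrum has an interior maximum in (0, 0.5).
    """
    c = phi1 * (phi2 - 1.0) / (4.0 * phi2)
    return np.arccos(c) / (2.0 * np.pi)

def rss(omega_true, omega_hat):
    """Residual sum of squares between true and estimated frequency peaks."""
    diff = np.asarray(omega_true) - np.asarray(omega_hat)
    return float(np.sum(diff ** 2))

# Time varying coefficient and frequency peak for model (12).
t = np.arange(1, 1032)
a_t = 0.8 * (1.0 - 0.5 * np.cos(np.pi * t / 1031))
omega_true = ar2_peak_frequency(a_t, -0.81)
```

Given an estimated path `omega_hat` of the same length, `rss(omega_true, omega_hat)` returns the criterion used to compare the three methods.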

Fig. 4 Slowly varying AR(2) process. (a) A realization from model (12). (b) True time varying frequency peak (solid line) and estimated time varying frequency peak for AutoNOM (AN), AdaptSPEC (AS J = 10), and AutoPARM (AP).

5.2.3 Non-Gaussian Time Series

We investigate the performance of our approach when the innovations are t-distributed. We simulate a time series from the same simulation model presented in Section 5.1, where errors were generated from a t-distribution with 2, 3, and 2 degrees of freedom for the sequence of three segments, respectively. The degrees of freedom were chosen to be low so that the corresponding distributions have heavy tails. A realization of this time series is shown in Figure 5. Our proposed methodology correctly identifies the 2 change-points, as the estimated posterior probability $\hat{π} (k = 2 | y)$ is 0.99. The posterior means of the change-point locations are $\hat{E} (s_{1} | k = 2, y) = 303.6$ and $\hat{E} (s_{2} | k = 2, y) = 650.5$, showing an excellent match to the true values $s_{(2)} = (300, 650)$. Furthermore, the posterior mode of the number of frequencies in each segment is ${\hat{m}}_{(2)} = (3, 1, 2)$, which is a correct estimate of $m_{(2)} = (3, 1, 2)$. We also display in Figure 5 the estimated signal (using Equation (1), supplementary materials) as a dotted line. We can conclude that, although our model assumes Gaussianity, AutoNOM seems to perform well even when the errors of the underlying oscillatory process are t-distributed with heavy tails.

Fig. 5 Illustrative example with t-distributed residual variances. Simulated time series (solid line) and estimated signal (dotted line). The dotted vertical lines represent the estimated location of the change-points.

6 Case Studies

The development of our methodology was motivated by the following two case studies where dense physiological signals were observed which exhibit unknown periodicities whose role may change over time in a more or less abrupt manner and where their detection is of relevance to the health and well-being of the subject.

6.1 Analysis of Human Skin Temperature

The development of information and communication technologies, in particular widespread internet access and the availability of mobile phones and tablets, opens the way to new developments in the health care system. To address the issue of personalized medical treatment according to the circadian timing system of the patient, referred to as chronotherapy (Levi and Schibler 2007), a novel and validated noninvasive mobile e-Health platform pioneered by the French project PiCADo (Komarzynski et al. 2018) is used to record and teletransmit skin surface temperature as well as physical activity data (Huang et al. 2018) from an upper chest e-sensor. Figure 6(a) shows an example of 4 days of 5-min temperature recording for a healthy individual. The circadian rhythms in core and skin surface temperature are usually 8–12 hr out of phase, with respective maxima occurring near 16:00 during the day and near 2:00 at night (Krauchi and Wirz-Justice 1994). The early night drop in core body temperature, which is critical for triggering the onset of sleep (Van Someren 2006), results from the vasodilatation of the skin vessels and the associated rise in skin surface temperature (Kräuchi 2002). Under the assumption of stationarity, Komarzynski et al. (2018) analyzed the skin temperature time series, identifying both strong 12 hr (circahemidian) and 24 hr (circadian) rhythms.

Fig. 6 Analysis of skin temperature of a healthy subject. Panel (a) shows the time series of skin temperature and corresponding physical activity. Panel (b) shows the estimated signal (solid line) along with its 95% credible interval; vertical lines are the estimated locations of the change-points. Panel (c) shows the estimated posterior density histogram of the locations of the changes, conditioned on $k = 7$ change-points. Rectangles on the time axis of each plot correspond to periods from 20:00 to 8:00. The variation in skin temperature mirrors the rest-activity pattern that alternates between day activity and night rest.

Here, we applied our methodology to the skin-temperature time series shown in Figure 6(a), running the sampler for 300,000 iterations and discarding the first 100,000 updates as burn-in. The maximum number of change-points $k_{max}$ was set to 10, whereas the maximum number of frequencies per regime $m_{max}$ was set to 5. The estimated number of change-points had a mode at 7, with $\hat{π} (k = 7 | y) = 0.97$, and the estimated posterior distributions of their locations are shown in Figure 6(c). Inspecting them alongside the physical activity data, we can see that the change-points mainly correspond to the start and end points of the prolonged rest periods at night, showing that skin temperature alternates between day activity and night rest including sleep. Figure 7 shows the estimated posterior distribution of the frequencies for the sleep segments (2, 4, 6, 8) along with the square root of the estimated power of the corresponding frequencies, where the power of each frequency $ω_{j, l}$ is summarized by the sum of squares of the corresponding linear coefficients, that is, $I (ω_{j, l}) = {(β_{j, l}^{(1)})}^{2} + {(β_{j, l}^{(2)})}^{2}$ (Shumway and Stoffer 2005). Figure 6(b) shows the piecewise fitted signal, along with a 95% credible interval obtained from the 2.5 and 97.5 empirical percentiles of the posterior sample using Equation (1), supplementary materials. Cycles of approximately 3 hr appear in segments 2, 4, and 6; cycles of approximately 1–1.5 hr appear in segments 2, 4, and 8; and cycles of around 2 hr appear in segments 4 and 6, while some longer periods identified in segments 4, 6, and 8 indicate the presence of a trend.
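The summaries used in this paragraph are simple to compute from posterior draws. The sketch below is illustrative only: the function names are hypothetical, and it assumes that frequencies are expressed in cycles per observation, so that with 5-min sampling a frequency $ω$ corresponds to a period of $5 / (60 ω)$ hours.

```python
import numpy as np

SAMPLING_INTERVAL_MIN = 5.0  # skin temperature is recorded every 5 minutes


def power(beta_cos, beta_sin):
    """Power of a frequency: sum of squares of its two linear coefficients."""
    return np.asarray(beta_cos) ** 2 + np.asarray(beta_sin) ** 2


def period_hours(omega):
    """Period in hours for a frequency given in cycles per observation (assumed unit)."""
    return SAMPLING_INTERVAL_MIN / (60.0 * np.asarray(omega))


def credible_band(signal_draws, level=0.95):
    """Pointwise band from empirical percentiles; rows are posterior draws of the signal."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(signal_draws, [tail, 100.0 - tail], axis=0)
    return lower, upper
```

For instance, a cycle of about 3 hr corresponds to 36 observations per cycle, that is, a frequency of 1/36, and `np.sqrt(power(b1, b2))` gives the square root of the power of the kind plotted in Figure 7.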

Fig. 7 Spectral properties for segments corresponding to night rest. Estimated posterior distribution of the frequencies along with square root of the estimated power of the corresponding frequencies. The results are conditional on the modal number of frequencies per segment.

Stages of sleep are characterized by ultradian oscillations between rapid eye movement (REM) and non-REM sleep. The biological mechanism that regulates the alternation between these two types of sleep is not yet well understood (Altevogt and Colten 2006). However, several physiological variables change overnight in ways that differ between REM and non-REM phases, such as heart rate, brain activity, muscle tone, and body temperature (Berlad et al. 1993; Pace-Schott and Hobson 2002). The body cycles between REM and non-REM sleep stages with an average cycle length of approximately 70 to 120 min, and there are usually four to six of these sleep cycles each night (Carskadon and Dement 2005; Shneerson 2009). Our analysis was able to use skin temperature data alone to detect periods of sleep throughout the day and to identify oscillatory behavior during the night, whose frequencies are compatible with ultradian oscillations between REM and different non-REM sleep stages.

A comparison with the current state-of-the-art methods, AutoPARM and AdaptSPEC, is provided in the supplementary materials, Section 4.1. Circadian and ultradian rhythmicity are expected because body temperature is known to be a circadian biomarker (Krauchi and Wirz-Justice 1994), but these existing methods fail to detect them. Furthermore, we note that when analyzing circadian biomarker data, such as body temperature, a change in acrophase may be of interest to the clinician, as it may indicate a disruption of the body clock. The methodology can indeed be used to investigate the phase, which can be computed from the sinusoidal function that characterizes the jth segment (see supplementary materials, Section 3).

6.2 Characterizing Instances of Sleep Apnea in Rodents

Sleep apnea is the temporary ( $\geq$ 2 breaths) interruption of breathing during sleep. Moderate or severe ( $\geq$ 15 events per hour) sleep apnea occurs in about 50% of men and 25% of women over the age of 40 (Heinzer et al. 2015), with 91% of people with sleep apnea being undiagnosed (Tan et al. 2016). Sleep apnea is linked to many diseases. Patients with sleep apnea are at increased risk of cardiovascular events (Lanfranchi et al. 1999), cancer (Nieto et al. 2012), liver disease (Sundaram et al. 2016), diabetes (Harsch et al. 2004), metabolic syndrome (Parish, Adam, and Facchiano 2007), cognitive decline (Osorio et al. 2016), and dementia in the elderly (Lal, Strange, and Bachman 2012). The motivation of this research is to provide a statistical methodology that can be applied to analyze large breathing datasets resulting from in vivo plethysmograph studies in rats, in order to characterize the occurrence of sleep apnea under different experimental conditions. Such a methodology would give clinicians and experimental biologists a concrete aid to understanding the pathological implications of this condition.

An unrestrained whole-body plethysmograph is used to produce a breathing trace from freely behaving rats for periods of up to 3 hr. Plethysmographs were made using a 2 L air-tight box connected to a pressure transducer, with an air pump and outlet valve producing a flow rate of 2 L/min. Airflow pressure signals were amplified using a Neurolog system (Digitimer) connected to a 1401 interface and acquired on a computer using Spike2 software (CED).

Apneas are subclassified as post-sigh apneas, if the preceding breath was at least 25% above the average amplitude of prior breaths, or spontaneous apneas, if there was no manifestation of a previous sigh (Davis and O’Donnell 2013). Airflow traces from the plethysmograph are shown in Figure 8 (left panels) and consist of three time series, which will be referred to as (a), (b), and (c). They correspond to different actions for this rat: (a) an alternation of sniffing and normal breathing; (b) spontaneous apnea followed by normal breathing; (c) normal breathing followed by a sigh and a post-sigh apnea. We note that these actions were classified by eye by an experienced experimental researcher. Each time series contains 20,000 observations sampled at 2000 Hz, that is, 2000 observations per second.

Fig. 8 Plots of the respiratory traces of a rat (left panels) and corresponding estimated posterior power (right panels). Panel (a) is characterized by an alternation of sniffing and normal breathing. Panel (b) is a plot of the trace of a spontaneous apnea, followed by normal breathing. Panel (c) shows normal breathing followed by a sigh, and a post-sigh apnea. Dotted vertical lines correspond to the estimated locations of the change-points.

Our procedure allows us to set an upper bound, $ϕ_{ω}$ (Appendix A.1.2), for the uniform interval from which new frequencies are sampled. As the periodogram ordinates for these data were approximately zero for all frequencies larger than 0.01, we set $ϕ_{ω} = 0.01$. The locations of the changes (vertical lines) are displayed in Figure 8 (left panels). The posterior power of the frequencies, for each time series, is shown in Figure 8 (right panels). These results are conditional on the modal number of change-points and the modal number of frequencies per segment. For each dataset, we summarize in Table 3 the spectral properties of each partition by displaying the periodicities corresponding to the two largest values of the estimated power. When the rat is sniffing, (a), the air flow trace oscillates with a dominant period of approximately 0.2 sec, namely 5 cycles per second. Normal breathing, (a) and (b), is characterized by lower frequencies and lower magnitude than sniffing, oscillating with a dominant period of around 0.5 sec, namely around 2 cycles per second. Apneas, (b) and (c), appear to be characterized by higher frequencies than normal breathing but with lower power, with dominant periods of around 0.25 and 0.35 sec. Notice that in the first partition of (c), the highest value of the power corresponds to the frequency responsible for a sigh before apnea. Moreover, our methodology identifies different frequencies that explain the variation between the third and fourth partitions of (c), leading to the hypothesis that there might be a time changing spectrum during the occurrence of an apnea instance. A comparison of our results with the results from AutoPARM and AdaptSPEC is provided in the supplementary materials, Section 4.2.
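The units in this paragraph can be checked with a short calculation. The sketch below uses hypothetical helper names and assumes that frequencies such as $ϕ_{ω} = 0.01$ are expressed in cycles per observation; with a sampling rate of 2000 Hz, this bound then corresponds to 20 cycles per second, comfortably above the dominant breathing rhythms reported here.

```python
SAMPLING_RATE_HZ = 2000.0   # observations per second
N_OBSERVATIONS = 20_000     # length of each respiratory trace


def cycles_per_second(omega):
    """Convert a frequency in cycles per observation (assumed unit) to Hz."""
    return omega * SAMPLING_RATE_HZ


def frequency_from_period(period_sec):
    """Frequency in cycles per observation for a period given in seconds."""
    return 1.0 / (period_sec * SAMPLING_RATE_HZ)


duration_sec = N_OBSERVATIONS / SAMPLING_RATE_HZ    # length of each trace in seconds
upper_bound_hz = cycles_per_second(0.01)            # the bound phi_omega in Hz
sniffing_omega = frequency_from_period(0.2)         # dominant sniffing period
breathing_omega = frequency_from_period(0.5)        # dominant normal-breathing period
```

Under this assumption each trace lasts 10 sec, and the sniffing and normal-breathing periods map to frequencies of about 0.0025 and 0.001 cycles per observation, both well inside the interval bounded by 0.01.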

Table 3 Spectral properties of respiratory traces of a rat.

7 Summary and Discussion

We developed a novel Bayesian methodology for analyzing nonstationary time series that exhibit oscillatory behavior. Our approach is based on the assumption that, conditional on the position and number of change-points, the time series can be approximated by a piecewise changing sinusoidal regression model. The timing and number of changes are unknown, along with the number and values of relevant periodicities in each regime. Bayesian inference is performed via a reversible jump MCMC algorithm that can simultaneously estimate both the number and location of change-points, as well as the number, frequency, and magnitude of sinusoids within each segment. Our methodology can be seen as an extension of the work in Andrieu and Doucet (1999) to the nonstationary setting.

We illustrated the utility of our methodology in two case studies. First, we analyzed human skin temperature time series data obtained from a wearable device, which exhibited unknown periodicities that changed over time in an abrupt manner. Our proposed methodology identified interesting oscillations whose frequencies are consistent with ultradian oscillations between REM and non-REM sleep stages. Second, we characterized the occurrence of sleep apnea in large breathing datasets resulting from in vivo plethysmograph studies on rodents. Our spectral investigation was able to distinguish very sharp peaks, corresponding to different nearby frequencies, that are responsible for the different actions of the rodent.

Although we have not discussed this in detail here, several diagnostics for monitoring convergence were carried out in both simulation and case studies. In particular, we verified that the target posterior distribution reached a stable regime by analyzing the trace plot of the log-likelihood across MCMC iterations (Marin and Robert 2007). We are aware that assessing convergence based only on this simple tool may sometimes be misleading, since stable values of the log-likelihood could simply mean that the Markov chain is stuck in some local mode of the posterior distribution. Additionally, conditioned on the modal number of change-points and modal number of frequencies per regime, we have also monitored (within-model) convergence by analyzing the trace and running average plots for all parameters across MCMC iterations, with satisfactory results. Comparable results were also obtained when running several chains starting at over-dispersed initial values. We note that the diagnostic tool used by Bruce et al. (2018) and Li and Krafty (2019) to assess convergence for reversible jump MCMC samplers appears relevant. In the context of adaptive spectral analysis of nonstationary time series, they point out that although the number of partitions changes across models, a power spectrum is defined at each time point. The power spectrum is modeled with a fixed number of splines, yielding a vector of summary measures of parameters that maintain the same interpretation across models in their samplers. However, our proposed sampler has a further layer of variable dimensionality: not only may the number and locations of the change-points change from one iteration to the next, but the number of frequencies in each segment is also not fixed throughout the simulation.

We conclude this article by noting that, although a Gaussian distribution is assumed, it is conceivable that our model can be extended to allow for other error distributions in Equation (1). For example, a generalized linear model (McCullagh and Nelder 1989) may be used to model periodic count data by assuming that the observed data follow a Poisson distribution, that is, $y_{t} \sim Poisson (μ_{t})$. Under the logarithmic link function, the expected value $μ_{t}$ of the response variable $y_{t}$ may be expressed as $\log (μ_{t}) = \sum_{j = 1}^{k + 1} f (t, β_{j}, ω_{j}) 1_{[t \in I_{j}]}$, where the definitions of the variables are the same as for Equation (1). Bayesian inference can, in principle, be achieved in a similar way as described in the article, namely by iterating between segment and change-point model moves, where the formulation of the acceptance probabilities and some proposal distributions need to be modified accordingly. We believe that such an extension would be useful in a range of applications, for example, in studying population cycles in ecology and epidemiology, where the abundance of species is measured as a count variable (White and Bennetts 1996; Bhaskaran et al. 2013; Bramness et al. 2015).
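As a rough illustration of this possible extension, the sketch below evaluates a Poisson log-likelihood under a piecewise sinusoidal log-mean. It is a sketch under assumptions rather than a worked-out method: the function names are hypothetical, the segment $I_{j}$ is taken to cover `boundaries[j] <= t < boundaries[j + 1]`, and $f$ is assumed to be a plain sum of cosine and sine terms with coefficient pairs $(β^{(1)}, β^{(2)})$, without any intercept or trend terms.

```python
import math

import numpy as np


def log_mean(t, boundaries, betas, omegas):
    """log(mu_t) for a piecewise sinusoidal predictor (assumed form of f)."""
    t = np.asarray(t, dtype=float)
    eta = np.zeros(len(t))
    for j in range(len(boundaries) - 1):
        inside = (t >= boundaries[j]) & (t < boundaries[j + 1])
        # Each frequency in segment j has a (cosine, sine) coefficient pair.
        for (b_cos, b_sin), w in zip(betas[j], omegas[j]):
            eta[inside] += (b_cos * np.cos(2.0 * np.pi * w * t[inside])
                            + b_sin * np.sin(2.0 * np.pi * w * t[inside]))
    return eta


def poisson_loglik(y, eta):
    """Poisson log-likelihood with log link: sum of y*eta - exp(eta) - log(y!)."""
    y = np.asarray(y, dtype=float)
    log_factorials = np.array([math.lgamma(v + 1.0) for v in y])
    return float(np.sum(y * eta - np.exp(eta) - log_factorials))
```

A function of this kind would replace the Gaussian likelihood inside the acceptance ratios; the proposals themselves would still need to be adapted, as noted above.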

Supplementary Materials

Supplementary materials are available and include further details about simulation studies, acceptance rates, phase investigation, and performances of existing methods in the case studies. Code that implements the methodology and the data used in the case studies are available as online supplemental material and can be also found at https://github.com/Beniamino92/AutoNOM.

Acknowledgments

We wish to thank the referees, the associate editor, Dr. Paul Jenkins, Zeda Li, and Jack Jewson for their insightful and valuable comments. The work presented in this article was developed as part of the first author’s PhD thesis at the University of Warwick.

Additional information

Funding

B. Hadj-Amar was supported by the Oxford-Warwick Statistics Programme (OxWaSP) and the Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/L016710/1. B. Finkenstädt and F. Lévi were supported by the Medical Research Council (MRC), grant reference: MR/M013170/1. F. Lévi was partly supported by the Conseil Régional d’Île de France, the Conseil Régional de Champagne-Ardenne, Mairie de Paris and the Banque Publique d’Investissement (BPI France) through the Fonds Unique Interministériel 12 (PiCADo, contract 11017951), and the Institut de Recherches en Santé Publique from France (CLOCK-DOM1, grant 2014-BDCR-EC). R. Huckstepp was supported by the MRC, grant reference: MC/PC/15070.

References

Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transactions on Automatic Control, 19, 716–723. DOI: 10.1109/TAC.1974.1100705.
Altevogt, B. M., and Colten, H. R. (2006), Sleep Disorders and Sleep Deprivation: An Unmet Public Health Problem, Washington, DC: National Academies Press.
Andrieu, C., and Doucet, A. (1999), “Joint Bayesian Model Selection and Estimation of Noisy Sinusoids via Reversible Jump MCMC,” IEEE Transactions on Signal Processing, 47, 2667–2676. DOI: 10.1109/78.790649.
Berlad, I., Shlitner, A., Ben-Haim, S., and Lavie, P. (1993), “Power Spectrum Analysis and Heart Rate Variability in Stage 4 and REM Sleep: Evidence for State-Specific Changes in Autonomic Dominance,” Journal of Sleep Research, 2, 88–90. DOI: 10.1111/j.1365-2869.1993.tb00067.x.
Bhaskaran, K., Gasparrini, A., Hajat, S., Smeeth, L., and Armstrong, B. (2013), “Time Series Regression Studies in Environmental Epidemiology,” International Journal of Epidemiology, 42, 1187–1195. DOI: 10.1093/ije/dyt092.
Bishop, C. M. (2006), Pattern Recognition and Machine Learning, Information Science and Statistics, New York: Springer-Verlag.
Bramness, J. G., Walby, F. A., Morken, G., and Røislien, J. (2015), “Analyzing Seasonal Variations in Suicide With Fourier Poisson Time-Series Regression: A Registry-Based Study From Norway, 1969–2007,” American Journal of Epidemiology, 182, 244–254. DOI: 10.1093/aje/kwv064.
Bretthorst, G. L. (1988), Bayesian Spectrum Analysis and Parameter Estimation (Vol. 48), New York: Springer-Verlag.
Bretthorst, G. L. (1990), “Bayesian Analysis. I. Parameter Estimation Using Quadrature NMR Models,” Journal of Magnetic Resonance, 88, 533–551. DOI: 10.1016/0022-2364(90)90287-J.
Bruce, S. A., Hall, M. H., Buysse, D. J., and Krafty, R. T. (2018), “Conditional Adaptive Bayesian Spectral Analysis of Nonstationary Biomedical Time Series,” Biometrics, 74, 260–269. DOI: 10.1111/biom.12719.
Carskadon, M. A., and Dement, W. C. (2005), “Normal Human Sleep: An Overview,” Principles and Practice of Sleep Medicine, 4, 13–23.
Dahlhaus, R. (1997), “Fitting Time Series Models to Nonstationary Processes,” The Annals of Statistics, 25, 1–37. DOI: 10.1214/aos/1034276620.
Davis, E. M., and O’Donnell, C. P. (2013), “Rodent Models of Sleep Apnea,” Respiratory Physiology & Neurobiology, 188, 355–361. DOI: 10.1016/j.resp.2013.05.022.
Davis, R. A., Lee, T. C. M., and Rodriguez-Yam, G. A. (2006), “Structural Break Estimation for Nonstationary Time Series Models,” Journal of the American Statistical Association, 101, 223–239. DOI: 10.1198/016214505000000745.
Djuric, P. M. (1996), “A Model Selection Rule for Sinusoids in White Gaussian Noise,” IEEE Transactions on Signal Processing, 44, 1744–1751. DOI: 10.1109/78.510621.
Dou, L., and Hodgson, R. (1995), “Bayesian Inference and Gibbs Sampling in Spectral Analysis and Parameter Estimation. I,” Inverse Problems, 11, 1069. DOI: 10.1088/0266-5611/11/5/011.
Dou, L., and Hodgson, R. (1996), “Bayesian Inference and Gibbs Sampling in Spectral Analysis and Parameter Estimation: II,” Inverse Problems, 12, 121. DOI: 10.1088/0266-5611/12/2/002.
Gilks, W. R., Richardson, S., and Spiegelhalter, D. (1995), Markov Chain Monte Carlo in Practice, Boca Raton, FL: CRC Press.
Green, P. J. (1995), “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination,” Biometrika, 82, 711–732. DOI: 10.1093/biomet/82.4.711.
Han, F., Subramanian, S., Price, E. R., Nadeau, J., and Strohl, K. P. (2002), “Periodic Breathing in the Mouse,” Journal of Applied Physiology, 92, 1133–1140. DOI: 10.1152/japplphysiol.00785.2001.
Harsch, I. A., Schahin, S. P., Brückner, K., Radespiel-Tröger, M., Fuchs, F. S., Hahn, E. G., Konturek, P. C., Lohmann, T., and Ficker, J. H. (2004), “The Effect of Continuous Positive Airway Pressure Treatment on Insulin Sensitivity in Patients With Obstructive Sleep Apnoea Syndrome and Type 2 Diabetes,” Respiration, 71, 252–259. DOI: 10.1159/000077423.
Heinzer, R., Vat, S., Marques-Vidal, P., Marti-Soler, H., Andries, D., Tobback, N., Mooser, V., Preisig, M., Malhotra, A., Waeber, G., and Vollenweider, P. (2015), “Prevalence of Sleep-Disordered Breathing in the General Population: The Hypnolaus Study,” The Lancet Respiratory Medicine, 3, 310–318. DOI: 10.1016/S2213-2600(15)00043-0.
Huang, Q., Cohen, D., Komarzynski, S., Li, X.-M., Innominato, P., Lévi, F., and Finkenstädt, B. (2018), “Hidden Markov Models for Monitoring Circadian Rhythmicity in Telemetric Activity Data,” Journal of the Royal Society Interface, 15, 20170885. DOI: 10.1098/rsif.2017.0885.
Komarzynski, S., Huang, Q., Innominato, P. F., Maurice, M., Arbaud, A., Beau, J., Bouchahda, M., Ulusakarya, A., Beaumatin, N., Breda, G., and Finkenstädt, B. (2018), “Relevance of a Mobile Internet Platform for Capturing Inter- and Intrasubject Variabilities in Circadian Coordination During Daily Routine: Pilot Study,” Journal of Medical Internet Research, 20, e204. DOI: 10.2196/jmir.9779.
Kräuchi, K. (2002), “How Is the Circadian Rhythm of Core Body Temperature Regulated?,” Clinical Autonomic Research, 12, 147–149.
Krauchi, K., and Wirz-Justice, A. (1994), “Circadian Rhythm of Heat Production, Heart Rate, and Skin and Core Temperature Under Unmasking Conditions in Men,” American Journal of Physiology—Regulatory, Integrative and Comparative Physiology, 267, R819–R829. DOI: 10.1152/ajpregu.1994.267.3.R819.
Lal, C., Strange, C., and Bachman, D. (2012), “Neurocognitive Impairment in Obstructive Sleep Apnea,” Chest, 141, 1601–1610. DOI: 10.1378/chest.11-2214.
Lanfranchi, P. A., Braghiroli, A., Bosimini, E., Mazzuero, G., Colombo, R., Donner, C. F., and Giannuzzi, P. (1999), “Prognostic Value of Nocturnal Cheyne-Stokes Respiration in Chronic Heart Failure,” Circulation, 99, 1435–1440. DOI: 10.1161/01.CIR.99.11.1435.
Levi, F., and Schibler, U. (2007), “Circadian Rhythms: Mechanisms and Therapeutic Implications,” Annual Review of Pharmacology and Toxicology, 47, 593–628. DOI: 10.1146/annurev.pharmtox.47.120505.105208.
Li, Z., and Krafty, R. T. (2019), “Adaptive Bayesian Time–Frequency Analysis of Multivariate Time Series,” Journal of the American Statistical Association, 114, 453–465. DOI: 10.1080/01621459.2017.1415908.
Marin, J.-M., and Robert, C. (2007), Bayesian Core: A Practical Approach to Computational Bayesian Statistics, Berlin: Springer Science & Business Media.
McCullagh, P., and Nelder, J. (1989), Generalized Linear Models, Monographs on Statistics and Applied Probability Series (2nd ed.), London: Chapman & Hall.
Nakamura, A., Fukuda, Y., and Kuwaki, T. (2003), “Sleep Apnea and Effect of Chemostimulation on Breathing Instability in Mice,” Journal of Applied Physiology, 94, 525–532. DOI: 10.1152/japplphysiol.00226.2002.
Nieto, F. J., Peppard, P. E., Young, T., Finn, L., Hla, K. M., and Farré, R. (2012), “Sleep-Disordered Breathing and Cancer Mortality: Results From the Wisconsin Sleep Cohort Study,” American Journal of Respiratory and Critical Care Medicine, 186, 190–194. DOI: 10.1164/rccm.201201-0130OC.
Ombao, H. C., Raz, J. A., von Sachs, R., and Malow, B. A. (2001), “Automatic Statistical Analysis of Bivariate Nonstationary Time Series,” Journal of the American Statistical Association, 96, 543–560. DOI: 10.1198/016214501753168244.
Osorio, R. S., Ducca, E. L., Wohlleber, M. E., Tanzi, E. B., Gumb, T., Twumasi, A., Tweardy, S., Lewis, C., Fischer, E., Koushyk, V., and Cuartero-Toledo, M. (2016), “Orexin-A Is Associated With Increases in Cerebrospinal Fluid Phosphorylated-Tau in Cognitively Normal Elderly Subjects,” Sleep, 39, 1253–1260. DOI: 10.5665/sleep.5846.
Pace-Schott, E. F., and Hobson, J. A. (2002), “The Neurobiology of Sleep: Genetics, Cellular Physiology and Subcortical Networks,” Nature Reviews Neuroscience, 3, 591. DOI: 10.1038/nrn895.
Parish, J. M., Adam, T., and Facchiano, L. (2007), “Relationship of Metabolic Syndrome and Obstructive Sleep Apnea,” Journal of Clinical Sleep Medicine, 3, 467.
Priestley, M. B. (1981), Spectral Analysis and Time Series, London: Academic Press.
Quinn, B. G. (1989), “Estimating the Number of Terms in a Sinusoidal Regression,” Journal of Time Series Analysis, 10, 71–75. DOI: 10.1111/j.1467-9892.1989.tb00016.x.
Rife, D. C., and Boorstyn, R. R. (1976), “Multiple Tone Parameter Estimation From Discrete-Time Observations,” Bell System Technical Journal, 55, 1389–1410. DOI: 10.1002/j.1538-7305.1976.tb02941.x.
Rissanen, J. (1978), “Modeling by Shortest Data Description,” Automatica, 14, 465–471. DOI: 10.1016/0005-1098(78)90005-5.
Rosen, O., Stoffer, D. S., and Wood, S. (2009), “Local Spectral Analysis via a Bayesian Mixture of Smoothing Splines,” Journal of the American Statistical Association, 104, 249–262. DOI: 10.1198/jasa.2009.0118.
Rosen, O., Wood, S., and Stoffer, D. S. (2012), “AdaptSPEC: Adaptive Spectral Estimation for Nonstationary Time Series,” Journal of the American Statistical Association, 107, 1575–1589. DOI: 10.1080/01621459.2012.716340.
Shneerson, J. M. (2009), Sleep Medicine: A Guide to Sleep and Its Disorders, Hoboken, NJ: Wiley.
Shumway, R. H., and Stoffer, D. S. (2005), Time Series Analysis and Its Applications, Springer Texts in Statistics, New York: Springer-Verlag.
Stoica, P., Moses, R. L., Friedlander, B., and Soderstrom, T. (1989), “Maximum Likelihood Estimation of the Parameters of Multiple Sinusoids From Noisy Measurements,” IEEE Transactions on Acoustics, Speech, and Signal Processing, 37, 378–392. DOI: 10.1109/29.21705.
Sundaram, S. S., Halbower, A., Pan, Z., Robbins, K., Capocelli, K. E., Klawitter, J., Shearn, C. T., and Sokol, R. J. (2016), “Nocturnal Hypoxia-Induced Oxidative Stress Promotes Progression of Pediatric Non-alcoholic Fatty Liver Disease,” Journal of Hepatology, 65, 560–569. DOI: 10.1016/j.jhep.2016.04.010.
Tan, A., Cheung, Y. Y., Yin, J., Lim, W.-Y., Tan, L. W., and Lee, C.-H. (2016), “Prevalence of Sleep-Disordered Breathing in a Multiethnic Asian Population in Singapore: A Community-Based Study,” Respirology, 21, 943–950. DOI: 10.1111/resp.12747.
Van Someren, E. J. (2006), “Mechanisms and Functions of Coupling Between Sleep and Temperature Rhythms,” Progress in Brain Research, 153, 309–324.
White, G. C., and Bennetts, R. E. (1996), “Analysis of Frequency Count Data Using the Negative Binomial Distribution,” Ecology, 77, 2549–2557. DOI: 10.2307/2265753.
Whittle, P. (1957), “Curve and Periodogram Smoothing,” Journal of the Royal Statistical Society, Series B, 19, 38–63. DOI: 10.1111/j.2517-6161.1957.tb00242.x.
Yau, S.-F., and Bresler, Y. (1993), “Maximum Likelihood Parameter Estimation of Superimposed Signals by Dynamic Programming,” IEEE Transactions on Signal Processing, 41, 804–820. DOI: 10.1109/78.193219.
Zhang, Q., and Wong, K. M. (1993), “Information Theoretic Criteria for the Determination of the Number of Signals in Spatially Correlated Noise,” IEEE Transactions on Signal Processing, 41, 1652–1663. DOI: 10.1109/78.212737.

Appendix A: Details of the Sampling Scheme

A.1 Updating the Segment Model

A.1.1 Within-Model Move

Sampling $ω$: To obtain samples from the conditional posterior distribution $π (ω | β, σ^{2}, m, y)$ (see Equation (8)), we draw the frequencies one at a time using a mixture of M-H steps. To explore the parameter space efficiently, we design a mixture distribution $q (ω_{l}^{p} | ω_{l}^{c})$, so that $q (ω_{l}^{p} | ω_{l}^{c}) = δ_{ω} q_{1} (ω_{l}^{p} | ω_{l}^{c}) + (1 - δ_{ω}) q_{2} (ω_{l}^{p} | ω_{l}^{c}), \quad l = 1, \dots, m,$ (A.1) where $q_{1}$ is defined in Equation (A.2), $q_{2}$ is the density of a univariate normal $N (ω_{l}^{c}, σ_{ω}^{2})$, $δ_{ω}$ is a positive real number such that $0 \leq δ_{ω} \leq 1$, and the superscripts c and p refer to current and proposed values, respectively. According to Equation (A.1), with probability $δ_{ω}$ we carry out an M-H step with proposal distribution $q_{1} (ω_{l}^{p} | ω_{l}^{c}) \propto \sum_{h = 0}^{\tilde{n} - 1} I_{h} 1_{[h / n \leq ω_{l}^{p} < (h + 1) / n]},$ (A.2) where $\tilde{n} = ⌊ n / 2 ⌋$ and $I_{h}$ is the value of the squared modulus of the discrete Fourier transform of the observations y evaluated at frequency $h / n$, that is, $I_{h} = {| \sum_{j = a}^{b} y_{j} \exp (- i 2 π j \frac{h}{n}) |}^{2}$.

In this way, frequencies are proposed from regions of the parameter space with high posterior density, yielding a Markov chain that converges quickly to its invariant distribution. The proposal distribution $q_{1} (ω_{l}^{p} | ω_{l}^{c})$ is independent of the current state $ω_{l}^{c}$ . The acceptance probability for this move is $α = \min \left\{ 1, \frac{π (ω^{p} | β, σ^{2}, m, y)}{π (ω^{c} | β, σ^{2}, m, y)} \times \frac{q_{1} (ω_{l}^{c})}{q_{1} (ω_{l}^{p})} \right\},$ where $ω^{p} = (ω_{1}^{c}, \dots, ω_{l - 1}^{c}, ω_{l}^{p}, ω_{l + 1}^{c}, \dots, ω_{m}^{c})'$ . On the other hand, with probability $1 - δ_{ω}$ , we perform a random walk M-H step with proposal distribution $q_{2} (ω_{l}^{p} | ω_{l}^{c})$ , a univariate normal density with mean $ω_{l}^{c}$ and variance $σ_{ω}^{2}$ , that is, $ω_{l}^{p} | ω_{l}^{c} \sim N (ω_{l}^{c}, σ_{ω}^{2})$ . This perturbation around the current value $ω_{l}^{c}$ allows a local exploration of the conditional posterior distribution. The acceptance probability for this move is $α = \min \left\{ 1, \frac{π (ω^{p} | β, σ^{2}, m, y)}{π (ω^{c} | β, σ^{2}, m, y)} \right\} .$

Setting $δ_{ω}$ to a relatively low value combines a fairly high acceptance rate with quick exploration of the parameter space. For our experiments, we set $σ_{ω}^{2} = {(1 / (50 n))}^{2}$ and $δ_{ω} = 0.2$ .
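The mixture proposal of Equations (A.1) and (A.2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the periodogram is computed with `numpy.fft.fft` (a shift of the time index from $a, \dots, b$ to $0, \dots, n - 1$ changes only the phase, not the modulus), and the random walk standard deviation is read as $1 / (50 n)$. The function returns the log proposal ratio that enters the M-H acceptance probability; the posterior ratio of Equation (8) would have to be supplied separately.

```python
import numpy as np

def periodogram_proposal(y):
    """Bin probabilities of q1 in (A.2): proportional to I_h, h = 0, ..., floor(n/2) - 1."""
    n = len(y)
    n_tilde = n // 2
    I = np.abs(np.fft.fft(y)[:n_tilde]) ** 2   # I_h at the Fourier frequencies h/n
    return I / I.sum()

def propose_frequency(omega_c, probs, n, delta=0.2, rng=None):
    """Draw omega^p from the mixture (A.1).

    Returns (omega_p, log_ratio), where log_ratio = log q(omega^c) - log q(omega^p)
    is the proposal term of the M-H acceptance probability.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.uniform() < delta:
        # q1: pick a periodogram bin, then a uniform point inside that bin.
        h = rng.choice(len(probs), p=probs)
        omega_p = (h + rng.uniform()) / n
        h_c = min(int(omega_c * n), len(probs) - 1)   # bin containing the current value
        log_ratio = np.log(probs[h_c]) - np.log(probs[h])
    else:
        # q2: symmetric random walk, so the proposal ratio is one.
        # Assumption: candidates outside the admissible range are rejected by the caller.
        omega_p = rng.normal(omega_c, 1.0 / (50 * n))
        log_ratio = 0.0
    return omega_p, log_ratio
```

Because $q_{1}$ is piecewise constant on bins of equal width $1 / n$, the ratio of densities reduces to the ratio of bin probabilities, which is why only `probs` is needed.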

Sampling $β$ : Given values of $ω$ and $σ^{2}$ , the vector of linear parameters $β$ can be sampled via an M-H step from the target posterior conditional distribution $π (β | ω, σ^{2}, m, y)$ (see Equation (9)). The proposed vector of coefficients $β^{p}$ is drawn from a normal approximation to this posterior conditional distribution (e.g., Gilks, Richardson, and Spiegelhalter 1995; Rosen, Wood, and Stoffer 2012), $β^{p} \sim N_{2 m + 2} ({\hat{β}}^{p}, {\hat{Σ}}^{p}),$ (A.3) where ${\hat{β}}^{p} = \arg\max_{β^{p}} π (β^{p} | ω, σ^{2}, m, y)$ and ${\hat{Σ}}^{p} = \left\{ - \left. \frac{\partial^{2} \log π (β^{p} | ω, σ^{2}, m, y)}{\partial β^{p} \partial β^{p'}} \right|_{β^{p} = {\hat{β}}^{p}} \right\}^{- 1} .$

The proposal for $β^{p}$ is independent of the current values $β^{c}$ , and the acceptance probability for this move is $α = \min \left\{ 1, \frac{π (β^{p} | ω, σ^{2}, m, y)}{π (β^{c} | ω, σ^{2}, m, y)} \times \frac{q (β^{c})}{q (β^{p})} \right\},$ where $q (β^{c})$ and $q (β^{p})$ denote the normal proposal densities $N_{2 m + 2} ({\hat{β}}^{c}, {\hat{Σ}}^{c})$ and $N_{2 m + 2} ({\hat{β}}^{p}, {\hat{Σ}}^{p})$ , respectively.
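Since the log of the conditional posterior in Equation (9) is, up to a constant, $- \frac{1}{2 σ^{2}} \sum_{t} \{ y_{t} - x_{t} (ω)' β \}^{2} - \frac{1}{2 σ_{β}^{2}} β' β$ , it is quadratic in $β$ , so the mode and the inverse negative Hessian in Equation (A.3) are available in closed form. The sketch below computes them. The function names are hypothetical, and the layout of the basis $x_{t} (ω)$ (intercept, linear trend, then one cosine and sine pair per frequency, giving $2 m + 2$ columns) is an assumption made for illustration.

```python
import numpy as np

def design_matrix(t, omega):
    """Assumed basis x_t(omega): intercept, linear trend, cos/sin pair per frequency."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t]
    for w in omega:
        cols.append(np.cos(2 * np.pi * w * t))
        cols.append(np.sin(2 * np.pi * w * t))
    return np.column_stack(cols)           # shape (n, 2m + 2)

def beta_proposal(y, X, sigma2, sigma_beta2):
    """Mode and inverse negative Hessian of log pi(beta | omega, sigma^2, m, y)."""
    # Negative Hessian of the quadratic log-density: X'X / sigma^2 + I / sigma_beta^2.
    precision = X.T @ X / sigma2 + np.eye(X.shape[1]) / sigma_beta2
    beta_hat = np.linalg.solve(precision, X.T @ y / sigma2)   # zero-gradient point
    Sigma_hat = np.linalg.inv(precision)
    return beta_hat, Sigma_hat
```

A proposal would then be drawn as `rng.multivariate_normal(beta_hat, Sigma_hat)` for a NumPy `Generator` named `rng`.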

A.1.2 Between-Model Moves

The number of frequencies on a segment is proposed to either increase or decrease by one. Let $θ^{c} = (β^{c'}, ω^{c'}, σ^{2 c})'$ and assume the Markov chain is currently at $(m^{c}, θ^{c})$ . We propose a move to $(m^{p}, θ^{p})$ by drawing from a proposal density $q (m^{p}, θ^{p} | m^{c}, θ^{c})$ and accepting this update with probability $α = \min \left\{ 1, \frac{L (m^{p}, θ^{p} | y)}{L (m^{c}, θ^{c} | y)} \times \frac{π (m^{p}) π (θ^{p} | m^{p})}{π (m^{c}) π (θ^{c} | m^{c})} \times \frac{q (m^{c}, θ^{c} | m^{p}, θ^{p})}{q (m^{p}, θ^{p} | m^{c}, θ^{c})} \right\} .$ (A.4)

The proposed state $(m^{p}, θ^{p})$ is drawn by first drawing $m^{p}$ , followed by $ω^{p}$ , $β^{p}$ and $σ^{2 p}$ . In fact, we can rewrite the proposal density as $\begin{matrix} q (m^{p}, θ^{p} | m^{c}, θ^{c}) = q (m^{p} | m^{c}) \times q (θ^{p} | m^{p}, m^{c}, θ^{c}) \\ = q (m^{p} | m^{c}) \times q (ω^{p} | m^{p}, m^{c}, θ^{c}) \\ \times q (β^{p} | ω^{p}, m^{p}, m^{c}, θ^{c}) \\ \times q (σ^{2 p} | β^{p}, ω^{p}, m^{p}, m^{c}, θ^{c}) . \end{matrix}$

Birth move: If a birth move is proposed, we have $m^{p} = m^{c} + 1$ . The proposed frequency vector $ω^{p}$ is constructed as $ω^{p} = (ω_{1}^{c}, \dots, ω_{m^{c}}^{c}, ω_{m^{p}}^{p})',$ namely by keeping the current vector of frequencies and proposing an additional frequency $ω_{m^{p}}^{p}$ . In contrast to Andrieu and Doucet (1999), we choose to sample $ω_{m^{p}}^{p}$ uniformly on the interval $(0, ϕ_{ω})$ , where $0 < ϕ_{ω} < 0.5$ . The value of $ϕ_{ω}$ can be chosen to reflect prior information about the significant frequencies that drive the variation in the data, for example, by choosing $ϕ_{ω}$ in the low-frequency range ( $0 < ϕ_{ω} < 0.1$ ). Additionally, for computational and modelling reasons, we avoid sampling frequencies that are too close to each other. Hence, we draw the candidate value $ω_{m^{p}}^{p}$ uniformly from the union of intervals of the form $[ω_{l}^{c} + ψ_{ω}, ω_{l + 1}^{c} - ψ_{ω}]$ , for $l = 0, \dots, m^{c}$ , with $ω_{0}^{c} = 0$ and $ω_{m^{c} + 1}^{c} = ϕ_{ω}$ . Here, $ψ_{ω}$ is a fixed minimum distance between frequencies, chosen larger than $\frac{1}{n}$ : when the separation of two frequencies is less than the Nyquist step (Priestley 1981), that is, $| ω_{l} - ω_{l + 1} | < \frac{1}{n}$ , the two frequencies are indistinguishable (Dou and Hodgson 1995). Moreover, we sort the proposed vector of frequencies $ω^{p}$ to ensure identifiability and enable practical estimation, as suggested in Andrieu and Doucet (1999).

Given the proposed $ω^{p}$ and the current $σ^{2 c}$ , the proposed vector of linear coefficients $β^{p}$ is sampled following the procedure of Section A.1.1, namely we draw $β^{p}$ from a normal approximation to its posterior conditional distribution $π (β^{p} | ω^{p}, σ^{2 c}, m^{p}, y)$ . Finally, the residual variance $σ^{2 p}$ is sampled directly from its posterior conditional distribution $π (σ^{2 p} | ω^{p}, β^{p}, m^{p}, y)$ (see Equation (10)). The proposed state $(m^{p}, θ^{p})$ is accepted or rejected in an M-H step with probability $α = \min \left\{ 1, \frac{L (θ^{p}, m^{p} | y)}{L (θ^{c}, m^{c} | y)} \times \frac{π (m^{p}) π (θ^{p} | m^{p})}{π (m^{c}) π (θ^{c} | m^{c})} \times \frac{d_{m^{p}} \cdot \frac{1}{m^{p}} \cdot q (β^{c}) \cdot q (σ^{2 c})}{b_{m^{c}} \cdot q (ω_{m^{p}}^{p}) \cdot q (β^{p}) \cdot q (σ^{2 p})} \right\},$ where the likelihood function is given in Equation (5), $π (m)$ is the density of the Poisson distribution truncated at $m_{max}$ , $b_{m^{c}}$ and $d_{m^{p}}$ are defined in Equation (7), $q (ω_{m^{p}}^{p})$ is the density of the uniform proposal for sampling the additional frequency, $q (β^{c})$ and $q (β^{p})$ are the multivariate normal proposal densities $N_{2 m^{c} + 2} ({\hat{β}}^{c}, {\hat{Σ}}^{c})$ and $N_{2 m^{p} + 2} ({\hat{β}}^{p}, {\hat{Σ}}^{p})$ , respectively, and $q (σ^{2 p})$ and $q (σ^{2 c})$ are the Inverse-Gamma proposal densities defined in Equation (10).

Death move: If a death move is proposed, then $m^{p} = m^{c} - 1$ . A vector of frequencies $ω^{p}$ is constructed by randomly selecting with probability $\frac{1}{m^{c}}$ one of the current frequencies as the candidate frequency for removal. Given $ω^{p}$ and $σ^{2 c}$ , a vector of linear coefficients $β^{p}$ is sampled from a normal approximation to its posterior conditional distribution. Finally, conditioned on $ω^{p}$ and $β^{p}$ , the residual variance $σ^{2 p}$ is drawn from its posterior Inverse-Gamma distribution. It is straightforward to see that the acceptance probability for the death step has the same form as the birth step, with the proper change of labelling of the variables, and the ratio terms inverted.
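The uniform draw of a new frequency from the union of admissible intervals, as used in the birth move, can be sketched as follows. This is only an illustration with hypothetical function names; gaps between neighbouring frequencies that are narrower than $2 ψ_{ω}$ are simply dropped, which is an assumption about how such cases are handled.

```python
import numpy as np

def admissible_intervals(omega_c, phi, psi):
    """Intervals [w_l + psi, w_{l+1} - psi], l = 0..m, with w_0 = 0 and w_{m+1} = phi."""
    knots = np.concatenate(([0.0], np.sort(omega_c), [phi]))
    lo, hi = knots[:-1] + psi, knots[1:] - psi
    keep = hi > lo                        # assumption: skip gaps too narrow for a new frequency
    return lo[keep], hi[keep]

def propose_birth_frequency(omega_c, phi, psi, rng):
    """Uniform draw on the union of admissible intervals.

    Returns (omega_new, density), where density = 1 / (total admissible length)
    is the value of q(omega_new) in the birth acceptance probability.
    """
    lo, hi = admissible_intervals(omega_c, phi, psi)
    lengths = hi - lo
    total = lengths.sum()
    i = rng.choice(len(lengths), p=lengths / total)   # interval chosen proportionally to length
    return rng.uniform(lo[i], hi[i]), 1.0 / total
```

Choosing an interval with probability proportional to its length and then sampling uniformly inside it is equivalent to a single uniform draw over the whole union. In the reverse (death) move, the corresponding term is the probability $\frac{1}{m^{c}}$ of selecting the frequency to remove.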

A.2 Updating the Change-Point Model

A.2.1 Within-Model Move

Let $s_{(k)}^{c} = (s_{1}^{c}, \dots, s_{k}^{c})'$ be the current vector of change-point locations, $m_{(k)}^{c} = (m_{1}^{c}, \dots, m_{k + 1}^{c})'$ be the current vector of the numbers of frequencies, and $ω_{(k)}^{c} = (ω_{1}^{c'}, \dots, ω_{k + 1}^{c'})'$ be the current vector of frequencies. Let $β_{(k)}^{c} = (β_{1}^{c'}, \dots, β_{k + 1}^{c'})'$ and $σ_{(k)}^{2 c} = (σ_{1}^{2 c}, \dots, σ_{k + 1}^{2 c})'$ be the current vectors of linear coefficients and residual variances, respectively.

Let us also define $θ_{(k)}^{c} = (β_{(k)}^{c'}, ω_{(k)}^{c'}, σ_{(k)}^{2 c'})' .$ Following Green (1995), a change-point, $s_{j}^{c}$ say, is randomly selected with probability $\frac{1}{k}$ from the existing set of change-points. To explore the parameter space efficiently, and as for the frequencies in Section A.1.1, we construct a mixture proposal distribution $q (s_{j}^{p} | s_{j}^{c})$ as $q (s_{j}^{p} | s_{j}^{c}) = δ_{s} q_{1} (s_{j}^{p} | s_{j}^{c}) + (1 - δ_{s}) q_{2} (s_{j}^{p} | s_{j}^{c}),$ (A.5) where $q_{1}$ is the density of a Uniform $[s_{j - 1}^{c} + ψ, s_{j + 1}^{c} - ψ]$ , $q_{2}$ is the density of a univariate normal $N (s_{j}^{c}, σ_{s}^{2})$ , and $δ_{s}$ is a real number such that $0 \leq δ_{s} \leq 1$ . With probability $δ_{s}$ we propose a candidate value $s_{j}^{p}$ from the above uniform distribution, where $ψ$ is a fixed minimum time between change-points that prevents change-points from being too close to each other. On the other hand, with probability $1 - δ_{s}$ , $s_{j}^{p}$ arises as a normal random walk proposal centered at the current change-point $s_{j}^{c}$ . The proposed vector of change-point locations is denoted by $s_{(k)}^{p} = (s_{1}^{c}, \dots, s_{j - 1}^{c}, s_{j}^{p}, s_{j + 1}^{c}, \dots, s_{k}^{c})',$ and hence the proposed value $s_{j}^{p}$ induces a new proposed data partition on $[s_{j - 1}^{c}, s_{j + 1}^{c}]$ corresponding to $[s_{j - 1}^{c}, s_{j}^{p})$ and $[s_{j}^{p}, s_{j + 1}^{c})$ . We denote the vectors of observations belonging to these two proposed segments as $y_{j}^{p}$ and $y_{j + 1}^{p}$ , which include $n_{j}^{p}$ and $n_{j + 1}^{p}$ observations, respectively.

Given $s_{(k)}^{p}$ , the proposed numbers of frequencies $m_{j}^{p}, m_{j + 1}^{p}$ are set equal to the current ones $m_{j}^{c}, m_{j + 1}^{c}$ , so that $m_{(k)}^{p} = m_{(k)}^{c}$ . Similarly, the proposed pair of frequency vectors $ω_{j}^{p}, ω_{j + 1}^{p}$ is chosen equal to the current pair $ω_{j}^{c}, ω_{j + 1}^{c}$ in the corresponding segments, that is, $ω_{(k)}^{p} = ω_{(k)}^{c}$ . The proposed vectors $β_{j}^{p}, β_{j + 1}^{p}$ are sampled from normal approximations to their posterior conditional distributions $π (β_{j}^{p} | ω_{j}^{p}, σ_{j}^{2 c}, m_{j}^{p}, y_{j}^{p})$ and $π (β_{j + 1}^{p} | ω_{j + 1}^{p}, σ_{j + 1}^{2 c}, m_{j + 1}^{p}, y_{j + 1}^{p})$ and are accepted in an M-H step with probability $α = \min \left\{ 1, \frac{L (k, m_{(k)}^{p}, s_{(k)}^{p}, θ_{(k)}^{p} | y)}{L (k, m_{(k)}^{c}, s_{(k)}^{c}, θ_{(k)}^{c} | y)} \times \frac{π (s_{(k)}^{p} | k) π (θ_{(k)}^{p} | m_{(k)}^{p}, k)}{π (s_{(k)}^{c} | k) π (θ_{(k)}^{c} | m_{(k)}^{c}, k)} \times \frac{\prod_{h = j}^{j + 1} q (β_{h}^{c})}{\prod_{h = j}^{j + 1} q (β_{h}^{p})} \right\},$ where the likelihood is specified in Equation (4) and $q (β_{h}^{c})$ and $q (β_{h}^{p})$ are multivariate Gaussian as in Equation (A.3). Note that only the two segments affected by the relocated change-point contribute to the likelihood and prior ratios; the factors for all other segments cancel. Finally, the residual variances $σ_{j}^{2 p}$ , $σ_{j + 1}^{2 p}$ are drawn from their posterior conditional distributions $π (σ_{j}^{2 p} | ω_{j}^{p}, β_{j}^{p}, m_{j}^{p}, y_{j}^{p})$ and $π (σ_{j + 1}^{2 p} | ω_{j + 1}^{p}, β_{j + 1}^{p}, m_{j + 1}^{p}, y_{j + 1}^{p})$ in a Gibbs step.
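A sketch of the relocation proposal in Equation (A.5) is given below. Several details are assumptions made purely for illustration and are not stated in the text: change-points are treated as integer time indices, the boundary conventions $s_{0} = 1$ and $s_{k + 1} = n$ are used, the normal random walk is rounded to the nearest integer, and the function name and zero-based index `j` are hypothetical. Both mixture components are symmetric in $(s_{j}^{c}, s_{j}^{p})$ (the uniform density is constant on its support and the normal is centered at the current value), which is consistent with no proposal ratio for $s_{j}$ appearing in the acceptance probability.

```python
import numpy as np

def propose_changepoint(s, j, n, psi, delta_s, sigma_s, rng):
    """Propose a new location for s[j] (zero-based j) from the mixture (A.5).

    Returns (candidate, valid); valid is False when the candidate violates the
    minimum spacing psi, in which case the move would be rejected.
    """
    bounds = np.concatenate(([1], np.asarray(s), [n]))   # assumed s_0 = 1, s_{k+1} = n
    lo = int(bounds[j]) + psi          # left neighbour plus minimum spacing
    hi = int(bounds[j + 2]) - psi      # right neighbour minus minimum spacing
    if rng.uniform() < delta_s:
        cand = int(rng.integers(lo, hi + 1))               # q1: uniform on [lo, hi]
    else:
        cand = int(round(rng.normal(s[j], sigma_s)))       # q2: rounded normal random walk
    return cand, lo <= cand <= hi
```

The uniform component allows large jumps between the neighbouring change-points, whereas the random walk refines the location locally.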

A.2.2 Between-Model Moves

Let $ξ_{(k^{c})}^{c} = {(s_{(k^{c})}^{c'}, m_{(k^{c})}^{c'}, θ_{(k^{c})}^{c'})}^{'}$ and assume the Markov chain is at $(k^{c}, ξ_{(k^{c})}^{c})$ . We propose a move to $(k^{p}, ξ_{(k^{p})}^{p})$ by first drawing $k^{p}$ , followed by sampling the change-point locations $s_{(k^{p})}^{p}$ . The latter involves either merging two segments (death) or splitting a segment (birth). The number of frequencies and their values in the proposed segments are selected from the current state. We draw $β_{(k^{p})}^{p}$ and jointly update the entire state $(k^{p}, ξ_{(k^{p})}^{p})$ . Hence, we propose a move to $(k^{p}, ξ_{(k^{p})}^{p})$ by drawing from a proposal density of the form $\begin{matrix} q (k^{p}, ξ_{(k^{p})}^{p} | k^{c}, ξ_{(k^{c})}^{c}) \\ = q (k^{p} | k^{c}) \times q (ξ_{(k^{p})}^{p} | k^{p}, k^{c}, ξ_{(k^{c})}^{c}) \\ = q (k^{p} | k^{c}) \times q (s_{(k^{p})}^{p} | k^{p}, k^{c}, ξ_{(k^{c})}^{c}) \\ \times q (m_{(k^{p})}^{p}, ω_{(k^{p})}^{p} | s_{(k^{p})}^{p}, k^{p}, k^{c}, ξ_{(k^{c})}^{c}) \\ \times q (σ_{(k^{p})}^{2 p} | m_{(k^{p})}^{p}, ω_{(k^{p})}^{p}, s_{(k^{p})}^{p}, k^{p}, k^{c}, ξ_{(k^{c})}^{c}) \\ \times q (β_{(k^{p})}^{p} | σ_{(k^{p})}^{2 p}, m_{(k^{p})}^{p}, ω_{(k^{p})}^{p}, s_{(k^{p})}^{p}, k^{p}, k^{c}, ξ_{(k^{c})}^{c}) . \end{matrix}$

Birth move: If a birth move is proposed, we have $k^{p} = k^{c} + 1$ . We draw a new change-point uniformly on $f (s_{(k^{c})}^{c}, ψ_{s})$ , the support of the new change-point given the current change-points $s_{(k^{c})}^{c}$ and the constraints imposed by $ψ_{s}$ , that is, $f (s_{(k^{c})}^{c}, ψ_{s}) = [1 + ψ_{s}, s_{1}^{c} - ψ_{s}] \cup [s_{1}^{c} + ψ_{s}, s_{2}^{c} - ψ_{s}] \cup \dots \cup [s_{k^{c}}^{c} + ψ_{s}, n - ψ_{s}] .$ Hence, the new proposed location ${\tilde{s}}_{j}$ is sampled from a $Uniform \{ f (s_{(k^{c})}^{c}, ψ_{s}) \}$ , where the proposal density is given by $q (s_{(k^{p})}^{p} | k^{p}, k^{c}, ξ_{(k^{c})}^{c}) = \frac{1}{n - 2 ψ_{s} (k^{c} + 1) - 1} .$ (A.6)
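The density in Equation (A.6) is the reciprocal of the total length of the support: the $k^{c} + 1$ intervals span $n - 1$ in total and each loses $2 ψ_{s}$ . The sketch below draws a new change-point and returns that density. The function names are hypothetical, the location is treated as continuous, and the current change-points are assumed to be more than $2 ψ_{s}$ apart from each other and from the boundaries, so that every interval has positive length.

```python
import numpy as np

def birth_support(s_c, n, psi):
    """Intervals [1 + psi, s_1 - psi], [s_1 + psi, s_2 - psi], ..., [s_k + psi, n - psi]."""
    knots = np.concatenate(([1.0], np.asarray(s_c, dtype=float), [float(n)]))
    return knots[:-1] + psi, knots[1:] - psi

def propose_birth_changepoint(s_c, n, psi, rng):
    """Uniform draw on the support; returns (s_new, segment_index, density)."""
    lo, hi = birth_support(s_c, n, psi)
    lengths = hi - lo
    total = lengths.sum()                 # equals n - 2 * psi * (k + 1) - 1
    i = rng.choice(len(lengths), p=lengths / total)
    return rng.uniform(lo[i], hi[i]), i, 1.0 / total
```

The returned index identifies the existing segment that is split by the new change-point.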

As the proposed location ${\tilde{s}}_{j}$ will lie within an existing interval $(s_{j}^{c}, s_{j + 1}^{c})$ with probability one, we can define the proposed vector of change-point locations as $s_{(k^{p})}^{p} = (s_{1}^{c}, \dots, s_{j}^{c}, {\tilde{s}}_{j}, s_{j + 1}^{c}, \dots, s_{k^{c}}^{c})' .$

The numbers of frequencies $m_{j}^{p}, m_{j + 1}^{p}$ corresponding to the two newly proposed segments $[s_{j}^{c}, {\tilde{s}}_{j})$ and $[{\tilde{s}}_{j}, s_{j + 1}^{c})$ are both set equal to the current number of frequencies on the whole segment $(s_{j}^{c}, s_{j + 1}^{c})$ , and the corresponding frequencies are copied in the same way. Therefore, we can construct the proposed vector of the numbers of frequencies $m_{(k^{p})}^{p}$ and the proposed vector of frequencies $ω_{(k^{p})}^{p}$ as $\begin{matrix} m_{(k^{p})}^{p} = (m_{1}^{c}, \dots, m_{j - 1}^{c}, m_{j}^{c}, m_{j}^{c}, m_{j + 1}^{c}, \dots, m_{k^{c} + 1}^{c})', \\ ω_{(k^{p})}^{p} = (ω_{1}^{c'}, \dots, ω_{j - 1}^{c'}, ω_{j}^{c'}, ω_{j}^{c'}, ω_{j + 1}^{c'}, \dots, ω_{k^{c} + 1}^{c'})' . \end{matrix}$

The proposed vector of residual variances $σ_{(k^{p})}^{2 p}$ is $σ_{(k^{p})}^{2 p} = {(σ_{1}^{2 c}, \dots, σ_{j - 1}^{2 c}, σ_{j}^{2 p}, σ_{j + 1}^{2 p}, σ_{j + 1}^{2 c}, \dots, σ_{k^{c} + 1}^{2 c})}^{'},$ where the residual variances $σ_{j}^{2 p}, σ_{j + 1}^{2 p}$ for the split partition are constructed following Green (1995), namely as a perturbation of the current variance $σ_{j}^{2 c}$ . Specifically, we draw $u \sim Uniform (0, 1)$ and let $σ_{j}^{2 p}, σ_{j + 1}^{2 p}$ be deterministic transformations of $σ_{j}^{2 c}$ and $u$ , that is, $σ_{j}^{2 p} = \frac{u}{1 - u} σ_{j}^{2 c}, \quad σ_{j + 1}^{2 p} = \frac{1 - u}{u} σ_{j}^{2 c} .$ (A.7)

Finally, the proposed vector of linear coefficients $β_{(k^{p})}^{p}$ is $β_{(k^{p})}^{p} = {(β_{1}^{c'}, \dots, β_{j - 1}^{c'}, β_{j}^{p'}, β_{j + 1}^{p'}, β_{j + 1}^{c'}, \dots, β_{k^{c} + 1}^{c'})}^{'},$ where the vectors $β_{j}^{p}, β_{j + 1}^{p}$ are drawn from normal approximations to their posterior conditional distributions, as in Section A.1.1. The proposed move to the state $(k^{p}, ξ_{(k^{p})}^{p})$ is accepted with probability $α = \min \left\{ 1, \frac{L (k^{p}, ξ_{(k^{p})}^{p} | y)}{L (k^{c}, ξ_{(k^{c})}^{c} | y)} \times \frac{π (k^{p}) π (ξ_{(k^{p})}^{p} | k^{p})}{π (k^{c}) π (ξ_{(k^{c})}^{c} | k^{c})} \times \frac{d_{k^{p}} \cdot \frac{1}{k^{p}} \cdot \frac{1}{2} \cdot q (β_{j}^{c})}{b_{k^{c}} \cdot q (s_{(k^{p})}^{p}) \cdot \prod_{h = j}^{j + 1} q (β_{h}^{p})} \times J_{σ^{2}} \right\},$ where the likelihood function is provided in Equation (4), $q (s_{(k^{p})}^{p})$ is the uniform density defined in Equation (A.6), $q (β_{h}^{c})$ and $q (β_{h}^{p})$ are the multivariate normal proposal densities, and the Jacobian $J_{σ^{2}}$ is $J_{σ^{2}} = \left| \frac{\partial (σ_{j}^{2 p}, σ_{j + 1}^{2 p})}{\partial (σ_{j}^{2 c}, u)} \right| = 2 {(\sqrt{σ_{j}^{2 p}} + \sqrt{σ_{j + 1}^{2 p}})}^{2} .$
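For completeness, the stated form of the Jacobian can be checked directly from the transformation in Equation (A.7). The following worked derivation is added as a verification and uses only the quantities defined above.

```latex
\begin{aligned}
\frac{\partial \sigma_{j}^{2p}}{\partial \sigma_{j}^{2c}} &= \frac{u}{1-u}, &
\frac{\partial \sigma_{j}^{2p}}{\partial u} &= \frac{\sigma_{j}^{2c}}{(1-u)^{2}}, &
\frac{\partial \sigma_{j+1}^{2p}}{\partial \sigma_{j}^{2c}} &= \frac{1-u}{u}, &
\frac{\partial \sigma_{j+1}^{2p}}{\partial u} &= -\frac{\sigma_{j}^{2c}}{u^{2}}, \\[4pt]
J_{\sigma^{2}} &= \left| \frac{u}{1-u}\left(-\frac{\sigma_{j}^{2c}}{u^{2}}\right)
  - \frac{\sigma_{j}^{2c}}{(1-u)^{2}} \cdot \frac{1-u}{u} \right|
  = \frac{2\,\sigma_{j}^{2c}}{u(1-u)}, \\[4pt]
\left(\sqrt{\sigma_{j}^{2p}} + \sqrt{\sigma_{j+1}^{2p}}\right)^{2}
  &= \sigma_{j}^{2c}\left(\frac{u}{1-u} + \frac{1-u}{u} + 2\right)
  = \frac{\sigma_{j}^{2c}}{u(1-u)},
\end{aligned}
```

so that $J_{σ^{2}} = 2 {(\sqrt{σ_{j}^{2 p}} + \sqrt{σ_{j + 1}^{2 p}})}^{2}$ , in agreement with the expression above.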

The numerator of the proposal ratio is better understood by looking at the details of the death step, which are given below.

Death move: If a death step is proposed, then $k^{p} = k^{c} - 1$ . A candidate change-point $s_{j}^{c}$ to be removed is sampled uniformly from the vector of existing change-points; that is, we propose to remove $s_{j}^{c}$ with probability $\frac{1}{k^{c}}$ . Then, the proposed vector of change-point locations $s_{(k^{p})}^{p}$ is defined as $s_{(k^{p})}^{p} = {(s_{1}^{c}, \dots, s_{j - 1}^{c}, s_{j + 1}^{c}, \dots, s_{k^{c}}^{c})}^{'} .$

The number $m_{j}^{p}$ and the vector of relevant frequencies $ω_{j}^{p}$ of the newly merged segment $[s_{j - 1}^{c}, s_{j + 1}^{c})$ are selected by drawing an index at random from $\{ j, j + 1 \}$ , obtaining say $j^{*}$ , and setting the proposed parameters equal to the current ones relative to the selected index. That is, we set $m_{j}^{p} = m_{j^{*}}^{c}$ and $ω_{j}^{p} = ω_{j^{*}}^{c}$ . Hence, the proposed vectors of the numbers of frequencies $m_{(k^{p})}^{p}$ and their values $ω_{(k^{p})}^{p}$ are constructed as follows $\begin{matrix} m_{(k^{p})}^{p} = {(m_{1}^{c}, \dots, m_{j - 1}^{c}, m_{j}^{p}, m_{j + 2}^{c}, \dots, m_{k^{c} + 1}^{c})}^{'}, \\ ω_{(k^{p})}^{p} = {(ω_{1}^{c'}, \dots, ω_{j - 1}^{c'}, ω_{j}^{p'}, ω_{j + 2}^{c'}, \dots, ω_{k^{c} + 1}^{c'})}^{'} . \end{matrix}$

The residual variance $σ_{j}^{2 p}$ of the newly merged segment is obtained by inverting the transformation of Equation (A.7). Specifically, we construct $σ_{j}^{2 p} = \sqrt{σ_{j}^{2 c} σ_{j + 1}^{2 c}}$ and set the proposed vector of residual variances $σ_{(k^{p})}^{2 p}$ as $σ_{(k^{p})}^{2 p} = {(σ_{1}^{2 c}, \dots, σ_{j - 1}^{2 c}, σ_{j}^{2 p}, σ_{j + 2}^{2 c}, \dots, σ_{k^{c} + 1}^{2 c})}^{'} .$
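That the geometric mean inverts the split of Equation (A.7) can be checked numerically. In the short sketch below the function names are hypothetical, and the expression used to recover $u$ in the merge step is derived here from Equation (A.7) (the ratio of the two split variances equals $u^{2} / (1 - u)^{2}$ ) rather than taken from the text.

```python
import numpy as np

def split_variance(sigma2_c, u):
    """Birth move, Equation (A.7): split one variance into two using u in (0, 1)."""
    return u / (1 - u) * sigma2_c, (1 - u) / u * sigma2_c

def merge_variance(sigma2_a, sigma2_b):
    """Death move: recover the merged variance and the implied u."""
    sigma2 = np.sqrt(sigma2_a * sigma2_b)          # geometric mean of the two variances
    ra, rb = np.sqrt(sigma2_a), np.sqrt(sigma2_b)
    u = ra / (ra + rb)                             # solves sigma2_a / sigma2_b = u^2 / (1 - u)^2
    return sigma2, u

def split_jacobian(sigma2_a, sigma2_b):
    """Jacobian of the split map, written in terms of the two proposed variances."""
    return 2.0 * (np.sqrt(sigma2_a) + np.sqrt(sigma2_b)) ** 2
```

For instance, splitting a variance of 2.0 with `u = 0.3` and merging the two results returns the pair `(2.0, 0.3)` up to floating-point error, which is the reversibility required for the birth and death moves to form a matched pair.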

The proposed vector of linear coefficients $β_{(k^{p})}^{p}$ is $β_{(k^{p})}^{p} = {(β_{1}^{c'}, \dots, β_{j - 1}^{c'}, β_{j}^{p'}, β_{j + 2}^{c'}, \dots, β_{k^{c} + 1}^{c'})}^{'},$ where the vector of coefficients $β_{j}^{p}$ is drawn from a normal approximation to its posterior conditional distribution. The acceptance probability for the death step has the same form as that of the birth step, with the proper change of labelling of the variables, and the ratio terms inverted.
