549
Views
15
CrossRef citations to date
0
Altmetric
Articles

Simple stochastic simulation of time irreversible and reversible processes

ORCID Icon
Pages 536-551 | Received 23 Jun 2019, Accepted 27 Nov 2019, Published online: 28 Jan 2020

ABSTRACT

As time irreversibility of streamflow is marked for time scales up to several days, while common stochastic generation methods are good only for time-symmetric processes, the need for new methods to handle irreversibility, particularly in flood simulations, has been recently highlighted. From an investigation of the historical evolution of existing stochastic generation methods, which is a useful step before proposing new methods, the strengths and weaknesses of current approaches are located. Following this investigation, a generic solution to the stochastic generation problem is proposed. This is an analytical exact method based on an asymmetric moving-average scheme, capable of handling time irreversibility in addition to preserving the second-order stochastic structure, as well as higher-order marginal statistics, of a process. The method is studied theoretically in its general setting, as well as in its most interesting special cases, and is successfully applied to streamflow generation at an hourly scale.

Editor A. CastellarinAssociate editor not assigned

1 Introduction

Simplicity is the final achievement.

Frédéric Chopin

A recent study (Koutsoyiannis Citation2019b) has investigated the possible existence of time irreversibility in hydrometeorological processes and the related theoretical framework. According to the latter, a simple definition of time reversibility within stochastics is the following, where underlined symbols denote stochastic (random) variables and non-underlined ones denote values thereof or regular variables.

A stochastic process x_(t) at continuous time t, with nth order distribution function:

(1) F(x1,x2,,xn;t1,t2,,tn):=P{x_(t1)x1,x_(t2)x2,,x_(tn)xn}(1)

is time-symmetric or time-reversible if its joint distribution does not change after reflection of time about the origin, i.e. if for any n, t1,t2,,tn,

(2) Fx1,x2,, xn;t1,t2,,tn=Fx1,x2,, xn;t1,t2,,tn(2)

If times ti are equidistant, i.e. titi1=D, the definition can be also written by reflecting the order of points in time, i.e.:

(3) Fx1,x2,, xn1,xn;t1,t2,, tn1,tn=Fx1,x2,, xn1,xn;tn,tn1,, t2,t1(3)

A process that is not time-reversible is called time-asymmetric, time-irreversible or time-directional. Important results related to time (ir)reversibility are the following:

  • A time-reversible process is also stationary (Lawrance Citation1991).

  • If a scalar process x_(t) is Gaussian (i.e. all its finite-dimensional distributions are multivariate normal) then it is reversible (Weiss Citation1975). The consequences are: (a) a directional process cannot be Gaussian; (b) a discrete-time ARMA process (and a continuous-time Markov process) is reversible if and only if it is Gaussian.

  • However, a vector (multivariate) process can be Gaussian and irreversible at the same time. A multivariate Gaussian linear process is reversible if and only if its autocovariance matrices are all symmetric (Tong and Zhang Citation2005).

Time asymmetry of a process can be studied through the differenced process, i.e.:

(4) x˜_τ:=x_τx_τ1(4)

where x_τ is the process representation in discrete time τ (denoting the continuous-time interval τ1D,τD), i.e.:

(5) x_τ:=1D(τ1)DτDx_(u)du(5)

For a stationary stochastic process x_(t), the differenced process x˜_τ has mean zero and variance:

(6) γ˜1:=varx˜_τ=2γ1c1(6)

where γ1 and c1 are the variance and lag-one autocovariance, respectively, of x_τ. Then, the time asymmetry is quantified by the skewness coefficient:

(7) C˜S:=μ3x˜_τvarx˜_τ3/2(7)

where μ3  denotes the third moment. Alternatively, it is quantified by the probability of x˜_τ being negative:

(8) P˜N:=Px˜_τ0=F˜x˜_τ0(8)

Processes with large C˜S or large departure of P˜N from the value 1/2 signify high (positive) time irreversibility. A sketch of a realization of such a process is given in . The graph is characterized by rapid increases followed by gradual decreases. The entire setting in constructing the time series of is nonlinear (the values are obtained by optimization rather than generated by a linear stochastic model), and while the marginal distribution is by construction Gaussian, the multivariate one is not, so there is no conflict with Weiss’s (Citation1975) result mentioned above.

The study by Koutsoyiannis (Citation2019b) has provided evidence that time irreversibility may be profound in the streamflow process at scales of several days or lower. In atmospheric processes irreversibility appears as well, but at much finer time scales. On the other hand, most stochastic simulation methods do not account for irreversibility. In the same study, two methods have been proposed for a stochastic generation that respects irreversibility and reproduces it in a controlled manner, quantifying it by the skewness of the differenced process. After exploring them, one of the two methods proved successful. A disadvantage of that method is that it needs iterations to converge to a solution, in which vector derivatives need to be evaluated.

Figure 1. Plot of two synthetic time series generated by maximizing time irreversibility properties of a process restricted to be marginally Gaussian (N(3, 1)) with lag-one autocorrelation 0.5, so that the variance of the differenced process is also 1 (equal to that of the original process). Solution 1 maximizes the skewness of the differenced process. Solution 2 maximizes the frequency that the differenced process has a negative value, without taking into account the skewness. In both series the frequency that the differenced process has negative values is 0.94. The coefficients of skewness of the differenced processes for series 1 and 2 are 4.10 and 3.34, respectively (Source: Koutsoyiannis Citation2019b).

Figure 1. Plot of two synthetic time series generated by maximizing time irreversibility properties of a process restricted to be marginally Gaussian (N(3, 1)) with lag-one autocorrelation 0.5, so that the variance of the differenced process is also 1 (equal to that of the original process). Solution 1 maximizes the skewness of the differenced process. Solution 2 maximizes the frequency that the differenced process has a negative value, without taking into account the skewness. In both series the frequency that the differenced process has negative values is 0.94. The coefficients of skewness of the differenced processes for series 1 and 2 are 4.10 and 3.34, respectively (Source: Koutsoyiannis Citation2019b).

Here we present a simple method that is exact, analytical and direct, and its application is fast avoiding the need for iterations. As far as the general framework related to time’s arrow and time irreversibility of atmospheric and hydrological processes is concerned, the interested reader is referred to the study by Koutsoyiannis (Citation2019b) and its extended list of references. However, in terms of the stochastic synthesis per se, the current paper is a stand-alone study with a detailed presentation, starting from main historical landmarks and ending with proposing a generic method of stochastic synthesis which is entirely new (to the author’s knowledge). In this manner, an earlier study for a generic and parsimonious stochastic modelling scheme (Koutsoyiannis Citation2016) is complemented and extended to time-irreversible processes.

2 Approaches in stochastic synthesis

It is common knowledge that when we want to model the (marginal) distribution of a stochastic variable, we usually fit a model (a distribution such as Gaussian, gamma, generalized extreme value, etc.) using the available data, typically given as a time series xn,n=1,,N. But let us consider for a minute a different tactic: We estimate from the data sufficiently many, say J, moments, and instead of fitting a customary distribution, we approximate the distribution through moments. These moments are μj´:=E[x_j],j=1,2,,J, and their unbiased estimates from the data are μˆj:=n=1Nxnj/N. From the moments we could recover the density function fx of the stochastic variable of interest x_, in at least two ways:

1.We could exploit the inversion formula (e.g. Papoulis Citation1991, p. 116):

(9) fx=12πφωeiωxdω(9)

where φω is the characteristic function of the distribution and i:=1 is the imaginary unit, combine it with the moment theorem, according to which the jth derivative of φω is:

(10) φjω=ijμj(10)

and then use some type of expansion to approximate φω and eventually fx based on a number J of moment estimates.

2.We could use as an approximation of fx the maximum entropy distribution conditional on the specified moments, which is (e.g. Papoulis Citation1991, p. 571):

(11) fx=expj=0Jλjxj(11)

where λj are constants determined so that the distribution have the specified moments.

This tactic, if implemented in either variant, would have problems, including the following:

  • Assuming that the number J is not very small, the resulting models would not be parsimonious, as they would include J moments and an equal number of parameters.

  • The moment estimates beyond second are unreliable for typical sample sizes, despite the fact that unbiasedness is theoretically guaranteed (see the work by Lombardo et al. Citation2014, with the characteristic title “Just two moments”.)

  • In particular, heavy-tailed distributions would never appear as models, because in these distributions all moments beyond a threshold are infinite, while their sample estimates are finite.

The origin of problems in such a tactic is not actually the approximate character of the specified distribution – after all, any model is an approximation of some real phenomenon. Rather, it is the lack of parsimony and the direct link of the parameters (such as λj in (11)) with the data, on which their estimation is based. In contrast, the same approximation of the distribution would be valid and not problematic if coefficients λj were not model parameters and if they were determined theoretically rather than estimated from the data.

Fortunately, however, this tactic is not commonly implemented. Rather, a parametric model is assumed based, on the one hand, on some theoretical considerations (e.g. central limit theorem, maximum entropy principle, asymptotic theory of distribution of extremes) and on the other hand, on some exploration of the available data.

Strangely enough though, this tactic has been the rule in fitting and applying so-called time-series models – and it is still the norm even today – despite the fact that fitting a stochastic process is more demanding a task than fitting a marginal distribution and, at the same time, the presence of autocovariance increases uncertainty in model fitting. Indeed, time-series models are typically fitted based on estimates of the autocovariance (or autocorrelation) function and even the model formulation (in terms of the multivariate distribution) is determined by the number of the autocorrelation coefficients they are used – similar to what Eq. (11) suggests for the marginal distribution. Apparently, the alternative of fitting a parametric model to the multivariate distribution is valid (Koutsoyiannis Citation2000, Citation2016) but it has not been the norm. The reasons for following such a weird practice in most disciplines whose modelling relies on data, including hydrology, are historical. An attempt to trace their evolution is contained in the next section.

3 A historical overview of stochastic modelling

Current stochastic modelling practices have been influenced by at least three Schools of thought, which will be referred to as (a) the Stochastic School, (b) the Time Series School and (c) the Monte Carlo School. We attempt to trace out the evolution in each of them in the next subsections, while an individual subsection is devoted to hydrology.

3.1 The Stochastic School

Perhaps Ludwig Boltzmann (1844–1906, Universities of Graz and Vienna, Austria, and Munich, Germany) was the first to introduce fundamental concepts related to stochastic process. In particular, Boltzmann (Citation1877) explained the concept of entropy in probability theoretic context and Boltzmann (Citation1884/85) introduced the notion of ergodic systems,Footnote1 which is central in stochastics. Later George D. Birkhoff (1884–1944; Princeton, Harvard, USA) discovered the ergodic theorem (Birkhoff Citation1931) also known as the Birkhoff–Khinchin theorem to give credit to Aleksandr Khinchin (also spelt Khintchine, 1894–1959; Moscow State University, Russia), who gave a purely measure-theoretic proof of the theorem (Khintchine Citation1933). In addition, Khintchine (Citation1934) defined the stationary stochastic processes and introduced a probabilistic setting of the Wiener-Khinchin theorem relating autocovariance and power spectrum.

But it was Andrey N. Kolmogorov (1903–1987; Moscow State University, Russia), the great scientist who founded the stochastic theory. Kolmogorov (Citation1931) introduced the term stochastic process,Footnote2 identifying process with change. He also used the term stationary to describe a probability density function that is unchanged in time. Kolmogorov (Citation1933) defined probability (founded on measure theory) in an axiomatic manner based on three fundamental concepts (a triplet called probability space) and four axioms (non-negativity: normalization, additivity and continuity at zero). Kolmogorov (Citation1937, Citation1938) gave a concise presentation of the concept of a stationary stochastic process and Kolmogorov (Citation1947) defined wide-sense stationarity.

The concept of stationarity (and its negation, nonstationarity) has a central role in stochastics yet is widely misunderstood and broadly misused (Montanari and Koutsoyiannis Citation2014, Koutsoyiannis and Montanari Citation2015). Stationarity is tightly connected to ergodicity, which in turn is a prerequisite to make inference from data, that is, induction. If a system that is modelled in a stochastic framework has deterministic dynamics then it is stationary if it is ergodic and vice versa (Mackey Citation2003, p. 52). Likewise, a nonstationary system is also non-ergodic and vice versa. If the system dynamics is stochastic, then ergodicity and stationarity do not necessarily coincide. However, recalling that a stochastic process is a model (not part of the real world), we can always devise a stationary stochastic process that is ergodic.

It may be useful to include here a reminder of the definitions of the two concepts. Following Kolmogorov (Citation1931, Citation1938) and Khintchine (Citation1934), a process is stationary if its statistical properties are invariant to a shift of time origin, i.e. the processes x_t and x_t have the same (multivariate) distribution for any t and . Based on the ergodic theorem (Birkhoff Citation1931, Khintchine Citation1933; see also Mackey Citation2003, p. 54), a stochastic process x_t is ergodic if the time average of any (integrable) function g(x(t)), as time tends to infinity, equals the true (ensemble) expectation, i.e.:

(12) limT1T0Tg(x_(t))dt=E[g(x_(t))](12)

3.2 The Time Series School

Contrary to the theoretical Stochastic School, the Time Series School is more intuitive and empirical, and less rigorous. Its history was written more by economists and less by theoretical mathematicians. Perhaps the first who introduced time-series analysis was the American economist W.M. Persons. In studying the problem “When to buy or sell”, Persons (Citation1919) introduced the study of time series, which he called statistical series, and asserted that they “result from the combination of four elements: secular trend, seasonal variation, cyclical fluctuation, and a residual factor.” He also proposed methods for “Eliminating secular trends” and “Eliminating seasonal variation”. A definition of a time series was given 10 years later by the American statistician Bailey (Citation1929): “A time series is a series of observations taken at different times and recorded with the time at which they were taken.”

Interestingly, Slutsky (Citation1927) demonstrated that what Persons (and other economists) regarded as a cyclical component is nothing but a statistical artefact with no essential meaning (see e.g. Kyun and Kim Citation2006, Barnett Citation2006). Subsequently, the notion of a cyclical component was abandoned but the decomposition of a time series to the remaining three components, trends, seasonal variation and residuals is popular even today.

Yule (Citation1927) and Walker (Citation1931) (both British statisticians), starting from an analysis of sunspot numbers, studied autoregressive processes and in particular their periodogram and autocorrelation properties. But the biggest progress in the Time Series School was made in Uppsala by H.O.A. Wold (a Norwegian-born econometrician and statistician with a career in Sweden) and P. Whittle (a New-Zealand-born mathematician and statistician), who, in their doctoral theses, provided the stochastic foundation of time-series analysis. Wold (Citation1938, Citation1948) proved that a stochastic process (even though he referred to it as a time series) can be decomposed into a regular process (i.e. a process linearly equivalent to a white noise process) and a predictable process (i.e. a process that can be expressed in terms of its past values). Whittle (Citation1951, Citation1952, Citation1953) laid the mathematical foundation of autoregressive and moving-average models in univariate and multivariate setting. Later, in their influential book, Box and Jenkins (Citation1970) named these models with acronyms such as AR(p) (autoregressive models of order p), MA(q) (moving average models of order q), ARMA(p,q) (autoregressive moving-average models) and ARIMA(p,d,q) (autoregressive integrated moving-average models). These became popular under these names and have also been known as Box–Jenkins models (cf. Stigler’s law of eponymy; Stigler Citation2002). A useful extension of these models to apply to processes with long-range dependence was proposed by Hosking (Citation1981). This is done by replacing the integer parameter d in ARIMA(p,d,q) with a real one (fractional differencing) and the models are usually termed ARFIMA(p,d,q).

Despite the wider influence of the Time Series School over the Stochastic School, there are several problems with the former. First, the term time series is ambiguous, sometimes denoting a series of observations, as in the original definition of Bailey (Citation1929) (or, equivalently, a realization of a stochastic process), and at other times denoting the stochastic process per se, as in the aforementioned use by Wold. We emphasize that here the term time series is used with the first meaning, a series of numbers, while for a series of random variables we use the term stochastic process. Second, with the exception of the simplest models of these families, such as the AR(1) and ARMA(1,1), time-series models are too artificial because, being complicated discrete-time models, they do not necessarily correspond to a continuous-time process, while natural processes typically evolve in continuous time. Furthermore, their identification, typically based on the estimation of the autocorrelation function from data, usually neglects estimation bias and uncertainty, which in stochastic processes (as opposed to purely random processes) are often tremendous (Lombardo et al. Citation2014).

Indeed, from their onset, time-series models have been tightly associated with a large number of parameters and they usually become over-parameterized and thus not parsimonious. These parameters are estimated from data, which usually are too few to support a reliable estimation. In Whittle’s (Citation1952) words (emphasis added):

There is, of course, nothing special with the autoregressive scheme; we could equally well graduate with a high-order moving average, and there are many other possibilities. […] In practice, however, the autoregressive graduation has the advantage that the estimated residual sum of squares can be written down directly in terms of the observations […] without the need to solve explicitly for the estimates of the ‘a’ coefficients.

It is, of course, not possible to estimate an infinite number of parameters from a finite sample, but the series of a coefficients must converge, and by considering sufficiently many coefficients we should be able to obtain an arbitrarily good approximation to the real process.

This is similar to fitting a distribution function using observations to estimate many moments as described in Sec. 2.

The decomposition of a time series to components, trends, seasonal variation and residuals is fundamentally problematic, despite being popular. (In particular trend analysis of hydroclimatic processes is more fashionable today than ever before.) However, it should be noted that a meaningful definition of a trend has never been given. Also, for some, including this author, it is hardly conceivable how time per se could be regarded as an explanatory variable in a complex process and what the logical basis is in expressing the statistics of a process as a deterministic function of time. Accumulation of data series with long time spans is revealing that what have been regarded as trends are mostly parts of long-term fluctuations (and in accordance with Slutsky’s work, they could also be regarded as statistical artefacts). Hurst’s (Citation1951) and Kolmogorov’s (Citation1940) works provide a scientific basis to model what has been regarded as trends in the context of stationary stochastic processes. Finally, “deseasonalization” (in Persons’s original terminology “Eliminating seasonal variation”) is a delusion; we can hardly remove seasonality in the multivariate distribution of a stochastic process (what we typically do is in the marginal distribution).

3.3 The Monte Carlo School

The Monte Carlo method is essentially a numerical technique for solving deterministic integrodifferential equations using random sampling. Ideas that could be classified as implementation of the Monte Carlo method are old. Georges Louis LeClerc (Comte de Buffon, French scientist; 1707–1788) became famous for “Buffon’s needle,” a method using needle tosses onto a lined background to estimate π (where, if the line distance is equal to needle length, π is found as twice the inverse of probability that the needle crosses a line).Footnote3 Random sampling is also old. Galton (Citation1890) invented a set of three modified dice to generate samples from a normal distribution. “Student” (the pseudonym of W.S. Gosset), in 1908, performed simulation experiments using 3000 cards (in 750 groups of size 4) to find the distribution of the t-statistic and of the correlation coefficient. Tippett (Citation1927) published a table of random numbers: he took 41 600 digits at random from Census Reports and combined them by fours to give 10 400 numbers. Mahalanobis (Citation1934) published tables of random numbers from normal distribution (208 sets of 50 numbers each). (See more information in Stigler Citation2002.)

The modern Monte Carlo method was conceived by Stanislaw Ulam in 1946.Footnote4 Soon after the method grew to solve neutron diffusion problems by himself and other great mathematicians and physicists in Los Alamos (John von Neumann, Nicholas Metropolis, Enrico Fermi), and was first implemented on the ENIAC computer (Metropolis Citation1989, Eckhardt Citation1989). The “official” history of the method began in 1949 with the publication of a paper by Metropolis and Ulam (Citation1949). One prominent characteristic of the Monte Carlo method as a numerical integration method is that it does not suffer from the well-known curse of dimensionality, unlike the classical (grid-based) numerical integration method, which does suffer. Thus, it is easily shown (e.g. Metropolis and Ulam Citation1949, Niederreiter Citation1992) that for a number of dimensions d > 4, the Monte Carlo integration method (in which the function evaluation points are taken at random) is more accurate (for the same total number of evaluation points) than classical numerical integration, based on a grid representation of the integral space. For large dimensionality, e.g. for d > 20, the classical method is infeasible while the Monte Carlo method is always feasible.

3.4 The onset of stochastic modelling in hydrology

Techniques that could be classified as applications of the Monte Carlo method had appeared in the hydrological literature much earlier than the “official start” of the method in 1949. Hazen (Citation1914), made a pioneering study in which he introduced the reservoir storage-yield-reliability relationship, a concept that would remain unexploited in the western hydrological literature yet constituting the scientific basis of modern reservoir design (Klemes Citation1987). In that study, he proposed an empirical simulation technique and formed a synthetic time series by combining historical flow records of different rivers “spliced” sequentially together. Sudler (Citation1927) extended the work of Hazen by resampling from a sequence of historical river flows using cards, which he shuffled to form new sequences of data. Obviously, this method heavily distorts the time dependence of river flows whose importance was not known at that time. For it was Hurst (Citation1951) who understood that importance, along with the omnipresence in natural processes of a clustering behaviour of similar events in time, a behaviour that is now called long-term persistence (LTP) or long-range dependence (LRD). It is also known as the Hurst-Kolmogorov behaviour, to give also credit to Kolmogorov (Citation1940), who had proposed a mathematical model representing that behaviour before it was verified in natural processes by Hurst. In his attempt to compare natural and random events, Hurst performed physical experiments to generate random numbers. Specifically, he tossed 10 coins (sixpences) simultaneously and repeated this 1025 times (note that 10 binary digits are equivalent to about 3 decimal digits). As he notes, his rate was 100 random numbers per 35 min (while that would be of the order of a microsecond in modern computer environments, even slow ones). He also used another method, shuffling and cutting a pack of 52 cards, in which he improved the rate to 100 random numbers per 20 min.

It would take decades before the hydrological community assimilate Hurst’s discovery of LTP (O’Connell et al. Citation2016), even though some of the pioneering studies appeared towards the end of the 1960s (Mandelbrot and Wallis Citation1969). The initial studies implementing primitive variants of stochastic simulation did not reproduce LTP. Barnes (Citation1954), in designing a reservoir in Australia, used a table of random numbers from normal distribution to generate a 1000-year sequence of synthetic annual data. Thomas and Fiering (Citation1962) generated flows correlated in time, but using only the lag one autocorrelation, obviously neglecting LTP. Beard (Citation1965) and Matalas (Citation1967) generated concurrent flows at several sites. Chow (Citation1969), and Chow and Kareliotis (Citation1970) systematized the use of time-series models (in particular – and using their terminology – moving-average models, sum of harmonics models and autoregression models) and highlighted their value in the economic planning of water supply and irrigation projects. It is evident from the above pioneering studies, as well as of subsequent myriads of studies, that hydrologists have followed (and today still do) the Time Series School rather than the more rigorous Stochastic School.

4 Premises

Before we proceed to the formulation of the simulation scheme, it is necessary to list the main premises on which the scheme is based. These are consistent with the generic procedure in Koutsoyiannis (Citation2016) and are summarized in the following points:

  1. A stochastic model is always required for any stochastic task, such as estimation, testing and synthesis. In stochastics, there cannot be model-free, also known as nonparametric, methods. Nonparametric methods are a delusion caused by an adherence to the classical statistical assumption of independence which, when dealing with natural processes is the ultimate – and most inappropriate – parameterization per se. As an example, Kendall’s rank correlation test (Kendall Citation1938), or its variant for trend detection (Mann Citation1945), is an important and widely used statistical test that is regarded as nonparametric. And indeed it is nonparametric if it is applied on samples that are independent by construction, i.e. obtained by individual and independent experiments. But if the data are time series (consecutive observations in time of a single process) then application of the test needs parameterization of the dependence structure. As usually applied, the test includes in its null hypothesis the sub-hypothesis that autocorrelations for all lags are zero. Thus, the rejection of the null hypothesis could be regarded as rejection of independence, even though the popular interpretation favours the existence of a trend. There exist of course consistent parametric variants of the test, which take the stochastic structure into account (e.g. Hamed Citation2008).

  2. As natural time runs continuously, the model needs to be formulated for continuous time to avoid the risk of making artificial constructs. The discrete-time representation, which is certainly necessary in simulation, should be derived from the continuous time one. Second-order stochastic tools, such as autocovariance and power spectrum, are affected by discretization and the effect should always be accounted for. The climacogram,Footnote5 i.e. the variance γk of the process averaged at time scale k, is an exception, as it is not affected by discretization, and has some additional advantages (Dimitriadis and Koutsoyiannis Citation2015) which make it the preferable tool in stochastic modelling.

  3. The model needs to be parsimonious in order to be useful (Gauch Citation2003). Inflationary models, while giving an impression of a good fit, in fact entail (often hidden) high uncertainty. An example of a (parametric) parsimonious model is the Filtered Hurst-Kolmogorov process with a generalized Cauchy-type climacogram (FHK-C; Koutsoyiannis Citation2016, Citation2017). This is defined through its climacogram:

    (13) γk=λ1+kα2MH1M(13)
    where α and λ are scale parameters with dimensions of t and x2, respectively, while M (fractal parameter) and H (Hurst parameter) are dimensionless determining the dependence structure at a local (smoothness or fractality) and global (LRD) level, respectively. Both M and H take on values in (0,1), with M < ½ or > ½ indicating a rough or a smooth process, respectively, and with H < ½ or > ½ indicating an antipersistent or a persistent process, respectively. It should be stressed that the quantities α, λ, M and H in the above expression (plus the process mean, and perhaps a shape parameter of the marginal distribution and a parameter quantifying irreversibility) are the only model parameters. Any other quantity that would emerge in the generation phase should be theoretically calculated (as internal coefficients of the generation scheme) from these parameters and no longer estimated from data. If the data cannot support the estimation of four parameters of Eq. (13) for the dependence structure, simplifications could be made. For example, assuming M=H=1/2 we obtain a two-parameter near-Markov model while with M=1/2 and α0 we obtain a two-parameter Hurst–Kolmogorov model. Examples of models additional to Eq. (13) can be found in Koutsoyiannis (Citation2017).

  1. Parameter estimation needs to consider statistical bias, which is present in all estimators (Koutsoyiannis Citation2016), including very simple and common ones, such as variance and correlation (a known exception is the mean whose standard estimator is not affected by bias in the presence of dependence; higher order noncentral moments are also unbiased in theory, but suffer from unknowability; Koutsoyiannis Citation2019a).

  2. Awareness of stochastics (the mathematics of stochastic variables and processes), theoretical consistency and logical rigour are always necessary to avoid misleading or erroneous calculations, results and interpretations.

5 Simulation method

Let x_t be a stationary stochastic process representing the instantaneous quantity of a certain hydrological or other physical process in continuous time t. For simplicity and without loss of generality we assume that it has zero mean and (instantaneous) variance:

(14) γ0:=var[x_(t)](14)

which is arguably (Koutsoyiannis Citation2017) assumed to be finite. The discrete-time representation of the process (Eq. (5)) can also be written as:

(15) x_τ:=1D(τ1)DτDx_ (u)du=X_(τD)X_((τ1)D)D(15)

where

(16) X_t:=0tx_ξdξ(16)

is the cumulative process, which, if x_t aims to represent a natural processes, should necessarily be nonstationary. Its variance at time t, known as cumulative climacogram, is:

(17) Γ(t):=var[X_(t)]=t2γ(t)(17)

where γ() is the variance of the averaged process. This latter, as a function of time scale k, is called climacogram:

(18) γ(k):=var [X_(k)k](18)

The autocovariance function c(h) of the continuous-time process x_t for time lag h is (Koutsoyiannis Citation2016):

(19) ch:=covx_t, x_t+h=12 d2Γhdh2(19)

and the power spectrum s(w) for frequency w is the Fourier transform of the autocovariance function, i.e.:

(20) sw:=40chcos2πwhdh, ch=0swcos2πwhdw(20)

The discrete-time autocovariance for integer time lag η is (Koutsoyiannis Citation2016):

(21) cη=1D2(Γ(|η+1|D)+Γ(|η1|D)2Γ(|η|D))(21)

and is related to the discrete-time power spectrum by

(22) sdω=2c0+4η=1cηcos2πηω,cη=01/2sdωcos2πωηdω(22)

where ω:=wD is dimensionless frequency. The power spectrum has some analogies with another stochastic tool, the so-called climacospectrum, which is directly given in terms of the climacogram. Specifically, it is proportional to the difference of the variances of the averaged process at time scales k and 2k:

(23) ζk:=kγkγ2kln2(23)

This tool is useful in the model fitting phase (see Sec. 7).

To simulate x_τ we use the generalized moving-average scheme (Koutsoyiannis Citation2000):

(24) x_τ=j=JJajv_τj(24)

where aj are weights to be calculated from the autocovariance function, v_j is white noise averaged in discrete-time (and not necessarily Gaussian), also known as innovation process, and J is theoretically infinite, so that in all theoretical calculations we will assume J=, while in the generation J is a large integer chosen so that the resulting truncation error be negligible. Here we stress that the above scheme is just the contrary to that suggested by Whittle (Citation1952) in the quotation given in Sec. 3.2. Specifically, (a) we use a purely moving-average scheme instead of the autoregressive scheme favoured by Whittle and (b) we do not relate our scheme with observations, as the observations have been already used in the model fitting phase, which is totally isolated from the generation scheme. Since we do not use the observations, the moving-average scheme has many advantages in calculations.

Writing Eq. (24) for x_τ+η, multiplying it with (24) and taking expected values we find the convolution expression for J=:

(25) cη=l=alaη+l(25)

We need to find the sequence of aη,η=,1,0,1,, so that (25) holds true. A known solution (Koutsoyiannis Citation2000) is the symmetric moving average (SMA) scheme in which aη=aη. Nonetheless, the problem has infinitely many non-symmetric solutions, which are asymmetric moving average (AMA) schemes. A more common one is the ordinary backward asymmetric moving average (OBAMA) scheme in which aη=0 for any η<0; this latter is typically formulated in a different manner and denoted as simply moving average – MA, but as we study a richer family of schemes, the distinct name OBAMA is necessary.

Here, the following generic solution of the AMA scheme, giving the coefficients aη, is proposed:

(26) aη=1/21/2e2πiθωηωARωdω(26)

where θω is any (arbitrary) odd real function (meaning θω=θω) and

(27) ARω:=2sdω(27)

We prove in Appendix A that the sequence of aη:

  1. consists of real numbers, despite the expression in Eq. (26) involving complex numbers;

  2. satisfies precisely Eq. (25); and

  3. is easy and fast to calculate using the fast Fourier transform (FFT).

We can now convert this theoretical result into a numerical algorithm. This consists of the following steps:

(i) From the continuous-time stochastic model, expressed through its climacogram γk, we calculate its autocovariance function in discrete time (assuming time step D):

(28) cη=η+12γη+1D+η12γ|η1|D2η2γjD(28)
(This step is obviously omitted if the model is already expressed in discrete time through its autocovariance function.)

(ii) We choose an appropriate number of coefficients J that is a power of 2 and perform inverse FFT (using common software) to calculate the discrete-time power spectrum and the frequency function ARω for an array of ωj=jw1,\breakj=0,1,,J,w1:=1/JD:

(29) sdωj=2c0+4η=1Jcηcos2πηωj, ARωj=2sdωj(29)

(iii) We choose θω (see below) and we form the arrays (vectors) AR and AI, both of size 2J indexed as 0,,2J1, with the superscripts R and I standing for the real and imaginary part of a vector of complex numbers, respectively:

(30) ARj=ARωjcos2πθωj/2,j=0,  ,JAR2Jj,j=J+1,  ,2J1(30)
(31) AIj=ARωjsin2πθωj/2,j=0,  ,J10j=JAI2Jj.j=J+1,  ,2J1(31)

(iv) We perform FFT on vector AR+iAI (using common software), and get the real part of the result for j=0,,J, which is precisely the sequence of aη.

We note that by choosing J as a power of 2, the vectors AR and AI will have size 2J which is also a power of 2, thus achieving maximum speed in the FFT calculations. More details are contained in a supplementary file which includes numerical examples along with the simple code needed to do these calculations on a spreadsheet.

It may be useful to note the following additional points about the method:

  • Equation (26) gives not a single solution, but a variety of infinitely many ones, all of which preserve exactly the second-order characteristics of the process.

  • A particular solution is characterized by the chosen function θω.

  • Even assuming θω=θ0sign ω with constant θ0, again there are infinitely many solutions.

  • The availability of infinitely many solutions enables the preservation of additional statistics, e.g. those related to time asymmetry.

  • In addition, we always have several options related to the distribution of the white noise v_τ, which in general is not Gaussian, thus enabling preservation of moments of any order. A general scheme for preserving moments of any order has been outlined in Koutsoyiannis (Citation2019a), while successful applications for moments up to fourth order with SMA have been presented in Dimitriadis and Koutsoyiannis (Citation2018). It should be stressed that in typical sample sizes, high-order moments should be evaluated theoretically through the distributional parameters rather than estimated from the data, as their sample estimates are unreliable (Lombardo et al. Citation2014).

The special case θω=0 corresponds to the symmetric scheme (SMA), in which:

(32) ASωARω=2sdω,aηS=01/22sdωcos2πηωdω=aηS(32)

where we have used the superscript S to denote the symmetric case. Another interesting special case is θω=1/4 signω (or 2πθω=(π/2)signω): This corresponds to an antisymmetric AMA scheme (ANTAMA) with

(33) AAω=ARωδω+iARω,\break aηA=δ0+1201/22sdωsin2πηωdω(33)

where the superscript A denotes the antisymmetric case, δω is the Dirac delta function, and

(34) δ0:=2sd022J+1γJ2J+1(34)

with δ0 approaching zero as J becomes large. Any other case of constant θ0 (where θω=θ0signω) can be expressed in terms of the above two limiting cases through:

(35) aη=δ0+aηSδ0cos2πθ0+aηAδ0sin2πθ0(35)

(the proof is omitted). In particular, the case θ0=1/8 (or 2πθ0=π/4) yields the interesting result:

(36) aη=22aηS+aηA21δ0(36)

If for some θ0 and for η<0 it happens that aη0, then it can be verified that:

(37) aη0,η<0aηScos2πθ0+1cos2πθ0sin2πθ0δ0,η=02aηS+22δ0,η>0(37)

Such a sequence with almost zero coefficients for negative η will be close to the OBAMA scheme. It is interesting to notice that in this approximate solution only a0 depends on the constant θ0, while for η>0 the coefficients are approximately equal to those in the SMA, multiplied by 2.

However, if full accuracy is needed, we must have in mind that a constant θ0 does not give a precise OBAMA. A generic analytical solution that would give a function θω for OBAMA is not simple (this problem is known as factoring of the power spectrum; see Papoulis Citation1991, p. 402). However, solutions for simple special cases are not too difficult to find (e.g. for rational spectra; Papoulis Citation1991, p. 402–404). In particular, for a Markov process in continuous time (which in discrete-time behaves as an ARMA(1,1) process; see Eq. (39)), as well as for any ARMA(1,1) process (including its special cases AR(1) and MA(1)), the following solution is exact (the proof is omitted):

(38) θω=12πcos11sdω1sd0+1sd1/2+sdωsd0+sd1/2(38)

For stochastic structures with LRD, the OBAMA scheme is difficult to achieve with full precision, but an approximation can be easily found as described in Appendix B.

Figure 2. Illustration of the symmetric (SMA), antisymmetric (ANTAMA) and ordinary-backword (OBAMA) cases of the generic AMA model for (upper) a Markov and (lower) an FHK-C process. The parameter values are α = 10, λ = 1 (in both processes), H = 0.8, M = 0.7, and the number of weights is 2049 (J = 1024 = 210). The panels on the left show the coefficients a and those on the right the autocovariance function.

Figure 2. Illustration of the symmetric (SMA), antisymmetric (ANTAMA) and ordinary-backword (OBAMA) cases of the generic AMA model for (upper) a Markov and (lower) an FHK-C process. The parameter values are α = 10, λ = 1 (in both processes), H = 0.8, M = 0.7, and the number of weights is 2049 (J = 1024 = 210). The panels on the left show the coefficients a and those on the right the autocovariance function.

The method is illustrated in using two example processes. The first is the Markov process, whose basic properties are:

(39) ch=λeh/α,γk=2λk/α11ek/αk/α,\breakcη=λ1eD/α2D/α2eη1D/α,η1(39)

The second is the FHK model defined in Eq. (13), which gives its climacogram, whilst all its other properties are evaluated through Eqs. (21)-(22). shows three special cases, SMA (Eq. (32)), ANTAMA (Eq. (33)) and OBAMA for the two processes. For the OBAMA case and the Markov process the exact Eq. (38) is used while for the FHK-C process the coefficients aη are determined from the approximate equation (B2) in Appendix B (with optimized coefficients of the equation). Some slight (rather invisible) deviations from zero are present in the left-bottom panel, which at a later step will be set to zero and the small resulting effect will be further handled as a truncation error (in the manner described by Koutsoyiannis Citation2016) to obtain an exact OBAMA scheme.

All in all, this simple method of the AMA scheme renders ARMA models unnecessary, particularly because of the generic, analytical and fast solution it offers. Here it is important to stress that, while optimization of coefficients involved in the function θω could sometimes be necessary as discussed above and specified in Appendix B, it is not necessary in general. Any odd real function θω, chosen arbitrarily, will give aη that will satisfy Eq. (25) (apart from a truncation error) and thus can directly be used in generation. Even if the sequence of θωj is constructed at random (e.g. as a sequence of random numbers in the interval [0,1/4]), again Eq. (25) will be satisfied and the resulting aη can directly be used in generation. (This case is also implemented on the spreadsheet provided as supplementary information).

6 Additional methodological issues

6.1 The AMA scheme in forecast mode

It is generally believed that AR models are better suited to forecast as they depend on past variables rather than on multiple innovation variables. According to this logic, among the MA models, only backward schemes would be suitable for forecast, while the generic AMA scheme (including the special case of SMA) would not, because it involves convolutions for both positive and negative lags η. Such beliefs are not quite correct though. Arguably, the forecast problem could be regarded as totally independent of the stochastic model per se, while forecasts can be obtained from any stochastic model, not only from a backward one. Had this not been the case, we would be in a deadlock in providing forecasts even with AR models. To see this, we may consider the case of an AR(p) scheme used to approximate a process with LRD. Because of LRD, the value of p will be large, typically larger than the number (n) of known past observations (p>n). Therefore, the mathematical expression of the AR(p) as a weighted sum of past values could not be applied for forecast because some of the required past values would not be known.

Koutsoyiannis (Citation2000) formulated a simple and general solution of the forecast problem, which can be applied for any type of linear model and hence for AMA. This calculates the prediction x_^τ for any future time τ combining, on the one hand, any simulation x_τ independent of the observations, and, on the other hand, the observations z_ˆ, through the equations:

(40) x_^τ=x_τ+cτTC1(z_^z_),var[x_^τ|z_^]=γ1cτTC1cτ(40)

where zˆ_:=xˆ_0,xˆ_1,,xˆ_nT is a vector containing the observations at the present (xˆ_0) and past times (xˆ_1,,xˆ_n), and z_:=[x_0,x_1,,x_n]T is a vector containing the independently simulated items of the stochastic process at the present and past, whilst cτ:=cov[x_τ,z_] and C:=covz_,z_ are a vector and a matrix of covariances, respectively, with items determined from the chosen stochastic model (again, not estimated from the data).

6.2 Time asymmetry

Time irreversibility can very easily be handled within the AMA framework. Assuming that the white noise v_τ (in discrete-time τ) has variance 1 and coefficient of skewness CSv, we will have:

(41) varx_τ=j=JJaj2,μ3x_τ=j=JJaj3CSv,CS:=μ3τvarx_τ3/2=j=JJaj3j=JJaj23/2CSv(41)

where μ3[x_τ] and CS are the third moment and the coefficient of skewness of process x_τ, respectively.

Time asymmetry is quantified through the skewness of the differenced process x˜_τ, according to Eq. (4), which by virtue of (24) is written as:

(42) x˜_τ:=x_τx_τ1=j=JJ(ajaj1)v_τj(42)

with aJ1=0. Thus, its skewness will be:

(43) C˜S:=μ3x˜_τvarx˜_τ3/2=j=JJajaj13j=JJajaj123/2CSv(43)

The ratio:

(44) C˜SCS=j=JJajaj13j=JJajaj123/2j=JJaj23/2j=JJaj3(44)

is independent of CSv and primarily depends on θω, which determines the sequence of aη. The case θω=0, i.e. the SMA, results in complete time symmetry. However, a constant θ00 (appropriately chosen) can make the ratio C˜S/CS as high as we wish, thus enabling the preservation of time asymmetry.

The above results make it clear that without skewness in the original process x_τ (e.g. in the case of Gaussian processes), there cannot be time asymmetry, a result that is consistent with Weiss’s (Citation1975).

7 Application

Here we repeat the application in Koutsoyiannis (Citation2019b) for the streamflow time series but using the new analytical framework. The data are from the database of the US Geological SurveyFootnote6 for the site USGS 01603000 North Branch Potomac River Near Cumberland, Maryland (39°37ʹ18.5”N, 78°46ʹ24.3”W, catchment area: 2271 km2). They cover the period 2013-10-01 to 2018-08-31 for a time step of 15 min, which was aggregated to hourly, while discharge was converted from cubic feet per second to m3/s. Missing values (4% for a total of 172 416 values) were left unfilled. Seasonality is not negligible and was treated by standardizing the values of each month by an appropriate coefficient (see Koutsoyiannis Citation2019b). We note here that this standardization is just a rough modelling approximation, adequate for our aim to investigate time irreversibility, while an approach for operational use would require the construction of a cyclostationary model.

The process x_τ is modelled as an FHK-C process; the fitting is impressively good and the parameters are M = 0.56 (indicating a slightly smooth process), H = 0.6 (a persistent process), α = 160 h and λ = γ(0) = 1.01 (for the standardized process). The fitting is shown in in terms of the climacogram, which is good for visualizing the behaviour at large time scales, and the climacospectrum, which provides a better visualization for small time scales.

Figure 3. Comparison with the FHK-C model of the climacogram and climacospectrum of the original series, after standardization for dealing with seasonality (top); and of the generated series (bottom).

Figure 3. Comparison with the FHK-C model of the climacogram and climacospectrum of the original series, after standardization for dealing with seasonality (top); and of the generated series (bottom).

Additional parameters, quantifying the state and time asymmetry, are the skewness coefficients of the original and differenced process. The analysis in Koutsoyiannis (Citation2019b) suggested prominent time irreversibility at time scales from 1 h to about 4 d. This is quantified by a skewness coefficient of the differenced process, which is much higher than the skewness of the original process x_τ (10.99 for x˜_τ and 2.98 for x_τ at the hourly scale, a ratio C˜S/CS=3.69).

For the application, we use the proposed method with constant θ0 and with J = 1024. We note that the SMA case results in zero skewness of the differenced process (time symmetry) while in the ANTAMA case the skewness tends to infinity, which indicates that the method does not have any upper limit of time irreversibility that it can handle. Choosing θ0=0.0638, we make the ratio of the skewness of x˜_τ and x_τ equal to 3.69, as required. The coefficients aη determined by direct application of the method are shown in , along with the climacogram and the autocorrelation function of the original process.

Figure 4. Climacogram, autocorrelation and coefficients aη determined by the proposed method.

Figure 4. Climacogram, autocorrelation and coefficients aη determined by the proposed method.

Figure 5. Generated discharge time series (top) (time references are arbitrary) and close-up for a six-month period containing the highest generated discharges (bottom). Plots refer to the “naturalized” (back-transformed) series.

Figure 5. Generated discharge time series (top) (time references are arbitrary) and close-up for a six-month period containing the highest generated discharges (bottom). Plots refer to the “naturalized” (back-transformed) series.

Figure 6. Climacograms and skewness coefficients of the original and differenced time series for: the real-world data (top) and the generated data (bottom). Plots refer to the standardized series.

Figure 6. Climacograms and skewness coefficients of the original and differenced time series for: the real-world data (top) and the generated data (bottom). Plots refer to the standardized series.

For the generation of vi we use lognormal white noise with skewness determined so that the skewness of the original process be 2.98 as required. A time series with length equal to that of the real-world data was derived for the process x_τ using Eq. (24) and was then “naturalized” by applying the inverse seasonal transformation. Plots for the generated time series are shown in , where in the close-up (lower panel) the irreversibility is evident, with steeper rising branches and milder falling ones. Comparisons of the generated time series with the model, as well as with the real-world data, in terms of climacogram and climacospectrum, are shown in . Comparisons of the characteristics related to state and time asymmetry are shown in . It is observed that the generated time series is consistent with the model and the original data in terms of all important statistics, marginal and joint, as well as related to time irreversibility. One slight discrepancy is that the curve related to time asymmetry in , while it captures the skewness of the real-world differenced data for the hourly scale, has a steeper slope for intermediate time scales; however, controlling this slope was not in the scope of this study. All in all, the method seems successful in simulation of streamflow time series even at fine time scales.

8 Conclusions

The main contribution of this article is the introduction of an equation that determines analytically the weights aη of an asymmetric moving average (AMA) scheme, which can be used in any problem of stochastic simulation of time-irreversible processes. This equation is rewritten here as:

(45) aη=1/21/22e2πiθωηωsdωdω(45)

where sdω is the power spectrum of the discrete-time representation of the process of interest and θω is any arbitrary odd real function. The same equation can also be used for time-reversible processes, even though in this case it suffices to use the simpler symmetric (SMA) scheme, which can be obtained as a special case of the AMA for θω=0. In this case, the equation simplifies to:

(46) aηS=01/22cos2πηωsdωdω(46)

A simple expression can also be obtained for any case where θω is a constant θ0 multiplied by the sign of ω. Assuming that an adequately large number of weights is used (so that the quantity δ0 be negligible) this expression becomes:

(47) aη=aηScos2πθ0+aηAsin2πθ0(47)

where

(48) aηA=1201/22sin2πηωsdωdω(48)

is the antisymmetric solution (ANTAMA) of the equation.

The numerical evaluation of these weights is very easy and fast, as it only needs application of FFT, while the generation of the processes of interest is even easier; both are illustrated in a spreadsheet provided as supplementary information. The applicability of the method has no restrictions and no iterations are needed. Once the process dependence is known through any second-order tool (climacogram, autocovariance, power spectrum, etc.) and the properties related to time reversibility or irreversibility are specified, the method can be readily applied for stochastic generation.

The extended historical review presented in this article reveals that the proposed method is against the current of simulation approaches that were developed since the 1950s. Specifically, the approach proposed (a) uses a purely moving-average scheme, without any autoregressive term, (b) involves innovation terms and weights that extend to both directions of time, and (c) does not link the generation scheme with the observations – rather, the observations are used in the model fitting phase, which is independent of the simulation. Because of the moving-average scheme we are able to easily handle and preserve moments higher than second. Because of the involvement of terms in both directions of time in an asymmetric mode, we can easily reproduce time irreversibility in a controlled manner. And because of the separation of model fitting from the calculation of the weights of the linear model, we are able to handle any type of dependence in time for whatever range (short or long) and fractality (smooth or rough). For the same reason, we are able to make the model as parsimonious as we wish (and as the data allow) without being trapped to involve many parameters in order to preserve, for example, autocovariances of large lags.

The entire framework is so simple and powerful that one can hardly believe that it is presented for the first time. Perhaps it is a reinvention of something already known. Whether it has been invented or not in the past, the fact is that it is not well known (and certainly not known to the author); otherwise, the ARMA models and their variants would not be so popular. Hopefully, this article could contribute to give ARMA-type models a position in the history of stochastics.

Supplemental material

Supplemental Material

Download (3.8 MB)

Acknowledgements

I thank the eponymous reviewers William Farmer and Alberto Montanari for their extraordinarily constructive comments which encouraged me to broaden, completely restructure and, eventually, substantially improve the initial manuscript. During the preparation of the second version of the manuscript, I had the pleasure to present its content in a seminar in the Sapienza University of Rome as well as in the First Workshop on Control Methods for Water Resource Systems in Delft. I wish to express my gratitude to Francesco Napolitano for inviting and organizing my visit in Rome during September 2019, and Alla Kolechkina and Ronald van Nooyen for the invitation in the Workshop in Delft.

Disclosure statement

No potential conflict of interest was reported by the author.

Supplementary material

Supplemental data for this article can be accessed here.

Additional information

Funding

No funding was provided for this research.

Notes

1 Boltzmann coined the terms ergode and isodic, both of which are etymologized from Greek words but which ones exactly is uncertain. Most probably, ergodic comes from the Greek ἔργον (ergon = work) and ὁδός (hodos = pathway). According to another interpretation, the second noun is είδος (eidos = form, kind, nature), or the whole word is a transliteration of the Greek adjective ἐργώδης (ergodes = laborious, troublesome; see Mathieu Citation1988).

2 The Greek adjective “stochastic[os/e]” was used by Greek philosophers, including Plato and Aristotle, and was transplanted to the international scientific vocabulary by Jacob Bernoulli (evidently aware of the Greek literature and in particular of Plato’s texts) in his famous book Ars Conjectandi (written in Latin in 1684–89 but published after his death, in 1713). The term was revived by Bortkiewicz (Citation1917; Russian economist and statistician of Polish ancestry) and also by Slutsky (Citation1925, Citation1928a, Citation1928b, Citation1929; Ukrainian/Russian/Soviet mathematical statistician and economist). It appears that the prevalence in USSR of the more sophisticated term “stochastic” (over the rather equivalent term “random”) must have been related to political and ideological reasons (incongruence of randomness with the dialectic materialism: models beyond strict deterministic were considered with a priori suspicion; see Mazliak Citation2018).

3 LeClerc’s Monte Carlo method to calculate π became popular among scientists and his experiment was later repeated by many. There are many other Monte Carlo algorithms to estimate π. However, these are good only for fun, as much faster and much more accurate deterministic algorithms exist to calculate π. Reitwiesner (Citation1950) calculated by a deterministic algorithm, running on the ENIAC computer, the first 2035 decimal digits of π. Metropolis et al. (Citation1950) examined their randomness, an exercise made thereafter many times showing that the digits of π have no apparent pattern and pass tests for statistical randomness. Dodge (Citation1996) promoted an idea opposite to LeClerc’s: that the digits of π form a “Natural Random Number Generator”. Since January 2019, 31.4 trillion digits of π are known (found by the Chudnovsky algorithm); this information, equivalent to ~100 million books of 1000 pages each (note for comparison that the British Library has 25 million books), can serve as a basis for any simulation experiment. However, simple random generators are more economic and convenient.

4 Notably, Ulam conceived the method while playing solitaires during convalescing from an illness, in his attempt to estimate the probabilities of success of the plays. As Ulam describes the story in some remarks later published by Eckhardt (Citation1989), “After spending a lot of time to estimate them by pure combinatorial calculations, I wondered whether a more practical method than ‘abstract thinking’ might not be to lay it out say one hundred times and simply observe and count the number of successful plays.”

5 The term climacogram, from the Greek κλιμακόγραμμα, deriving from κλίμαξ (climax = scale, as well as ladder; pl. κλίμακες) and γράμμα (gramma = written, drawn), was coined by Koutsoyiannis (Citation2010) and could be translated in English as scale(o)gram, but the latter term is used for another concept. Climacogram should not be confused to climatogram which has another meaning related to the climatic regime of temperature and precipitation at a site or area. The latter term, deriving from the noun κλίμα (clima, originally meaning slope; pl. κλίματα), was first used in the Hellenistic period by the astronomer Hipparchus to describe climate (in relationship to the slope of the sun’s rays) and is different from the other derivative noun κλίμαξ. Interestingly though, both κλίμαξ and κλίμα are eventually etymologized from the same verb κλίνειν (klinein = to slope).

6 https://nwis.waterdata.usgs.gov/md/nwis/uv/?site_no = 01603000&agency_cd = USGS, retrieved: 2018-09-16.

References

  • Bailey, A.L., 1929. A summary of advanced statistical methods. US Statistical Bureau. https://babel.hathitrust.org/cgi/pt?id=mdp.39015067991326&view=1up&seq=7.
  • Barnes, F.B., 1954. Storage required for a city water supply. The Institution of Engineers, Australia, 26, 198–203.
  • Barnett, V., 2006. Chancing an interpretation: Slutsky’s random cycles revisited. The European Journal of the History of Economic Thought, 13 (3), 411–432. doi:10.1080/09672560600875596
  • Beard, L.R., 1965. Use of interrelated records to simulate streamflow. Proceedings ASCE, Journal of the Hydraulics Division, 91 (HY5), 13–22.
  • Birkhoff, G.D., 1931. Proof of the ergodic theorem. Proceedings of the National Academy of Sciences, 17, 656–660. doi:10.1073/pnas.17.2.656
  • Boltzmann, L., 1877. Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respektive den Sätzen über das Wärmegleichgewicht. Wiener Ber, 76. 373–435. (in German).
  • Boltzmann, L., 1884/85. Über die Eigenschaften monocyklischer und anderer damit verwandter Systeme. Crelles Journal, 98, 68–94.
  • Bortkiewicz, L., 1917. Die iterationen — Ein beitrag zur wahrscheinlichkeitstheorie. Berlin: Springer.
  • Box, G.E. and Jenkins, G.M., 1970. Time series models for forecasting and control. San Francisco, USA: Holden Day.
  • Bracewell, R.N., 2000. The fourier transform and its applications. 3rd Ed. Boston, MA: McGraw-Hill.
  • Chow, V.T., 1969. Stochastic analysis of hydrologic systems. Final Report. Urbana, IL: University of Illinois. https://www.ideals.illinois.edu/bitstream/handle/2142/90345/Chow_1969.pdf?sequence=2.
  • Chow, V.T. and Kareliotis, S.J., 1970. Analysis of stochastic hydrologic systems. Water Resources Research, 6 (6), 1569–1582. doi:10.1029/WR006i006p01569
  • Dimitriadis, P. and Koutsoyiannis, D., 2015. Climacogram versus autocovariance and power spectrum in stochastic modelling for Markovian and Hurst–Kolmogorov processes. Stochastic Environmental Research & Risk Assessment, 29 (6), 1649–1669. doi:10.1007/s00477-015-1023-7
  • Dimitriadis, P. and Koutsoyiannis, D., 2018. Stochastic synthesis approximating any process dependence and distribution. Stochastic Environmental Research & Risk Assessment, 32 (6), 1493–1515. doi:10.1007/s00477-018-1540-2
  • Dodge, Y., 1996. A natural random number generator. International Statistical Review, 64 (3), 329–344. doi:10.2307/1403789
  • Eckhardt, R., 1989. Stan Ulam, John von Neumann and the Monte Carlo method. In: N.G. Cooper, ed. From cardinals to chaos. NY: Cambridge University, 131–136.
  • Galton, F., 1890. Dice for statistical experiments. Nature, 42, 13–14. doi:10.1038/042013a0
  • Gauch, H.G., Jr., 2003. Scientific method in practice. Cambridge, UK: Cambridge University Press.
  • Hamed, K.H., 2008. Trend detection in hydrologic data: the Mann–Kendall trend test under the scaling hypothesis. Journal of Hydrology, 349 (3–4), 350–363. doi:10.1016/j.jhydrol.2007.11.009
  • Hazen, A., 1914. Storage to be provided in impounding reservoirs for municipal water supply. Transactions of the American Society of Civil Engineers, 77, 1539–1640.
  • Hosking, J.R.M., 1981. Fractional differencing. Biometrika, 68 (1), 165–176. doi:10.1093/biomet/68.1.165
  • Hurst, H.E., 1951. Long term storage capacities of reservoirs. Transactions of the American Society of Civil Engineers, 116, 776–808.
  • Kendall, M.G., 1938. A new measure of rank correlation. Biometrika, 30 (1/2), 81–93. doi:10.1093/biomet/30.1-2.81
  • Khintchine, A., 1933. Zu Birkhoffs Lösung des Ergodenproblems. Mathematische Annalen, 107, 485–488. doi:10.1007/BF01448905
  • Khintchine, A., 1934. Korrelationstheorie der stationären stochastischen Prozesse. Mathematische Annalen, 109 (1), 604–615. doi:10.1007/BF01449156
  • Klemes, V., 1987. One hundred years of applied storage reservoir theory. Water Resources Management, 1 (3), 159–175. doi:10.1007/BF00429941
  • Kolmogorov, A.N., 1931. Uber die analytischen Methoden in der Wahrscheinlichkcitsrechnung. Mathematische Annalen, 104, 415–458. (English edition: Kolmogorov, A.N., 1992. On analytical methods in probability theory. Selected Works of A. N. Kolmogorov - Volume 2, Probability Theory and Mathematical Statistics, A. N. Shiryayev, ed., Kluwer, Dordrecht, The Netherlands, pp. 62-108). doi:10.1007/BF01457949
  • Kolmogorov, A.N., 1933. Grundbegriffe der Wahrscheinlichkeitsrechnung (Ergebnisse der Mathematik und Ihrer Grenzgebiete). Berlin: Springer. (2nd English Edition: Foundations of the Theory of Probability, 84 pp. Chelsea Publishing Company, New York, 1956).
  • Kolmogorov, A.N., 1937. Ein vereinfachter Beweis der Birkhoff-Khintchineschen Ergodensatzes. Matematicheskii Sbornik, 2, 367–368.
  • Kolmogorov, A.N., 1938. A simplified proof of the Birkhoff-Khinchin ergodic theorem. Uspekhi Matematicheskikh Nauk, 5. 52–56. (English edition: Kolmogorov, A.N., 1991, Selected Works of A. N. Kolmogorov - Volume 1, Mathematics and Mechanics, Tikhomirov, V. M. ed., Kluwer, Dordrecht, The Netherlands, pp. 271-276).
  • Kolmogorov, A.N., 1940. Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum. Doklady Akademii Nauk SSSR, 26. 115–118. (English edition: Kolmogorov, A.N., 1991. Wiener spirals and some other interesting curves in a Hilbert space. Selected Works of A. N. Kolmogorov - Volume 1, Mathematics and Mechanics, Tikhomirov, V. M. ed., Kluwer, Dordrecht, The Netherlands, pp. 303-307).
  • Kolmogorov, A.N., 1947. Statistical theory of oscillations with continuous spectrum. In: Collected papers on the 30th anniversary of the Great October Socialist Revolution. Vol. 1. Moscow-Leningrad: Akad. Nauk SSSR, 242–252. (English edition: Kolmogorov, A.N., 1992. Selected Works of A. N. Kolmogorov - Volume 2, Probability Theory and Mathematical Statistics, A. N. Shiryayev ed., Kluwer, Dordrecht, The Netherlands, pp. 321-330).
  • Koutsoyiannis, D., 2000. A generalized mathematical framework for stochastic simulation and forecast of hydrologic time series. Water Resources Research, 36 (6), 1519–1533. doi:10.1029/2000WR900044
  • Koutsoyiannis, D., 2010. A random walk on water. Hydrology and Earth System Sciences, 14, 585–601. doi:10.5194/hess-14-585-2010
  • Koutsoyiannis, D., 2016. Generic and parsimonious stochastic modelling for hydrology and beyond. Hydrological Sciences Journal, 61 (2), 225–244. doi:10.1080/02626667.2015.1016950
  • Koutsoyiannis, D., 2017. Entropy production in stochastics. Entropy, 19 (11), 581. doi:10.3390/e19110581
  • Koutsoyiannis, D., 2019a. Knowable moments for high-order stochastic characterization and modelling of hydrological processes. Hydrological Sciences Journal, 64 (1), 19–33. doi:10.1080/02626667.2018.1556794
  • Koutsoyiannis, D., 2019b. Time’s arrow in stochastic characterization and simulation of atmospheric and hydrological processes. Hydrological Sciences Journal, 64 (9), 1013–1037. doi:10.1080/02626667.2019.1600700
  • Koutsoyiannis, D. and Montanari, A., 2015. Negligent killing of scientific concepts: the stationarity case. Hydrological Sciences Journal, 60 (7–8), 1174–1183. doi:10.1080/02626667.2014.959959
  • Kyun, K. and Kim, K., 2006. Equilibrium business cycle theory in historical perspective. International Statistical Review.
  • Lawrance, A.J., 1991. Directionality and reversibility in time series. International Statistical Review, 59 (1), 67–79. doi:10.2307/1403575
  • Lombardo, F., et al., 2014. Just two moments! A cautionary note against use of high-order moments in multifractal models in hydrology. Hydrology and Earth System Sciences, 18, 243–255. doi:10.5194/hess-18-243-2014
  • Mackey, M.C., 2003. Time’s arrow: the origins of thermodynamic behavior. Mineola, NY: Dover, 175.
  • Mahalanobis, P.C., 1934. Tables of random samples from a normal population. Sankya, 1, 289–328.
  • Mandelbrot, B.B. and Wallis, J.R., 1969. Computer experiments with fractional gaussian noises: part 1, averages and variances. Water Resources Research, 5, 228–241. doi:10.1029/WR005i001p00228
  • Mann, H.B., 1945. Nonparametric tests against trend. Econometrica, 13 (3), 245–259. doi:10.2307/1907187
  • Matalas, N.C., 1967. Mathematical assessment of synthetic hydrology. Water Resources Research, 3 (4), 937–945. doi:10.1029/WR003i004p00937
  • Mathieu, M., 1988. On the origin of the notion ‘Ergodic Theory’. Expositiones Mathematicae, 6, 373–377.
  • Mazliak, L., 2018. The beginnings of the Soviet encyclopedia. The utopia and misery of mathematics in the political turmoil of the 1920s. Centaurus, 60 (1–2), 25–51. doi:10.1111/cnt.2018.60.issue-1-2
  • Metropolis, N., 1989. The beginning of the Monte Carlo method. In: N.G. Cooper, ed.. From cardinals to chaos. NY: Cambridge University, 125–130.
  • Metropolis, N., Reitwiesner, G., and von Neumann, J., 1950. Statistical treatment of values of first 2,000 decimal digits of e and of π calculated on the ENIAC. Mathematical Tables and Other Aids to Computation, 4, 109–112. doi:10.2307/2002776
  • Metropolis, N. and Ulam, S.M., 1949. The Monte Carlo method. Journal of the American Statistical Association, 44, 335–341. doi:10.1080/01621459.1949.10483310
  • Montanari, A. and Koutsoyiannis, D., 2014. Modeling and mitigating natural hazards: stationarity is immortal! Water Resources Research, 50 (12), 9748–9756. doi:10.1002/2014WR016092
  • Niederreiter, Η., 1992. Random number generation and quasi-Monte Carlo methods. Philadelphia: Society for Industrial and Applied Mathematics.
  • O’Connell, P.E., et al., 2016. The scientific legacy of Harold Edwin Hurst (1880 – 1978). Hydrological Sciences Journal, 61 (9), 1571–1590. doi:10.1080/02626667.2015.1125998
  • Papoulis, A., 1991. Probability, Random Variables and Stochastic Processes. 3rd Ed. New York: McGraw-Hill.
  • Persons, W.M. 1919. Measuring and forecasting general business conditions. Harvard University & American Institute of Finance. https://babel.hathitrust.org/cgi/pt?id=hvd.32044018749697
  • Reitwiesner, G., 1950. An ENIAC Determination of π and e to more than 2000 Decimal Places. Mathematical Tables and Other Aids to Computation, 4, 11–15. doi:10.2307/2002695
  • Slutsky, E., 1927. Slozhenie sluchainykh prichin, kak istochnik tsiklicheskikh protsessov. Voprosy Kon”yunktury, 3. 34–64. [English edition: Slutsky, E., 1937. The summation of random causes as the source of cyclic processes. Econometrica: Journal of the Econometric Society, 105-146].
  • Slutsky, E., 1928a. Sur ùn critérium de la convergence stochastique des ensembles des valeurs éventuelles). Comptes Rendus De l’Académie Des Sciences, 187, 370.
  • Slutsky, E., 1925. Über stochastische Asymptoten und Grenzwerte. Metron, 5 (3), 3–89.
  • Slutsky, E., 1928b. Sur les fonctions éventuelles continues, intégrables et dérivables dans le sens stochastiques. Comptes Rendus Des Séances De l’Académie Des Sciences, 187, 878.
  • Slutsky, E., 1929. Quelques propositions sur les limites stochastiques éventuelles. Comptes Rendus Des Séances De l’Académie Des Sciences, 189, 384.
  • Stigler, S.M., 2002. Statistics on the table: the history of statistical concepts and methods. Cambridge, MA: Harvard University Press.
  • Sudler, C., 1927. Storage required for the regulation of streamflow. Transactions of the American Society of Civil Engineers, 91, 622–660.
  • Thomas, H.A. and Fiering, M.B., 1962. Mathematical synthesis of streamflow sequences for the analysis of river basins by simulation. In: A. Maass, et al., eds.. Design of water resource systems. Cambridge, MA: Harvard University Press, 459–493.
  • Tippett, L.H.C., 1927. Random sampling numbers. London: Cambridge University Press, Tracts for Computers No. XV.
  • Tong, H. and Zhang, Z., 2005. On time-reversibility of multivariate linear processes. Statistica Sinica, 15 (2), 495–504.
  • Walker, G., 1931. On periodicity in series of related terms. Proceedings of the Royal Society of London, Series A, 131, 518–532. doi:10.1098/rspa.1931.0069
  • Weiss, G., 1975. Time-reversibility of linear stochastic processes. Journal of Applied Probability, 12 (4), 831–836. doi:10.2307/3212735
  • Whittle, P., 1951. Hypothesis testing in times series analysis. PhD thesis, Almqvist & Wiksells, Uppsala.
  • Whittle, P., 1952. Tests of fit in time series. Biometrika, 39 (3/4), 309–318. doi:10.1093/biomet/39.3-4.309
  • Whittle, P., 1953. The analysis of multiple stationary time series. Journal of the Royal Statistical Society B, 15 (1), 125–139.
  • Wold, H.O., 1938. A study in the analysis of stationary time-series. PhD thesis, Almquist and Wicksell, Uppsala.
  • Wold, H.O., 1948. On prediction in stationary time series. The Annals of Mathematical Statistics, 19 (4), 558–567. doi:10.1214/aoms/1177730151
  • Yule, G.U., 1927. On a method of investigating periodicities in disturbed series, with special reference to Wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society of London A, 226, 267–298. doi:10.1098/rsta.1927.0007

Appendix A

Proof of the basic properties of the AMA solution

To show point (a) below Eq. (27), we observe that the sequence of aη is the finite Fourier transform of:

(A1) Aω:=e2πiθωARω=cos2πθω+isin2πθωARω(A1)

We recall that the power spectrum sω is an even function, and so are sdω and ARω. Besides cos2πθω is also even, while sin2πθω is odd, given that θω is odd. Consequently, Aω has even real part and odd imaginary part. Hence, its Fourier transform is real (Bracewell Citation2000, p. 13), which proves that aη is real. It can also be written as a real expression:

(A2) aη=201/2cos2πθωηωARωdω(A2)

To show point (b), we observe that Aω is the inverse finite Fourier transform of aη, i.e.

Aω=2l=ale2πilω

Besides, the inverse finite Fourier transform of the quantity cη:=l=alaη+l is

(A3) Bω=2η =cηe2πiηω=2η =l =alaη+le2πiηω=2l =alη =aη+le2πiηω=2l =alj =aje2πijlω=2l =ale2πilωj =aje2πijω=AωAω2(A3)

where Aω is the conjugate of Aω, equal to e2πiθωARω. Consequently,

(A4) Bω=AωAω2=e2πiθωARωe2πiθωARω2=ARω22=sdω(A4)

where we have utilized the definition of ARω in (27). Hence, the Fourier transform of cη equals the power spectrum, which means that cη is identical to the autocovariance function cη.

Now, coming to point (c), the calculation of the sequence of coefficients aη, and hence the generation of the required time series, we first note that, unlike in the above theoretical derivations, summation is in practice made for a finite number of terms, i.e. for η=J,,0,,J, where J if finite. The coefficients aη will decrease with increasing η and will be negligible beyond an appropriately chosen, adequately large, J. Possible truncation errors can be handled as described by Koutsoyiannis (Citation2016 – even though that solution was for the SMA scheme). For the numerical part, since we know the frequency functions ARω (from the power spectrum) and θω (appropriately or even arbitrarily or randomly chosen), for an array of frequencies ωj, we are able to write Eqs. (30)–(31) and complete the algorithm described in steps (i)-(iv) in Sec. 5.

Appendix B

The OBAMA scheme and other cases requiring optimization

If the process exhibits LRD and for some reason an OBAMA scheme is desired, one option would be to generalize Eq. (38) to the form:

(B1) θω=θ0+12πcos11C1sdω+sdωC2C0,ω>0(B1)

where θ0,C0,C1,C2,are coefficients to be optimized numerically, so that the deviation from zero of all η<0 (quantified e.g. by the mean square error) be minimal. Continuity at 0 of θω is not necessary, but if required it can be achieved setting θ0=0.

A second option, which was found quite satisfactory in numerical explorations, is to define θω as the smooth minimum of two rational functions of frequency, θiω,i=1,2, with first degree polynomials, i.e.:

(B2) θω=1ζlneζθ1ω+eζθ21/2ω, θiω:=C1,iωC2,i+ω+C0,i(B2)

where Cji;i=1,2;j=0,1,2 are coefficients with values to be optimized and ζ a big negative number, say, ζ=10. Optimizing each θωj, without using parametric relationships, is another option but in the LRD case the frequencies of interest ωj will be numerous and, therefore, an expression such as (B1) or (B2), whose coefficients are optimized, may be an advantage.

We can use the same expressions for other tasks. For example, if for some reason we need an ordinary-backward scheme of ARMA type, again the framework provided can directly convert itself into an ARMA scheme. Thus, to convert AMA to an ARMA1,q model, for which aη=0 for η<0 and aη=βaη1=βηqaq for η>q and for some appropriate β, we can form the objective function (representing the cumulative departure of the achieved sequence of coefficients from those of an ARMA1,q model):

(B3) ea0,  ,aq; β:=η=J0aη2+η=qJaηβηqaq2(B3)

minimize it finding the vector of coefficients a0,  ,aq; β, and reformulate the generating Eq. (24) as:

(B4) x_τ=βx_τ1+η=0qaηv_τη(B4)

so that a limited number of previous noise variables v_τη and the latest process variable x_τ1 are only needed. This case is also implemented in the spreadsheet provided as supplementary information. The same spreadsheet file contains also other cases, such as the conversion into an AR(q) model or to the sum of three AR(1) models, where the algorithms used are hopefully self-explanatory.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.