Abstract
In this paper we discuss optimality properties of identification algorithms in a set membership framework. We deal with restricted-complexity (conditional) identification, where approximations (models) to a possibly complex system are selected from a low-dimensional space. We consider the worst-case and average-case settings. In the worst-case setting, we present results on the optimality, or suboptimality, of algorithms based on computing the unconditional or conditional Chebyshev centres of an uncertainty set. In the average-case setting, we show that the optimal algorithm is given by the projection of the unconditional Chebyshev centre. We give explicit formulas for its average errors, which allow us to see the contribution of each problem parameter to the minimal error. We also discuss weighted average errors corresponding to non-uniform distributions over uncertainty sets, and show how the weights influence the minimal identification error.
1. Introduction
The idea behind the set membership approach is to use a priori knowledge about the problem being solved, together with the measured (or computed, or observed) information about unknown quantities, to define a feasible set of elements consistent with the available information. A number of results are devoted to the characterization of feasible sets in system identification, see [Citation1 – Citation13] and the references therein. In many cases, such as in restricted-complexity estimation, the desired estimate is required to be a member of a set of given simple structure. In this case, estimation is referred to as conditional [Citation6].
In this paper, we present results obtained within a set membership framework in two settings: the worst-case and the average-case setting. In the worst-case setting, the error of an algorithm is measured by the worst performance over problems consistent with given information, see e.g. [Citation1 – Citation6], [Citation9 – Citation11]. A conservative point of view is thus taken in the worst-case analysis: exceptionally difficult elements, even if they appear rarely, are taken into account. The optimal algorithm in this setting is defined by the conditional Chebyshev centre of the feasible set. The conditional Chebyshev centre is in general difficult to compute, even for sets of simple structure [Citation3]. For this reason, we turn our attention to suboptimal, but easy-to-compute estimators. To this end, we discuss, based on [Citation2], the reliability level of the projection algorithm.
Compared to the worst-case approach, the average-case setting is less conservative. The error of an algorithm is now defined on the average, so that exceptionally difficult but rarely occurring elements do not dominate [Citation7]. The optimal algorithm turns out to be the projection algorithm, whose suboptimality has been established in the worst-case setting. We provide formulas for its two local average errors, which explicitly exhibit the contribution of factors such as the information, the measurement errors, and the restrictions imposed on the models.
In addition to the uniform distribution over uncertainty sets considered in [Citation7], we discuss, based on [Citation8], weighted average errors for a class of weight functions. Results are given that explain the contribution of the weights to the minimal errors; the contribution enters through certain integral parameters that can be computed, or estimated, for a given selection of the weights. An example illustrates what the estimators and the errors look like for a particular identification problem.
2. Problem formulation
We wish to identify a system x belonging to the system space X = R^n. We look for a model (an approximation) to x, having at our disposal information about x given by a vector of measurements y belonging to the information space Y = R^m, with m ⩾ n. The spaces X and Y are equipped with the norms ||x||_X = ||x|| and ||y||_Y = ||Ay||, where A is a non-singular m × m matrix and || · || denotes the Euclidean norm in X or Y, with ||v||^2 = Σ_i v_i^2.
For example, the information vector may be obtained as follows. For the input sequence u = [u_0, u_1, …, u_{m−1}]^T with u_0 ≠ 0, the components of y can be given as output measurements related to u by
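One common instance of such information is impulse-response (FIR) estimation, in which y = Ux + e with U the lower-triangular Toeplitz matrix built from u; the condition u_0 ≠ 0 then guarantees that U has full column rank. The following sketch constructs such a matrix; the FIR form, the function name and the data are illustrative assumptions rather than the definitions used in the paper.

```python
import numpy as np

def toeplitz_information_matrix(u, n):
    """m x n convolution matrix U with (U x)_i = sum_{j <= i} u[i - j] * x[j]
    (an assumed FIR relation between the input u and the measurements y)."""
    m = len(u)
    U = np.zeros((m, n))
    for i in range(m):
        for j in range(min(i + 1, n)):
            U[i, j] = u[i - j]
    return U

u = np.array([1.0, 0.5, -0.3, 0.1])      # u[0] != 0, so U below has full column rank
U = toeplitz_information_matrix(u, n=2)
print(U)
# [[ 1.   0. ]
#  [ 0.5  1. ]
#  [-0.3  0.5]
#  [ 0.1 -0.3]]
```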
Based on y, we compute an approximation to x using an algorithm, by which we mean any mapping φ : Y → X; in conditional (restricted-complexity) identification the values of the algorithm are additionally required to lie in a prescribed set of models ℳ ⊂ X.
3. Worst-case setting
In this setting, motivated by the aim of constructing robust algorithms, the behaviour of an approximant is studied for the worst system compatible with the computed information. For a given measurement y, the error of an algorithm φ, the y-local worst-case error, is defined as the largest distance from φ(y) to the systems consistent with y, i.e., sup { ||x − φ(y)|| : x ∈ ℰy }, where ℰy is the uncertainty set given by Equation (6).
If the estimate selection is unconstrained, the best choice of z = φ(y) is given by the Chebyshev centre of the set ℰy. (This remains true for an arbitrary set ℰy, not necessarily given by Equation (6), and for any norm in X.) Indeed, the Chebyshev centre of ℰy is defined as an element c(ℰy) that minimizes the maximal distance sup { ||x − z|| : x ∈ ℰy } from the elements of ℰy over all z in X.
The algorithm defined by φ(y) = c(ℰy) is optimal, and its worst-case error is equal to the geometric radius of ℰy, i.e., sup { ||x − c(ℰy)|| : x ∈ ℰy }.
In many cases of interest, the Chebyshev centre is readily available. For instance, the uncertainty set ℰy in Equation (6) is an ellipsoid in R^n with the centre (Chebyshev centre) given by
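A minimal sketch, continuing the one above and assuming that Equation (6) has the standard set membership form ℰy = {x ∈ R^n : ||A(y − Ux)|| ⩽ ε} (an assumption made only for this illustration, consistent with the norms, the information matrix U and the accuracy ε used in the paper): under this assumption the set is an ellipsoid whose centre is the weighted least-squares estimate, computed below with arbitrary data.

```python
def chebyshev_centre(U, A, y):
    """Centre of the assumed ellipsoid {x : ||A(y - U x)|| <= eps}: the weighted
    least-squares estimate argmin_x ||A(y - U x)||.  The bound eps only scales
    the ellipsoid and does not move its centre."""
    c, *_ = np.linalg.lstsq(A @ U, A @ y, rcond=None)
    return c

rng = np.random.default_rng(0)
A = np.eye(len(u))                        # plain Euclidean norm in Y for this illustration
x_true = np.array([0.8, -0.4])
eps = 0.5
e = rng.standard_normal(len(u))
e *= 0.3 * eps / np.linalg.norm(e)        # measurement error kept well inside the assumed bound
y = U @ x_true + e
c = chebyshev_centre(U, A, y)             # unconditional Chebyshev centre of E_y
```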
Computational aspects of the optimality problem are more complicated for conditional identification, even for simple uncertainty sets. The optimal algorithm, denoted by φopt, is now defined by the conditional Chebyshev centre, a minimizer of sup { ||x − z|| : x ∈ ℰy } selected among admissible models z in the set ℳ.
A simple idea to overcome the difficulties with computing the conditional Chebyshev centre is to take the optimal unconditional estimate given by the unconditional Chebyshev centre c(ℰy) (which is easy to compute for many sets ℰy), and to find its projection onto the manifold ℳ. The algorithm obtained in this way is denoted by φproj. It is quite obvious that this algorithm is in general not optimal, and the question is: how much do we lose with respect to the optimal one? Is the projection algorithm suboptimal within a reasonable margin? The answer to these questions is given in the following result [Citation2], which holds for much more general uncertainty sets than ellipsoids. We only assume that:
In [Citation2], the following theorem has been shown.
Theorem 1 If Equation (10) and Equation (11) hold, then
The theorem ensures that the projection algorithm, although not optimal, is suboptimal within a reasonable constant of
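A small numerical comparison, continuing the sketch above, makes the suboptimality statement concrete: it approximates the worst-case error of an estimate by sampling the boundary of the assumed ellipsoid ℰy, computes the projection estimate φproj(y) onto a one-dimensional manifold ℳ, and locates the conditional Chebyshev centre by brute force along ℳ. The line ℳ and all data are illustrative stand-ins, not the objects defined in Equations (10) and (11).

```python
# Shape of the assumed ellipsoid: ||A(y - U x)||^2 = ||A y - A U c||^2 + ||R (x - c)||^2,
# where R is the triangular factor of a QR factorisation of A U.
R = np.linalg.qr(A @ U, mode="r")
rho2 = eps ** 2 - np.linalg.norm(A @ y - A @ U @ c) ** 2    # squared radius left after the residual

# Dense sample of the boundary of E_y (n = 2, so the boundary is a mapped circle);
# sup over E_y of ||x - z|| is attained on the boundary because ||x - z|| is convex in x.
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
circle = np.sqrt(rho2) * np.vstack([np.cos(theta), np.sin(theta)])
boundary = c[:, None] + np.linalg.solve(R, circle)

def worst_case_error(z):
    return np.max(np.linalg.norm(boundary - z[:, None], axis=0))

d = np.array([1.0, 1.0]) / np.sqrt(2.0)    # illustrative manifold M: the line spanned by d
z_proj = (d @ c) * d                       # projection algorithm phi_proj(y)

ts = np.linspace(-3.0, 3.0, 4001)          # brute-force search for the conditional Chebyshev centre on M
z_cond = ts[int(np.argmin([worst_case_error(t * d) for t in ts]))] * d

print("worst-case error of phi_proj:", worst_case_error(z_proj))
print("worst-case error of phi_opt :", worst_case_error(z_cond))   # their ratio stays within Theorem 1's bound
```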
4. Average-case setting
The worst-case analysis is necessary when robust approximations are of interest. If, however, robustness is not our main concern, the worst-case error criterion may be viewed as conservative. In [Citation7] a less conservative error measure has been analysed, which reflects the average performance of identification algorithms.
We remain in the set membership framework and assume, similarly to the worst-case setting, that systems consistent with a measurement y belong to the set ℰy given in Equation (6). The basic quantity is now the y-local average error of an algorithm, defined by averaging, rather than maximizing, the distance ||x − φ(y)|| over the systems x in ℰy (Equation (13)).
In addition to Equation (13), another error measure has also been considered in [Citation7]. It is sometimes desirable to assess the behaviour of an algorithm for a fixed system x (not for a fixed measurement y). Based on the errors
The following result provides the formula for the y-local error of an arbitrary algorithm φ, see [Citation7]. Let K_k = {t ∈ R^k : ||t|| ⩽ 1} be the unit ball in the standard Euclidean norm, and let
Theorem 2 For any ℳ ⊂ R^n and φ,
Hence, the optimal algorithm φopt is given by the projection of the unconditional Chebyshev centre c(ℰy) onto the set of admissible models ℳ; that is, the average-case optimal algorithm coincides with the projection algorithm φproj.
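The displayed definition of the y-local average error is not quoted above; the sketch below, continuing the running example, takes it to be the root-mean-square distance over ℰy under the uniform distribution (an assumption), samples ℰy uniformly, and confirms empirically that among models on ℳ the projection of the centre minimizes this average error.

```python
# Uniform samples from the assumed ellipsoid E_y: uniform points of the unit disk,
# scaled by sqrt(rho2) and mapped through R^{-1} (a fixed linear map preserves uniformity).
N = 50_000
g = rng.standard_normal((N, 2))
g /= np.linalg.norm(g, axis=1, keepdims=True)
r = np.sqrt(rng.random(N))                  # uniform radius in the unit disk
samples = c + np.linalg.solve(R, (np.sqrt(rho2) * r[:, None] * g).T).T

def avg_error(z):
    # assumed form of the y-local average error: root-mean-square distance over E_y
    return np.sqrt(np.mean(np.sum((samples - z) ** 2, axis=1)))

t_avg = ts[int(np.argmin([avg_error(t * d) for t in ts]))]
print("empirical average-case optimum on M:", t_avg * d)
print("projection of the centre           :", z_proj)     # the two agree up to sampling and grid error
```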
Theorem 3
For a linear manifold ℳ as in Equation (11), the x-local error of the optimal algorithm is given in the next result, see [Citation7]. Let M = [z_1, …, z_p] be the matrix with orthonormal column vectors z_1, …, z_p that span the space ℳ.
Theorem 4 Let ℳ be given by Equation (11). Then
This formula explicitly shows the influence on the minimal error of factors such as the number of measurements m, the information matrix U, the accuracy ε of measuring information, the matrix M representing the subspace ℳ, and the unmodelled-dynamics term dist(x, ℳ). For instance, we can now formulate, in algebraic language, criteria for the optimal selection of the information matrix or of the subspace ℳ so as to minimize the error of the optimal algorithm, see [Citation7].
5. Average-case setting with weights
The average errors considered in the previous section are defined under the assumption that the distribution of elements in the feasible sets ℰy and ℰx is uniform. This is not a bad assumption when no additional information concerning the distribution is available. In many cases, however, such knowledge is available, and other distributions of measurements and problem elements, such as, for instance, a Gaussian distribution, may better describe the problem. The question arises: to what extent do the results on optimal estimation given in the previous section rely on the assumption of a uniform distribution? The results given in this section, based on [Citation8], provide the answer for a wide class of distributions.
Let us generalize the notion of the average error and define the weighted average y-local error. Assume, for a fixed y, that a non-negative Lebesgue integrable function p_y : ℰy → R is given such that
To define the weighted x-local error, we assume that a non-negative Lebesgue integrable function q_x : ℰx → R is given such that
Similarly to p_y, we define the function q_x by
The choice of weight functions is not an easy task. Selecting the weights in fact provides additional a priori information about the problem and should reflect our knowledge of it; a proper weight selection allows relevant a priori information to be included in the problem formulation. For a discussion of the problem of measure selection in a general average-case setting, the reader is referred to [Citation12], p. 68.
We shall see that in many interesting cases the optimal algorithm is independent of the weights. The error of the optimal algorithm is weight-dependent, and the dependence will be explicitly shown in terms of certain parameters that are easy to compute for any selection of the weights.
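The weight-independence just stated can be checked numerically in the running example. The sketch below re-weights the uniform samples of ℰy with an isotropic, Gaussian-like weight centred at c(ℰy) (an arbitrary choice of the kind considered below; the weighted error is again assumed to be a root-mean-square distance) and finds that the error-minimizing model on ℳ is still the projection of the centre.

```python
beta = 3.0                                   # an arbitrary isotropic weight about the centre of E_y
V = (samples - c) @ R.T                      # rows are R (x - c) for the uniform samples above
wts = np.exp(-beta * np.sum(V ** 2, axis=1) / rho2)
wts /= wts.sum()

def weighted_avg_error(z):
    # assumed weighted y-local error: weighted root-mean-square distance over E_y
    return np.sqrt(np.sum(wts * np.sum((samples - z) ** 2, axis=1)))

t_w = ts[int(np.argmin([weighted_avg_error(t * d) for t in ts]))]
print("weighted-average optimum on M:", t_w * d)
print("projection of the centre     :", z_proj)   # unchanged: the optimal estimate is weight-independent
```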
We now restrict ourselves to isotropic weight functions; for a discussion of the more general case one is referred to [Citation8]. We assume that basic weight functions
Let α_n and
Theorem 5 For weight functions given by Equation (23), we have
The errors of the optimal algorithm depend on the weights through the parameters α_n and
The case of uniform distributions over ℰy and ℰx corresponds to
Note that for isotropic weight functions the error depends on the matrix
Following [Citation8], we illustrate the results given above for an exemplary problem.
Example Let A = I_m, where I_m is the m × m identity matrix, i.e., we consider the standard Euclidean norm in the space Y. Let information be given by k repeated measurements of all components of the system x. That is, the information matrix is U = (I_n, …, I_n)^T, with I_n appearing k times, and exact information is the m × 1 vector Ux = [x^T, …, x^T]^T, with x appearing k times. The total number of measurements is m = k · n.
Assume that we are able to measure the components of x in such a way that the result obtained, an m × 1 vector y = [(y_1)^T, …, (y_k)^T]^T with k vector components y_i, has mean squared error at most
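For this repeated-measurement information, a short standalone sketch (with an arbitrary noise level, since the accuracy bound above is not quoted, and under the same assumed form of the uncertainty set as in the earlier sketches) builds U and confirms that the unconditional Chebyshev centre, i.e. the least-squares estimate with A = I_m, is simply the componentwise average of the k measurement vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 5
m = k * n
U = np.vstack([np.eye(n)] * k)                  # information matrix: I_n stacked k times

x = rng.standard_normal(n)                      # system to be identified
y = U @ x + 0.05 * rng.standard_normal(m)       # k noisy copies of the components of x

centre, *_ = np.linalg.lstsq(U, y, rcond=None)  # unconditional Chebyshev centre (A = I_m)
average = y.reshape(k, n).mean(axis=0)          # componentwise average of the k measurements
print(np.allclose(centre, average))             # True: for this U the centre is the average
```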
For further illustration, we specify the example above for a simple problem with illustrative data. We wish to identify a system x depending on three parameters,
Let us finally note that formula (32) suggests an optimization problem concerning the experiment design. For a given positive δ, we wish to reduce the second term in Equation (32) to be of order δ^2 (neglecting the influence of h(n,k)). In order to achieve this, we can perform, for any k, k (vector) measurements with accuracy
6. Conclusions
We have considered the performance of conditional estimators in set membership identification in the worst-case and average-case settings. Optimal estimators in the worst-case setting are in general difficult to compute, and suboptimal estimators such as the projection estimator are often considered. We have shown that the error of the projection algorithm never exceeds
Acknowledgment
This research was partly supported by local AGH grant No. 10.420.03.
References
- Chernousko, F.L. 1994. State Estimation for Dynamic Systems. Boca Raton: CRC Press.
- Garulli, A., Kacewicz, B., Vicino, A. and Zappa, G. 1999. Reliability of Projection Algorithms in Conditional Estimation. Journal of Optimization Theory and Applications, 101: 1–14.
- Garulli, A., Vicino, A. and Zappa, G. 1997. Conditional Central Algorithms for Worst-Case Estimation and Filtering. In: Proc. 36th IEEE Conference on Decision and Control, San Diego.
- Helmicki, A.J., Jacobson, C.A. and Nett, C.N. 1991. Control Oriented System Identification: A Worst-Case/Deterministic Approach in H∞. IEEE Transactions on Automatic Control, 36: 1163–1176.
- Jacobson, C.A., Nett, C.N. and Partington, J.R. 1992. Worst Case System Identification in l1: Optimal Algorithms and Error Bounds. Systems and Control Letters, 19: 419–424.
- Kacewicz, B., Milanese, M. and Vicino, A. 1988. Conditionally Optimal Algorithms and Estimation of Reduced Order Models. Journal of Complexity, 4: 73–85.
- Kacewicz, B. 2000. Optimal Average Case Estimation in Hilbert Norms. Mathematics of Control, Signals and Systems, 13: 347–359.
- Kacewicz, B. 2003. Weighted Average Errors in Set-Membership Estimation. Mathematics of Control, Signals and Systems, 16: 238–253.
- Mäkilä, P. and Partington, J. 1999. On Robustness in System Identification. Automatica, 35: 907–916.
- Milanese, M. and Vicino, A. 1993. Information Based Complexity and Nonparametric Worst-Case System Identification. Journal of Complexity, 9: 427–445.
- Milanese, M., Norton, J., Piet-Lahanier, H. and Walter, E. (Eds). 1996. Bounding Approaches to System Identification. New York: Plenum Press.
- Novak, E. 1988. Deterministic and Stochastic Error Bounds in Numerical Analysis. Lecture Notes in Mathematics 1349. Berlin: Springer-Verlag.
- Walter, E. (Ed.). 1990. Parameter Identifications with Error Bound. Special issue of Mathematics and Computers in Simulation, 32 (5–6).