Optimal and suboptimal algorithms in set membership identification

Pages 159-169 | Published online: 16 Feb 2007

Abstract

We discuss in this paper optimality properties of identification algorithms in a set membership framework. We deal with restricted-complexity (conditional) identification, where approximations (models) to a possibly complex system are selected from a low dimensional space. We discuss the worst- and average-case settings. In the worst-case setting, we present results on optimality, or suboptimality, of algorithms based on computing the unconditional or conditional Chebyshev centres of an uncertainty set. In the average-case setting, we show that the optimal algorithm is given by the projection of the unconditional Chebyshev centre. We show explicit formulas for its average errors, allowing us to see the contribution of all problem parameters to the minimal error. We discuss the case of weighted average errors corresponding to non-uniform distributions over uncertainty sets, and show how the weights influence the minimal identification error.

1. Introduction

The idea behind the set membership approach is to use a priori knowledge about the problem being solved, together with the measured (or computed, or observed) information about unknown quantities, to define a feasible set of elements consistent with the available information. A number of results are devoted to the characterization of feasible sets in system identification, see [Citation1–Citation13] and the references therein. In many cases, such as in restricted-complexity estimation, the desired estimate is often required to be a member of a set of a given simple structure. In this case estimation is referred to as conditional [Citation6].

In this paper, we present results obtained within a set membership framework in two settings: the worst-case and average-case settings. In the worst-case setting, the error of an algorithm is measured by the worst performance for problems consistent with given information, see e.g., [Citation1–Citation6], [Citation9–Citation11]. A conservative point of view is thus chosen in the worst-case analysis: exceptionally difficult elements, even if they appear rarely, are taken into account. The optimal algorithm in this setting is defined by the conditional Chebyshev centre of the feasible set. The conditional Chebyshev centre is in general difficult to compute, even for sets of a simple structure [Citation3]. For this reason, we turn our attention to suboptimal, but easy-to-compute estimators. To this end, we discuss, based on [Citation2], the reliability level of the projection algorithm.

Compared to the worst-case approach, the average-case setting is less conservative. The error of an algorithm is now defined on the average, which excludes the dominance of outliers [Citation7]. The optimal algorithm turns out to be the projection algorithm, whose suboptimality has been established in the worst-case setting. We provide formulas for its two local average errors, which explicitly exhibit the contributions of factors such as the information, the measurement errors, and the restrictions imposed on the models.

In addition to the uniform distribution over uncertainty sets considered in [Citation7], we discuss, based on [Citation8], weighted average errors for a class of weight functions. Results are given that explain the contribution of the weights to the minimal errors: the weights enter through certain integral parameters that can be computed, or estimated, for a given selection of the weights. An example illustrates what the estimators and the errors look like for a particular identification problem.

2. Problem formulation

We wish to identify a system x belonging to the system space X = R^n. We look for a model (an approximation) to x, having at our disposal information about x given by a vector of measurements y which belongs to the information space Y = R^m, with m ⩾ n. The spaces X and Y are equipped with the norms ||x||_X = ||x|| and ||y||_Y = ||Ay||, where A is a non-singular m × m matrix and ||·|| denotes the Euclidean norm in X or Y, with

||c|| = ( Σ_{i=1}^k c_i² )^{1/2}

for any vector c ∈ R^k with components c_i. Given a full rank m × n matrix U (called the information matrix), the vector y is given as a perturbed value of Ux,

||Ux − y||_Y ⩽ ε,

where ε > 0 is the accuracy of computing (or measuring) information.

For example, the information vector may be obtained as follows. For the input sequence u = [u_0, u_1, …, u_{m−1}]^T with u_0 ≠ 0, the components of y can be given as output measurements related to u by

y_l = Σ_{k=1}^{l} u_{l−k} x_k + e_l,   l = 1, 2, …, m,

where e_l is an error corrupting the output (we set x_k = 0 for k > n). Some normalization condition is imposed on the sequence u, saying, for instance, that a norm of u does not exceed 1. Hence,

y = Ux + e,

where e = [e_1, e_2, …, e_m]^T is assumed to satisfy ||e||_Y ⩽ ε. The matrix U in this case is the m × n lower triangular Toeplitz-type matrix with (l, k) entry equal to u_{l−k} for l ⩾ k and zero otherwise; its first column is [u_0, u_1, …, u_{m−1}]^T and its last row is [u_{m−1}, u_{m−2}, …, u_{m−n}].
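For concreteness, the following short Python sketch (with assumed illustrative data, and A = I_m so that the Y-norm is Euclidean) builds such a Toeplitz-type information matrix from an input sequence u and generates a measurement y = Ux + e with ||e|| ⩽ ε.

```python
import numpy as np

# Illustrative sketch (assumed data): build the lower triangular Toeplitz-type
# information matrix U from an input sequence u, and form a noisy measurement
# y = U x + e with ||e|| <= eps (here A = I, so ||.||_Y is the Euclidean norm).

def toeplitz_information_matrix(u: np.ndarray, n: int) -> np.ndarray:
    """Return the m x n matrix with entries U[l, k] = u[l - k] for l >= k."""
    m = u.shape[0]
    U = np.zeros((m, n))
    for l in range(m):
        for k in range(min(l + 1, n)):
            U[l, k] = u[l - k]
    return U

rng = np.random.default_rng(0)
m, n, eps = 8, 3, 0.1
u = np.r_[1.0, rng.standard_normal(m - 1)]      # u_0 != 0 gives a full rank U
x_true = rng.standard_normal(n)                 # the unknown system
U = toeplitz_information_matrix(u, n)

e = rng.standard_normal(m)
e *= eps * rng.random() ** (1 / m) / np.linalg.norm(e)   # drawn from the eps-ball
y = U @ x_true + e                                       # measured information
print(U.round(2))
print(y.round(3))
```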

Based on y, we compute an approximation to x using an algorithm, by which we mean any mapping

φ : Y → ℳ,

where ℳ is a given subset of X. An approximation to x is provided by the vector φ(y) ∈ ℳ. If ℳ is a proper subset of the system space X, then identification is called conditional; see, e.g., [Citation2], [Citation6]. The choice of ℳ is usually aimed at restricting the complexity of the approximating models. Identification is unconditional if ℳ = X.

3. Worst-case setting

In this setting, motivated by the aim of constructing robust algorithms, the behaviour of an approximant is studied for the worst system compatible with computed information. For a given measurement y, the error of an algorithm φ, the y-local worst-case error, is defined by

e^wor(φ, y) = sup_{x ∈ ℰ_y} ||x − φ(y)||,    (5)

where the set ℰ_y (assumed non-empty) contains all systems consistent with the information y,

ℰ_y = { x ∈ R^n : ||Ux − y||_Y ⩽ ε }.    (6)

The error Equation (5) measures the worst uncertainty that an algorithm may produce for a given measurement y.

If the estimate selection is unconstrained, the best choice of z = φ(y) is given by the Chebyshev centre of the set ℰ_y. (This remains true for an arbitrary set ℰ_y, not necessarily given by Equation (6), and for any norm in X.) Indeed, the Chebyshev centre of ℰ_y is defined as an element c(ℰ_y) that minimizes the maximal distance from the elements of ℰ_y among all z in X,

sup_{x ∈ ℰ_y} ||x − c(ℰ_y)|| = inf_{z ∈ X} sup_{x ∈ ℰ_y} ||x − z||.

The algorithm defined by φ(y) = c(ℰ_y) is optimal, and its worst-case error is equal to the geometric radius of ℰ_y,

e^wor(φ, y) = rad(ℰ_y) = inf_{z ∈ X} sup_{x ∈ ℰ_y} ||x − z||.
In many cases of interest, the Chebyshev centre is readily available. For instance, the uncertainty set ℰ_y in Equation (6) is an ellipsoid in R^n with the centre (Chebyshev centre) given by the least squares estimate

c(ℰ_y) = ( U^T A^T A U )^{−1} U^T A^T A y,

i.e., the minimizer of ||A(Ux − y)|| over x ∈ R^n.
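A minimal numerical sketch of this formula (with assumed data; the matrices and the measurement below are purely illustrative): the centre is obtained by solving the weighted normal equations, which coincides with the ordinary least squares solution for the transformed data (AU, Ay).

```python
import numpy as np

# Sketch (assumed data): the Chebyshev centre of the ellipsoidal set E_y is
# the weighted least squares estimate c(E_y) = (U^T A^T A U)^{-1} U^T A^T A y.

def chebyshev_centre(U: np.ndarray, A: np.ndarray, y: np.ndarray) -> np.ndarray:
    AU = A @ U
    # Solve the normal equations (U^T A^T A U) c = U^T A^T A y.
    return np.linalg.solve(AU.T @ AU, AU.T @ (A @ y))

rng = np.random.default_rng(1)
m, n = 6, 3
A = np.diag(rng.uniform(0.5, 2.0, size=m))      # non-singular weighting matrix
U = rng.standard_normal((m, n))                 # full rank information matrix
y = rng.standard_normal(m)

c = chebyshev_centre(U, A, y)
c_ls, *_ = np.linalg.lstsq(A @ U, A @ y, rcond=None)   # minimizes ||A(Ux - y)||
print(np.allclose(c, c_ls))                            # True
```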

Computational aspects of the optimality problem are more complicated for conditional identification, even for simple uncertainty sets. The optimal algorithm, denoted by φ_opt, is now defined by the conditional Chebyshev centre c_cond(ℰ_y), a minimizer of the maximal distance selected among the admissible models in the set ℳ:

sup_{x ∈ ℰ_y} ||x − c_cond(ℰ_y)|| = inf_{z ∈ ℳ} sup_{x ∈ ℰ_y} ||x − z||.

If a linear parametrization of the selected estimate is desired, then the set ℳ is defined as a linear p-dimensional manifold in R^n (p ⩽ n). The computation of the conditional Chebyshev centre in this case, even for such a simple set as an ellipsoid, is not a straightforward task; see [Citation3] for an iterative method for finding c_cond(ℰ_y).

A simple idea to overcome the difficulties with computing the conditional Chebyshev centre is to take the optimal unconditional estimate given by the unconditional Chebyshev centre c(ℰ_y) (which is easy to compute for many sets ℰ_y), and to find its projection onto the manifold ℳ. The algorithm obtained in this way is denoted by φ_proj. It is quite obvious that this algorithm is in general not optimal, and the question is: how much do we lose with respect to the optimal one? Is the projection algorithm suboptimal within a reasonable margin? The answer to these questions is given in the following result [Citation2], which holds for much more general uncertainty sets than ellipsoids. We only assume that ℰ_y satisfies the assumption Equation (10) (see [Citation2]), and that ℳ is a p-dimensional linear manifold,

ℳ = { x ∈ R^n : x = z^0 + Σ_{i=1}^p α_i z^i },    (11)

where α_i ∈ R, z^i ∈ R^n, (z^i)^T z^j = 0 for i, j = 0, 1, …, p, i ≠ j, and ||z^i|| = 1 for i = 1, 2, …, p.

In [Citation2], the following theorem has been shown.

Theorem 1  If Equation (10) and Equation (11) hold, then

e^wor(φ_proj, y) ⩽ (2/√3) e^wor(φ_opt, y).

Furthermore, the constant 2/√3 cannot be improved.

The theorem ensures that the projection algorithm, although not optimal, is suboptimal within the reasonable constant 2/√3 ≈ 1.155, i.e., less than 16% of accuracy is lost with respect to the accuracy of the optimal algorithm. This holds independently of the dimensions n and p. The second statement of the theorem follows from the following example [Citation2]. Let n = 2, let ℳ = {(x_1, x_2) ∈ R² : x_2 = 0}, and let ℰ_y be the uncertainty set constructed in [Citation2]. Then c(ℰ_y) = (0, 1), its projection onto ℳ is (0, 0), while the conditional Chebyshev centre c_cond(ℰ_y) is a different point of ℳ, and the ratio of the error of the projection algorithm to the optimal error is exactly 2/√3.
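The extremal set of [Citation2] is not reproduced here; the following Python sketch only illustrates, on an assumed uncertainty set (a tilted ellipse), the two quantities compared in Theorem 1: the worst-case error of the projection estimate and the conditionally optimal worst-case error over ℳ = {x_2 = 0}.

```python
import numpy as np

# Illustrative sketch (assumed uncertainty set, NOT the extremal example of
# [Citation2]): compare the worst-case error of the projection estimate with
# the conditional optimum over the manifold M = {x in R^2 : x_2 = 0}.

theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
tilt = np.pi / 5.0
R = np.array([[np.cos(tilt), -np.sin(tilt)], [np.sin(tilt), np.cos(tilt)]])
# Boundary of a tilted ellipse centred at (0.3, 1.0); the sup of the distance
# over the (convex) set is attained on its boundary.
boundary = (R @ np.vstack((1.5 * np.cos(theta), 0.4 * np.sin(theta)))).T + np.array([0.3, 1.0])

def worst_case_error(z, pts):
    return np.max(np.linalg.norm(pts - z, axis=1))

c_unc = np.array([0.3, 1.0])            # Chebyshev centre = centre of the ellipse
phi_proj = np.array([c_unc[0], 0.0])    # projection of the centre onto M

# Conditional Chebyshev centre on M, found by a simple grid search over x_1.
grid = np.linspace(-2.0, 3.0, 4001)
e_opt = min(worst_case_error(np.array([t, 0.0]), boundary) for t in grid)
e_proj = worst_case_error(phi_proj, boundary)

print(f"projection error    : {e_proj:.4f}")
print(f"conditional optimum : {e_opt:.4f}")
print(f"ratio               : {e_proj / e_opt:.4f}  (Theorem 1 constant: {2 / np.sqrt(3):.4f})")
```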

4. Average-case setting

The worst-case analysis is necessary when robust approximations are of interest. If, however, robustness is not our main concern, the worst-case error criterion may be viewed as conservative. In [Citation7] a less conservative error measure has been analysed, which reflects the average performance of identification algorithms.

We remain in the set membership framework, and assume, similarly to the worst-case setting, that systems consistent with a measurement y belong to the set ℰ_y given in Equation (6). The basic quantity is now the y-local average error of an algorithm, defined by

e^avg(φ, y) = ( (1/|ℰ_y|) ∫_{ℰ_y} ||x − φ(y)||² dx )^{1/2},    (13)

where |A| is the Lebesgue measure of a set A and '∫' is the Lebesgue integral in R^n. In contrast to Equation (5), the error of an algorithm is now measured by its mean square (not the worst-case) behaviour in ℰ_y.

In addition to Equation (13), another error measure has also been considered in [Citation7]. It is sometimes desirable to assess the behaviour of an algorithm for a fixed system x (not for a fixed measurement y). To this end, we consider the set of all measurements consistent with a fixed system x,

ℰ_x = { y ∈ R^m : ||Ux − y||_Y ⩽ ε }.

The x-local average error, which describes the behaviour of an algorithm on this set, is given by

e^avg(φ, x) = ( (1/|ℰ_x|) ∫_{ℰ_x} ||x − φ(y)||² dy )^{1/2}

for a measurable function φ.

The following result provides the formula for the y-local error of an arbitrary algorithm φ, see [Citation7]. Let K_k = { t ∈ R^k : ||t|| ⩽ 1 } be the unit ball in the standard Euclidean norm, let

ε_y = ( ε² − min_{x ∈ R^n} ||Ux − y||_Y² )^{1/2}

(for ℰ_y ≠ ∅, the real non-negative number ε_y is well defined and ε_y ⩽ ε), and let B be any non-singular n × n matrix such that

B^T B = U^T A^T A U.

Theorem 2  For any ℳ ⊂ R^n and any algorithm φ,

e^avg(φ, y) = ( ||φ(y) − c(ℰ_y)||² + (ε_y² / (n+2)) tr( (B^T B)^{−1} ) )^{1/2}.
Hence, the optimal algorithm φ_opt is defined by

φ_opt(y) = P_ℳ( c(ℰ_y) )

(if the projection is well defined), where P_ℳ denotes the projection onto ℳ. This is the projection of the Chebyshev centre of ℰ_y onto ℳ. Unlike in the worst-case setting, the optimal algorithm in the average-case setting coincides with the projection algorithm. Its error is given in the following result [Citation7].
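A short Python sketch of this algorithm (all problem data below are assumed for illustration): compute the Chebyshev centre c(ℰ_y) by weighted least squares and project it onto a linear manifold of the form Equation (11), given by an offset z^0 and an orthonormal basis M = [z^1, …, z^p].

```python
import numpy as np

# Sketch of the average-case optimal (projection) algorithm, with assumed data:
# phi_opt(y) = P_M(c(E_y)), where the manifold is {z0 + M alpha}.

def project_onto_manifold(v, z0, M):
    """Euclidean projection of v onto z0 + range(M), M having orthonormal columns."""
    return z0 + M @ (M.T @ (v - z0))

def phi_opt(y, U, A, z0, M):
    AU = A @ U
    c = np.linalg.solve(AU.T @ AU, AU.T @ (A @ y))    # Chebyshev centre c(E_y)
    return project_onto_manifold(c, z0, M)

rng = np.random.default_rng(2)
m, n, p = 6, 4, 2
A = np.eye(m)
U = rng.standard_normal((m, n))
M, _ = np.linalg.qr(rng.standard_normal((n, p)))      # orthonormal columns z^1..z^p
z0 = rng.standard_normal(n)
z0 -= M @ (M.T @ z0)                                  # make z0 orthogonal to the z^i
y = rng.standard_normal(m)
print(phi_opt(y, U, A, z0, M))
```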

Theorem 3  For any ℳ ⊂ R^n for which φ_opt is well defined,

e^avg(φ_opt, y) = ( dist(c(ℰ_y), ℳ)² + (ε_y² / (n+2)) tr( (B^T B)^{−1} ) )^{1/2},    (17)

where dist(c(ℰ_y), ℳ) = inf_{z ∈ ℳ} ||c(ℰ_y) − z||.
For the linear manifold ℳ in Equation (11), the x-local error of the optimal algorithm is given in the next result, see [Citation7]. Let M = [z^1, …, z^p] be the n × p matrix whose orthonormal columns span the linear subspace associated with ℳ.

Theorem 4  Let ℳ be given by Equation (11). Then

e^avg(φ_opt, x) = ( dist(x, ℳ)² + (ε² / (m+2)) tr( M^T (U^T A^T A U)^{−1} M ) )^{1/2}.    (18)
This formula explicitly shows the influence on the minimal error of factors such as the number of measurements m, the information matrix U, the accuracy ε of measuring information, the matrix M representing the subspace associated with ℳ, and the unmodelled dynamics term dist(x, ℳ). For instance, we can now formulate, in algebraic language, criteria for the optimal selection of the information matrix or of the subspace ℳ so as to minimize the error of the optimal algorithm, see [Citation7].
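As a quick numerical illustration of this two-term structure (squared unmodelled dynamics plus a variance contribution), the following Monte Carlo sketch uses assumed data with A = I_m and z^0 = 0, samples y uniformly from ℰ_x, and compares the empirical mean squared error of φ_opt with the expression of Theorem 4.

```python
import numpy as np

# Monte Carlo sketch (assumed data, A = I_m, z0 = 0): the squared x-local error
# of phi_opt splits into dist(x, M)^2 plus eps^2 * tr(M^T (U^T U)^{-1} M)/(m+2).

rng = np.random.default_rng(3)
n, p, k = 4, 2, 3
m, eps = k * n, 0.2
U = np.vstack([np.eye(n)] * k)                     # k repeated measurements of x
M, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal basis of the subspace
x = rng.standard_normal(n)

# Sample y uniformly from E_x = {y in R^m : ||Ux - y|| <= eps}.
N = 200_000
d = rng.standard_normal((N, m))
d *= (eps * rng.random(N) ** (1 / m) / np.linalg.norm(d, axis=1))[:, None]
ys = x @ U.T + d

C = np.linalg.solve(U.T @ U, (ys @ U).T).T         # Chebyshev centres c(E_y)
proj = C @ M @ M.T                                 # projections onto the subspace
mc = np.mean(np.sum((x - proj) ** 2, axis=1))      # empirical mean squared error

dist2 = np.sum((x - M @ (M.T @ x)) ** 2)
formula = dist2 + eps ** 2 / (m + 2) * np.trace(M.T @ np.linalg.solve(U.T @ U, M))
print(mc, formula)                                 # the two values should be close
```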

5. Average-case setting with weights

The average errors considered in the previous section are defined under the assumption that the distribution of elements in the feasible sets ℰy and ℰ x is uniform. This is not a bad assumption when no additional information concerning the distribution is available. In many cases, however, such knowledge is available, and other distributions of measurements and problem elements, such as, for instance, Gaussian distribution, may better describe the problem. The question arises: how far do the results on optimal estimation given in the previous section rely on the assumption of a uniform distribution? The results given in this section, based on [Citation8], provide the answer for a wide class of distributions.

Let us generalize the notion of the average error and define the weighted y-local average error. Assume that, for a fixed y, a non-negative Lebesgue integrable function p_y : ℰ_y → R is given such that

∫_{ℰ_y} p_y(x) dx = 1.

The weighted y-local average error of an algorithm φ is defined by

e^avg_{p_y}(φ, y) = ( ∫_{ℰ_y} ||x − φ(y)||² p_y(x) dx )^{1/2}.

This error criterion reflects the average behaviour of an algorithm with respect to a weighted Lebesgue measure concentrated on ℰ_y.

To define the weighted x-local error, we assume that a non-negative Lebesgue integrable function q_x : ℰ_x → R is given such that

∫_{ℰ_x} q_x(y) dy = 1.

For a measurable function φ, we define the weighted x-local average error of φ by

e^avg_{q_x}(φ, x) = ( ∫_{ℰ_x} ||x − φ(y)||² q_x(y) dy )^{1/2}.

The weight functions p_y and q_x under consideration will be defined by transferring basic functions w̃ : K_n → R and ṽ : K_m → R to the proper sets as follows. Let w̃ be a non-negative Lebesgue integrable function with

∫_{K_n} w̃(t) dt = 1.

To define p_y, we express the set ℰ_y in an equivalent form. Recall that the n × n non-singular matrix B is such that B^T B = U^T A^T A U. Defining

b = (B^T)^{−1} U^T A^T A y,

we have that ||Ux − y||_Y ⩽ ε iff ||Bx − b|| ⩽ ε_y, where ε_y is defined in the previous section. Since we wish to study uncertain information, for which the unique recovery of x from the knowledge of y is not possible, we assume that ℰ_y is not a singleton, i.e., ε_y > 0. We define the weight function p_y by

p_y(x) = ( |det B| / ε_y^n ) w̃( (Bx − b)/ε_y ),   x ∈ ℰ_y.    (21)

The function p_y(x) depends on the matrix B; a possible choice is B = (U^T A^T A U)^{1/2}. We shall see that the optimal algorithm is independent of the selection of the matrix B. For a wide class of weight functions the same holds true for the optimal error.

Similarly to p_y, we define the function q_x by

q_x(y) = ( |det A| / ε^m ) ṽ( A(y − Ux)/ε ),   y ∈ ℰ_x,

where ṽ : K_m → R is a non-negative integrable function with

∫_{K_m} ṽ(s) ds = 1.
The choice of weight functions is not an easy task. Selecting the weights provides in fact additional a priori information about the problem, which should reflect our knowledge about it. A proper weight selection allows us to include relevant a priori information into the problem formulation. For a discussion of the problem of measure selection in a general average setting one is referred to [Citation12], p. 68.

We shall see that in many interesting cases the optimal algorithm is independent of the weights. The error of the optimal algorithm is weight-dependent, and the dependence will be explicitly shown in terms of certain parameters that are easy to compute for any selection of the weights.

We now restrict ourselves to isotropic weight functions; for a discussion of the more general case one is referred to [Citation8]. We assume that the basic weight functions w̃ and ṽ take constant values on spheres in the respective spaces. Letting w and v be real non-negative continuous functions on [0, 1], we define

w̃(t) = c_n w(||t||),  t ∈ K_n,   and   ṽ(s) = c_m v(||s||),  s ∈ K_m,    (23)

where c_n and c_m are normalizing constants ensuring that w̃ and ṽ integrate to one over K_n and K_m, respectively. First of all, it is possible to show [Citation8] that, similarly to the case of uniform distributions, the optimal algorithm φ_opt is given by the projection onto ℳ of the least squares approximation,

φ_opt(y) = P_ℳ( c(ℰ_y) ),

provided that the projection is well defined. (This holds for any basic weight function that is symmetric with respect to the origin.) Hence, φ_opt does not depend on the weights. The following result shown in [Citation8] gives explicit formulas for the errors of the optimal algorithm, and explains the influence of the weights.

Let α_n and β_m be parameters depending on the weights, given by

α_n = ( (n+2)/n ) ( ∫_0^1 w(t) t^{n+1} dt ) / ( ∫_0^1 w(t) t^{n−1} dt ).

A similar formula describes the number β_m, with n replaced by m and w replaced by v. The following theorem holds.

Theorem 5  For weight functions given by Equation (23), we have

e^avg_{p_y}(φ_opt, y) = ( dist(c(ℰ_y), ℳ)² + α_n (ε_y² / (n+2)) tr( (B^T B)^{−1} ) )^{1/2}    (25)

and

e^avg_{q_x}(φ_opt, x) = ( dist(x, ℳ)² + β_m (ε² / (m+2)) tr( M^T (U^T A^T A U)^{−1} M ) )^{1/2}.    (26)
The errors of the optimal algorithm depend on the weights through the parameters α_n and β_m, which can be computed (or estimated) for each particular choice of the weights. Bounds on these parameters for three classes of weight functions (continuous, differentiable and truncated Gaussian functions) can be found in [Citation8].
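As an illustration of how the weights enter only through such integral parameters, the following short sketch evaluates α_n numerically for two assumed radial profiles: the uniform profile w ≡ 1 (which gives α_n = 1) and a truncated-Gaussian-like profile.

```python
import numpy as np

# Numerical sketch: the weight parameter alpha_n for a radial profile w on [0, 1].
# The profiles below are illustrative; w = 1 recovers the uniform case alpha_n = 1.

def alpha(n: int, w, npts: int = 200_001) -> float:
    t = np.linspace(0.0, 1.0, npts)
    wt = w(t)
    return (n + 2) / n * np.trapz(wt * t ** (n + 1), t) / np.trapz(wt * t ** (n - 1), t)

uniform = lambda t: np.ones_like(t)
gauss = lambda t: np.exp(-8.0 * t ** 2)     # truncated-Gaussian-like profile

for n in (2, 5, 10):
    print(n, round(alpha(n, uniform), 4), round(alpha(n, gauss), 4))
```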

The case of uniform distributions over ℰ_y and ℰ_x corresponds to w̃ ≡ 1/|K_n| and ṽ ≡ 1/|K_m|, that is, to w(t) ≡ 1 and v(t) ≡ 1. Hence, in this case α_n = 1 and β_m = 1, and Equation (25) and Equation (26) agree with Equation (17) and Equation (18).

Note that for isotropic weight functions the error depends on B only through the matrix B^T B = U^T A^T A U, and is thus independent of the choice of the matrix B in Equation (21).

Following [Citation8], we illustrate the results given above for an exemplary problem.

Example  Let A = I_m, where I_m is the m × m identity matrix, i.e., we consider the standard Euclidean norm in the space Y. Let information be given by k repeated measurements of all components of the system x. That is, the information matrix is given by U = (I_n, …, I_n)^T, with I_n appearing k times, and exact information is the m × 1 vector Ux = [x^T, …, x^T]^T, with x appearing k times. The total number of measurements is m = k·n.

Assume that we are able to measure the components of x in such a way that the result obtained, an m × 1 vector y = [(y^1)^T, …, (y^k)^T]^T with k vector components y^i ∈ R^n, has mean squared error at most ε²/k,

(1/k) Σ_{i=1}^k ||y^i − x||² ⩽ ε²/k,    (27)

which is exactly the condition ||Ux − y||_Y ⩽ ε. The uncertainty set ℰ_y thus has the form

ℰ_y = { x ∈ R^n : Σ_{i=1}^k ||y^i − x||² ⩽ ε² }

(so that it is a Euclidean ball in R^n). One can easily verify that U^T U = k I_n, that one may take B = √k I_n, and that then b = √k ȳ with ȳ = (1/k) Σ_{i=1}^k y^i. Hence, ℰ_y can be written as

ℰ_y = { x ∈ R^n : ||x − ȳ|| ⩽ ε_y/√k },

where

ε_y² = ε² − Σ_{i=1}^k ||y^i − ȳ||².

The optimal algorithm is given by the projection onto ℳ of the arithmetic mean of the y^i,

φ_opt(y) = P_ℳ( ȳ ) = P_ℳ( (1/k) Σ_{i=1}^k y^i ).
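A small Python check of this reduction, with assumed measurement data: for the stacked information matrix U, the least squares estimate c(ℰ_y) is just the arithmetic mean of the measured vectors.

```python
import numpy as np

# Sketch (assumed data): with U = (I_n, ..., I_n)^T stacked k times and A = I_m,
# the least squares estimate c(E_y) equals the arithmetic mean of the y^i.

rng = np.random.default_rng(4)
n, k = 3, 5
U = np.vstack([np.eye(n)] * k)                 # so that U^T U = k I_n
Y = rng.standard_normal((k, n))                # the k measured vectors y^i
y = Y.reshape(-1)                              # stacked measurement vector

c = np.linalg.solve(U.T @ U, U.T @ y)          # least squares estimate c(E_y)
print(np.allclose(U.T @ U, k * np.eye(n)))     # True
print(np.allclose(c, Y.mean(axis=0)))          # True: c(E_y) is the mean of the y^i
```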
We show what the formula Equation (26) looks like for a particular choice of weight functions and a subspace ℳ. Let n ⩾ 2, let the weights be defined by radial profiles w and v as in Equation (23), and let ℳ = { x ∈ R^n : x_3 = … = x_n = 0 }. Then M = [e^1, e^2], the matrix formed by the first two columns of the n × n identity matrix, and

tr( M^T (U^T U)^{−1} M ) = 2/k.

Since dist(x, ℳ)² = Σ_{i=3}^n x_i² (it is equal to zero if n = 2), and m = k·n, the error of φ_opt can be expressed as

e^avg_{q_x}(φ_opt, x) = ( Σ_{i=3}^n x_i² + ( 2ε² / (k(kn+2)) ) h(n,k) )^{1/2},    (32)

where h(n,k) denotes the weight-dependent parameter β_m for m = k·n (h(n,k) ≡ 1 in the uniform case). The formula Equation (32) shows the dependence of the average error on the accuracy ε and on k. The dependence of h(n,k) on k is not crucial. For a fixed n, when ε goes to zero or k goes to infinity, the error tends to the unmodelled dynamics error dist(x, ℳ), whose appearance cannot be avoided since the identification is conditional.

For further illustration, we specify the example above for a simple problem with exemplary data. We wish to identify a system x depending on three parameters, x = [x_1, x_2, x_3]^T (n = 3). We measure each component of x three times (k = 3) in such a way that Equation (27) holds. The least squares approximation is then the arithmetic mean ȳ of the three measured vectors y^1, y^2, y^3. For the subspace ℳ defined as above, the optimal (on the average) approximation φ_opt(y) is obtained by setting the third component of ȳ to zero, and its y-local error is given by Equation (25). For an exemplary system x consistent with the measurements y^i, given by x = [5, −3, 0]^T, the x-local error Equation (26) contains no unmodelled dynamics term (since x is in ℳ, dist(x, ℳ) = 0), while for the data used in [Citation8] the distance ||x − φ_opt||_2 equals 2.881·10^{−3}; as we see, the actual distance is here greater than the average distance.
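The measured vectors of this example are not reproduced above; the following sketch therefore uses assumed illustrative measurements (not the data of [Citation8]) only to show the computational steps: averaging the three measurements and zeroing the third component.

```python
import numpy as np

# Illustrative sketch with assumed measurements (NOT the data of [Citation8]):
# n = 3 parameters, each measured k = 3 times; M zeroes the third component.

Y = np.array([[5.001, -3.004,  0.002],
              [4.998, -2.995, -0.003],
              [5.004, -3.002,  0.004]])         # assumed measurements y^1, y^2, y^3

y_bar = Y.mean(axis=0)                          # least squares approximation
phi = y_bar.copy()
phi[2] = 0.0                                    # optimal approximation: project onto M

x = np.array([5.0, -3.0, 0.0])                  # exemplary system, x lies in M
print("phi_opt        :", phi)
print("||x - phi||_2  :", np.linalg.norm(x - phi))
```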

Let us finally note that the formula Equation (32) suggests an optimization problem concerning the experiment design. For a given positive δ, we wish to reduce the second term in Equation (32) to be of order δ² (neglecting the influence of h(n,k)). In order to achieve this, we can perform, for any k, k (vector) measurements with accuracy ε = ε(k) of order δ √(k(kn+2)). In reasonable models of experimental cost, the cost should increase with the increasing number of experiments, k, and it should decrease if we relax the accuracy, i.e., if we increase ε(k) (this is the case when k grows). The question is to find an optimal k that minimizes the cost.

6. Conclusions

We have considered the performance of conditional estimators in set membership identification in the worst-case and average-case settings. Optimal estimators in the worst-case setting are in general difficult to compute, and suboptimal estimators such as the projection estimator are often considered. We have shown that the error of the projection algorithm never exceeds 2/√3 times the error of the optimal algorithm. In the average-case setting, the projection algorithm turns out to be optimal. Explicit formulas for its local errors are given, showing the influence of the problem parameters on the minimal identification error. The case of weighted average errors is discussed, and the influence of the weights is explicitly shown.

Acknowledgment

This research was partly supported by local AGH grant No. 10.420.03.

References


  • [1] Chernousko, F.L., 1994. State Estimation for Dynamic Systems. Boca Raton: CRC Press.
  • [2] Garulli, A., Kacewicz, B., Vicino, A. and Zappa, G., 1999. Reliability of Projection Algorithms in Conditional Estimation. Journal of Optimization Theory and Applications, 101: 1–14.
  • [3] Garulli, A., Vicino, A. and Zappa, G., 1997. Conditional Central Algorithms for Worst-Case Estimation and Filtering. In: Proc. 36th IEEE Conference on Decision and Control, San Diego.
  • [4] Helmicki, A.J., Jacobson, C.A. and Nett, C.N., 1991. Control Oriented System Identification: A Worst-Case/Deterministic Approach in H∞. IEEE Transactions on Automatic Control, 36: 1163–1176.
  • [5] Jacobson, C.A., Nett, C.N. and Partington, J.R., 1992. Worst Case System Identification in l1: Optimal Algorithms and Error Bounds. Systems and Control Letters, 19: 419–424.
  • [6] Kacewicz, B., Milanese, M. and Vicino, A., 1988. Conditionally Optimal Algorithms and Estimation of Reduced Order Models. Journal of Complexity, 4: 73–85.
  • [7] Kacewicz, B., 2000. Optimal Average Case Estimation in Hilbert Norms. Mathematics of Control, Signals and Systems, 13: 347–359.
  • [8] Kacewicz, B., 2003. Weighted Average Errors in Set-Membership Estimation. Mathematics of Control, Signals and Systems, 16: 238–253.
  • [9] Mäkilä, P. and Partington, J., 1999. On Robustness in System Identification. Automatica, 35: 907–916.
  • [10] Milanese, M. and Vicino, A., 1993. Information Based Complexity and Nonparametric Worst-Case System Identification. Journal of Complexity, 9: 427–445.
  • [11] Milanese, M., Norton, J., Piet-Lahanier, H. and Walter, E. (Eds), 1996. Bounding Approaches to System Identification. New York: Plenum Press.
  • [12] Novak, E., 1988. Deterministic and Stochastic Error Bounds in Numerical Analysis. Lecture Notes in Mathematics 1349. Berlin: Springer-Verlag.
  • [13] Walter, E. (Ed.), 1990. Parameter Identification with Error Bound. Special issue of Mathematics and Computers in Simulation, 32(5–6).
