Optimal and suboptimal algorithms in set membership identification

Pages 159-169 | Published online: 16 Feb 2007

Abstract

We discuss in this paper optimality properties of identification algorithms in a set membership framework. We deal with restricted-complexity (conditional) identification, where approximations (models) to a possibly complex system are selected from a low dimensional space. We discuss the worst- and average-case settings. In the worst-case setting, we present results on optimality, or suboptimality, of algorithms based on computing the unconditional or conditional Chebyshev centres of an uncertainty set. In the average-case setting, we show that the optimal algorithm is given by the projection of the unconditional Chebyshev centre. We show explicit formulas for its average errors, allowing us to see the contribution of all problem parameters to the minimal error. We discuss the case of weighted average errors corresponding to non-uniform distributions over uncertainty sets, and show how the weights influence the minimal identification error.

1. Introduction

The idea behind the set membership approach is to use a priori knowledge about the problem being solved, together with the measured (or computed, or observed) information about unknown quantities, to define a feasible set of elements consistent with the available information. A number of results are devoted to the characterization of feasible sets in system identification, see [Citation1–Citation13] and the references therein. In many cases, such as in restricted-complexity estimation, the desired estimate is often required to be a member of a set of a given simple structure. In this case estimation is referred to as conditional [Citation6].

In this paper, we present results obtained within a set membership framework in two settings: the worst-case and average-case settings. In the worst-case setting, the error of an algorithm is measured by the worst performance for problems consistent with given information, see e.g., [Citation1–Citation6], [Citation9–Citation11]. A conservative point of view is thus chosen in the worst-case analysis: exceptionally difficult elements, even if they appear rarely, are taken into account. The optimal algorithm in this setting is defined by the conditional Chebyshev centre of the feasible set. The conditional Chebyshev centre is in general difficult to compute, even for sets of a simple structure [Citation3]. For this reason, we turn our attention to suboptimal, but easy-to-compute estimators. To this end, we discuss, based on [Citation2], the reliability level of the projection algorithm.

Compared to the worst-case approach, the average-case setting is less conservative. The error of an algorithm is now defined on the average, which excludes the dominance of outliers [Citation7]. The optimal algorithm turns out to be the projection algorithm, whose suboptimality has been established in the worst-case setting. We provide formulas for its two local average errors, which explicitly exhibit the contributions of factors such as the information, the measurement errors, and the restrictions imposed on the models.

In addition to the uniform distribution over uncertainty sets considered in [Citation7], we discuss, based on [Citation8], weighted average errors for a class of weight functions. Results are given that explain the contribution of the weights to the minimal errors: the weights enter through certain integral parameters that can be computed, or estimated, for a given selection of the weights. An example illustrates what the estimators and the errors look like for a particular identification problem.

2. Problem formulation

We wish to identify a system x belonging to the system space X = R^n. We look for a model (an approximation) to x, having at our disposal information about x given by a vector of measurements y which belongs to the information space Y = R^m, with m ⩾ n. The spaces X and Y are equipped with the norms ||x||_X = ||x|| and ||y||_Y = ||Ay||, where A is a non-singular m × m matrix and ||·|| denotes the Euclidean norm in X or Y, with

||c|| = ( Σ_{i=1}^k c_i² )^{1/2}

for any vector c ∈ R^k with components c_i. Given a full rank m × n matrix U (called the information matrix), the vector y is given as a perturbed value of Ux,

||Ux − y||_Y ⩽ ε,

where ε > 0 is the accuracy of computing (or measuring) information.

For example, the information vector may be obtained as follows. For the input sequence u = [u_0, u_1, …, u_{m−1}]^T with u_0 ≠ 0, the components of y can be given as output measurements related to u by

y_l = Σ_{k=1}^{l} u_{l−k} x_k + e_l,   l = 1, 2, …, m,

where e_l is an error corrupting the output (we set x_k = 0 for k > n). Some normalization condition is imposed on the sequence u, saying, for instance, that a norm of u does not exceed 1. Hence,

y = Ux + e,

where e = [e_1, e_2, …, e_m]^T is assumed to satisfy ||e||_Y ⩽ ε. The matrix U in this case is the m × n lower triangular Toeplitz-type matrix with (l, k) entry equal to u_{l−k} for l ⩾ k and zero otherwise; its first column is [u_0, u_1, …, u_{m−1}]^T and its last row is [u_{m−1}, u_{m−2}, …, u_{m−n}].
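For concreteness, the following short Python sketch (with assumed illustrative data, and A = I_m so that the Y-norm is Euclidean) builds such a Toeplitz-type information matrix from an input sequence u and generates a measurement y = Ux + e with ||e|| ⩽ ε.

```python
import numpy as np

# Illustrative sketch (assumed data): build the lower triangular Toeplitz-type
# information matrix U from an input sequence u, and form a noisy measurement
# y = U x + e with ||e|| <= eps (here A = I, so ||.||_Y is the Euclidean norm).

def toeplitz_information_matrix(u: np.ndarray, n: int) -> np.ndarray:
    """Return the m x n matrix with entries U[l, k] = u[l - k] for l >= k."""
    m = u.shape[0]
    U = np.zeros((m, n))
    for l in range(m):
        for k in range(min(l + 1, n)):
            U[l, k] = u[l - k]
    return U

rng = np.random.default_rng(0)
m, n, eps = 8, 3, 0.1
u = np.r_[1.0, rng.standard_normal(m - 1)]      # u_0 != 0 gives a full rank U
x_true = rng.standard_normal(n)                 # the unknown system
U = toeplitz_information_matrix(u, n)

e = rng.standard_normal(m)
e *= eps * rng.random() ** (1 / m) / np.linalg.norm(e)   # drawn from the eps-ball
y = U @ x_true + e                                       # measured information
print(U.round(2))
print(y.round(3))
```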

Based on y, we compute an approximation to x using an algorithm, by which we mean any mapping

φ : Y → ℳ,

where ℳ is a given subset of X. An approximation to x is provided by the vector φ(y) ∈ ℳ. If ℳ is a proper subset of the system space X, then identification is called conditional; see, e.g., [Citation2], [Citation6]. The choice of ℳ is usually aimed at restricting the complexity of the approximating models. Identification is unconditional if ℳ = X.

3. Worst-case setting

In this setting, motivated by the aim of constructing robust algorithms, the behaviour of an approximant is studied for the worst system compatible with computed information. For a given measurement y, the error of an algorithm φ, the y-local worst-case error, is defined by

e^wor(φ, y) = sup_{x ∈ ℰ_y} ||x − φ(y)||,    (5)

where the set ℰ_y (assumed non-empty) contains all systems consistent with the information y,

ℰ_y = { x ∈ R^n : ||Ux − y||_Y ⩽ ε }.    (6)

The error Equation (5) measures the worst uncertainty that an algorithm may produce for a given measurement y.

If the estimate selection is unconstrained, the best choice of z = φ(y) is given by the Chebyshev centre of the set ℰ_y. (This remains true for an arbitrary set ℰ_y, not necessarily given by Equation (6), and for any norm in X.) Indeed, the Chebyshev centre of ℰ_y is defined as an element c(ℰ_y) that minimizes the maximal distance from the elements of ℰ_y among all z in X,

sup_{x ∈ ℰ_y} ||x − c(ℰ_y)|| = inf_{z ∈ X} sup_{x ∈ ℰ_y} ||x − z||.

The algorithm defined by φ(y) = c(ℰ_y) is optimal, and its worst-case error is equal to the geometric radius of ℰ_y,

e^wor(φ, y) = rad(ℰ_y) = inf_{z ∈ X} sup_{x ∈ ℰ_y} ||x − z||.
In many cases of interest, the Chebyshev centre is readily available. For instance, the uncertainty set ℰ_y in Equation (6) is an ellipsoid in R^n with the centre (Chebyshev centre) given by the least squares estimate

c(ℰ_y) = ( U^T A^T A U )^{−1} U^T A^T A y,

i.e., the minimizer of ||A(Ux − y)|| over x ∈ R^n.
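A minimal numerical sketch of this formula (with assumed data; the matrices and the measurement below are purely illustrative): the centre is obtained by solving the weighted normal equations, which coincides with the ordinary least squares solution for the transformed data (AU, Ay).

```python
import numpy as np

# Sketch (assumed data): the Chebyshev centre of the ellipsoidal set E_y is
# the weighted least squares estimate c(E_y) = (U^T A^T A U)^{-1} U^T A^T A y.

def chebyshev_centre(U: np.ndarray, A: np.ndarray, y: np.ndarray) -> np.ndarray:
    AU = A @ U
    # Solve the normal equations (U^T A^T A U) c = U^T A^T A y.
    return np.linalg.solve(AU.T @ AU, AU.T @ (A @ y))

rng = np.random.default_rng(1)
m, n = 6, 3
A = np.diag(rng.uniform(0.5, 2.0, size=m))      # non-singular weighting matrix
U = rng.standard_normal((m, n))                 # full rank information matrix
y = rng.standard_normal(m)

c = chebyshev_centre(U, A, y)
c_ls, *_ = np.linalg.lstsq(A @ U, A @ y, rcond=None)   # minimizes ||A(Ux - y)||
print(np.allclose(c, c_ls))                            # True
```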

Computational aspects of the optimality problem are more complicated for conditional identification, even for simple uncertainty sets. The optimal algorithm, denoted by φ_opt, is now defined by the conditional Chebyshev centre c_cond(ℰ_y), a minimizer of the maximal distance selected among the admissible models in the set ℳ:

sup_{x ∈ ℰ_y} ||x − c_cond(ℰ_y)|| = inf_{z ∈ ℳ} sup_{x ∈ ℰ_y} ||x − z||.

If a linear parametrization of the selected estimate is desired, then the set ℳ is defined as a linear p-dimensional manifold in R^n (p ⩽ n). The computation of the conditional Chebyshev centre in this case, even for such a simple set as an ellipsoid, is not a straightforward task; see [Citation3] for an iterative method for finding c_cond(ℰ_y).

A simple idea to overcome the difficulties with computing the conditional Chebyshev centre is to take the optimal unconditional estimate given by the unconditional Chebyshev centre c(ℰ_y) (which is easy to compute for many sets ℰ_y), and to find its projection onto the manifold ℳ. The algorithm obtained in this way is denoted by φ_proj. It is quite obvious that this algorithm is in general not optimal, and the question is: how much do we lose with respect to the optimal one? Is the projection algorithm suboptimal within a reasonable margin? The answer to these questions is given in the following result [Citation2], which holds for much more general uncertainty sets than ellipsoids. We only assume that ℰ_y satisfies the assumption Equation (10) (see [Citation2]), and that ℳ is a p-dimensional linear manifold,

ℳ = { x ∈ R^n : x = z^0 + Σ_{i=1}^p α_i z^i },    (11)

where α_i ∈ R, z^i ∈ R^n, (z^i)^T z^j = 0 for i, j = 0, 1, …, p, i ≠ j, and ||z^i|| = 1 for i = 1, 2, …, p.

In [Citation2], the following theorem has been shown.

Theorem 1  If Equation (10) and Equation (11) hold, then

e^wor(φ_proj, y) ⩽ (2/√3) e^wor(φ_opt, y).

Furthermore, the constant 2/√3 cannot be improved.

The theorem ensures that the projection algorithm, although not optimal, is suboptimal within the reasonable constant 2/√3 ≈ 1.155, i.e., less than 16% of accuracy is lost with respect to the accuracy of the optimal algorithm. This holds independently of the dimensions n and p. The second statement of the theorem follows from the following example [Citation2]. Let n = 2, let ℳ = {(x_1, x_2) ∈ R² : x_2 = 0}, and let ℰ_y be the uncertainty set constructed in [Citation2]. Then c(ℰ_y) = (0, 1), its projection onto ℳ is (0, 0), while the conditional Chebyshev centre c_cond(ℰ_y) is a different point of ℳ, and the ratio of the error of the projection algorithm to the optimal error is exactly 2/√3.
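The extremal set of [Citation2] is not reproduced here; the following Python sketch only illustrates, on an assumed uncertainty set (a tilted ellipse), the two quantities compared in Theorem 1: the worst-case error of the projection estimate and the conditionally optimal worst-case error over ℳ = {x_2 = 0}.

```python
import numpy as np

# Illustrative sketch (assumed uncertainty set, NOT the extremal example of
# [Citation2]): compare the worst-case error of the projection estimate with
# the conditional optimum over the manifold M = {x in R^2 : x_2 = 0}.

theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
tilt = np.pi / 5.0
R = np.array([[np.cos(tilt), -np.sin(tilt)], [np.sin(tilt), np.cos(tilt)]])
# Boundary of a tilted ellipse centred at (0.3, 1.0); the sup of the distance
# over the (convex) set is attained on its boundary.
boundary = (R @ np.vstack((1.5 * np.cos(theta), 0.4 * np.sin(theta)))).T + np.array([0.3, 1.0])

def worst_case_error(z, pts):
    return np.max(np.linalg.norm(pts - z, axis=1))

c_unc = np.array([0.3, 1.0])            # Chebyshev centre = centre of the ellipse
phi_proj = np.array([c_unc[0], 0.0])    # projection of the centre onto M

# Conditional Chebyshev centre on M, found by a simple grid search over x_1.
grid = np.linspace(-2.0, 3.0, 4001)
e_opt = min(worst_case_error(np.array([t, 0.0]), boundary) for t in grid)
e_proj = worst_case_error(phi_proj, boundary)

print(f"projection error    : {e_proj:.4f}")
print(f"conditional optimum : {e_opt:.4f}")
print(f"ratio               : {e_proj / e_opt:.4f}  (Theorem 1 constant: {2 / np.sqrt(3):.4f})")
```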

4. Average-case setting

The worst-case analysis is necessary when robust approximations are of interest. If, however, robustness is not our main concern, the worst-case error criterion may be viewed as conservative. In [Citation7] a less conservative error measure has been analysed, which reflects the average performance of identification algorithms.

We remain in the set membership framework, and assume, similarly to the worst-case setting, that systems consistent with a measurement y belong to the set ℰ_y given in Equation (6). The basic quantity is now the y-local average error of an algorithm, defined by

e^avg(φ, y) = ( (1/|ℰ_y|) ∫_{ℰ_y} ||x − φ(y)||² dx )^{1/2},    (13)

where |A| is the Lebesgue measure of a set A and '∫' is the Lebesgue integral in R^n. In contrast to Equation (5), the error of an algorithm is now measured by its mean square (not the worst-case) behaviour in ℰ_y.

In addition to Equation (13), another error measure has also been considered in [Citation7]. It is sometimes desirable to assess the behaviour of an algorithm for a fixed system x (not for a fixed measurement y). To this end, we consider the set of all measurements consistent with a fixed system x,

ℰ_x = { y ∈ R^m : ||Ux − y||_Y ⩽ ε }.

The x-local average error, which describes the behaviour of an algorithm on this set, is given by

e^avg(φ, x) = ( (1/|ℰ_x|) ∫_{ℰ_x} ||x − φ(y)||² dy )^{1/2}

for a measurable function φ.

The following result provides the formula for the y-local error of an arbitrary algorithm φ, see [Citation7]. Let K_k = { t ∈ R^k : ||t|| ⩽ 1 } be the unit ball in the standard Euclidean norm, let

ε_y = ( ε² − min_{x ∈ R^n} ||Ux − y||_Y² )^{1/2}

(for ℰ_y ≠ ∅, the real non-negative number ε_y is well defined and ε_y ⩽ ε), and let B be any non-singular n × n matrix such that

B^T B = U^T A^T A U.

Theorem 2  For any ℳ ⊂ R^n and any algorithm φ,

e^avg(φ, y) = ( ||φ(y) − c(ℰ_y)||² + (ε_y² / (n+2)) tr( (B^T B)^{−1} ) )^{1/2}.
Hence, the optimal algorithm φ_opt is defined by

φ_opt(y) = P_ℳ( c(ℰ_y) )

(if the projection is well defined), where P_ℳ denotes the projection onto ℳ. This is the projection of the Chebyshev centre of ℰ_y onto ℳ. Unlike in the worst-case setting, the optimal algorithm in the average-case setting coincides with the projection algorithm. Its error is given in the following result [Citation7].
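A short Python sketch of this algorithm (all problem data below are assumed for illustration): compute the Chebyshev centre c(ℰ_y) by weighted least squares and project it onto a linear manifold of the form Equation (11), given by an offset z^0 and an orthonormal basis M = [z^1, …, z^p].

```python
import numpy as np

# Sketch of the average-case optimal (projection) algorithm, with assumed data:
# phi_opt(y) = P_M(c(E_y)), where the manifold is {z0 + M alpha}.

def project_onto_manifold(v, z0, M):
    """Euclidean projection of v onto z0 + range(M), M having orthonormal columns."""
    return z0 + M @ (M.T @ (v - z0))

def phi_opt(y, U, A, z0, M):
    AU = A @ U
    c = np.linalg.solve(AU.T @ AU, AU.T @ (A @ y))    # Chebyshev centre c(E_y)
    return project_onto_manifold(c, z0, M)

rng = np.random.default_rng(2)
m, n, p = 6, 4, 2
A = np.eye(m)
U = rng.standard_normal((m, n))
M, _ = np.linalg.qr(rng.standard_normal((n, p)))      # orthonormal columns z^1..z^p
z0 = rng.standard_normal(n)
z0 -= M @ (M.T @ z0)                                  # make z0 orthogonal to the z^i
y = rng.standard_normal(m)
print(phi_opt(y, U, A, z0, M))
```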

Theorem 3  For any ℳ ⊂ R^n for which φ_opt is well defined,

e^avg(φ_opt, y) = ( dist(c(ℰ_y), ℳ)² + (ε_y² / (n+2)) tr( (B^T B)^{−1} ) )^{1/2},    (17)

where dist(c(ℰ_y), ℳ) = inf_{z ∈ ℳ} ||c(ℰ_y) − z||.
For the linear manifold ℳ in Equation (11), the x-local error of the optimal algorithm is given in the next result, see [Citation7]. Let M = [z^1, …, z^p] be the n × p matrix whose orthonormal columns span the linear subspace associated with ℳ.

Theorem 4  Let ℳ be given by Equation (11). Then

e^avg(φ_opt, x) = ( dist(x, ℳ)² + (ε² / (m+2)) tr( M^T (U^T A^T A U)^{−1} M ) )^{1/2}.    (18)
This formula explicitly shows the influence on the minimal error of factors such as the number of measurements m, the information matrix U, the accuracy ε of measuring information, the matrix M representing the subspace associated with ℳ, and the unmodelled dynamics term dist(x, ℳ). For instance, we can now formulate, in algebraic language, criteria for the optimal selection of the information matrix or of the subspace ℳ so as to minimize the error of the optimal algorithm, see [Citation7].
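As a quick numerical illustration of this two-term structure (squared unmodelled dynamics plus a variance contribution), the following Monte Carlo sketch uses assumed data with A = I_m and z^0 = 0, samples y uniformly from ℰ_x, and compares the empirical mean squared error of φ_opt with the expression of Theorem 4.

```python
import numpy as np

# Monte Carlo sketch (assumed data, A = I_m, z0 = 0): the squared x-local error
# of phi_opt splits into dist(x, M)^2 plus eps^2 * tr(M^T (U^T U)^{-1} M)/(m+2).

rng = np.random.default_rng(3)
n, p, k = 4, 2, 3
m, eps = k * n, 0.2
U = np.vstack([np.eye(n)] * k)                     # k repeated measurements of x
M, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal basis of the subspace
x = rng.standard_normal(n)

# Sample y uniformly from E_x = {y in R^m : ||Ux - y|| <= eps}.
N = 200_000
d = rng.standard_normal((N, m))
d *= (eps * rng.random(N) ** (1 / m) / np.linalg.norm(d, axis=1))[:, None]
ys = x @ U.T + d

C = np.linalg.solve(U.T @ U, (ys @ U).T).T         # Chebyshev centres c(E_y)
proj = C @ M @ M.T                                 # projections onto the subspace
mc = np.mean(np.sum((x - proj) ** 2, axis=1))      # empirical mean squared error

dist2 = np.sum((x - M @ (M.T @ x)) ** 2)
formula = dist2 + eps ** 2 / (m + 2) * np.trace(M.T @ np.linalg.solve(U.T @ U, M))
print(mc, formula)                                 # the two values should be close
```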

5. Average-case setting with weights

The average errors considered in the previous section are defined under the assumption that the distribution of elements in the feasible sets ℰy and ℰ x is uniform. This is not a bad assumption when no additional information concerning the distribution is available. In many cases, however, such knowledge is available, and other distributions of measurements and problem elements, such as, for instance, Gaussian distribution, may better describe the problem. The question arises: how far do the results on optimal estimation given in the previous section rely on the assumption of a uniform distribution? The results given in this section, based on [Citation8], provide the answer for a wide class of distributions.

Let us generalize the notion of the average error and define the weighted y-local average error. Assume that, for a fixed y, a non-negative Lebesgue integrable function p_y : ℰ_y → R is given such that

∫_{ℰ_y} p_y(x) dx = 1.

The weighted y-local average error of an algorithm φ is defined by

e^avg_{p_y}(φ, y) = ( ∫_{ℰ_y} ||x − φ(y)||² p_y(x) dx )^{1/2}.

This error criterion reflects the average behaviour of an algorithm with respect to a weighted Lebesgue measure concentrated on ℰ_y.

To define the weighted x-local error, we assume that a non-negative Lebesgue integrable function q_x : ℰ_x → R is given such that

∫_{ℰ_x} q_x(y) dy = 1.

For a measurable function φ, we define the weighted x-local average error of φ by

e^avg_{q_x}(φ, x) = ( ∫_{ℰ_x} ||x − φ(y)||² q_x(y) dy )^{1/2}.

The weight functions p_y and q_x under consideration will be defined by transferring basic functions w̃ : K_n → R and ṽ : K_m → R to the proper sets as follows. Let w̃ be a non-negative Lebesgue integrable function with

∫_{K_n} w̃(t) dt = 1.

To define p_y, we express the set ℰ_y in an equivalent form. Recall that the n × n non-singular matrix B is such that B^T B = U^T A^T A U. Defining

b = (B^T)^{−1} U^T A^T A y,

we have that ||Ux − y||_Y ⩽ ε iff ||Bx − b|| ⩽ ε_y, where ε_y is defined in the previous section. Since we wish to study uncertain information, for which the unique recovery of x from the knowledge of y is not possible, we assume that ℰ_y is not a singleton, i.e., ε_y > 0. We define the weight function p_y by

p_y(x) = ( |det B| / ε_y^n ) w̃( (Bx − b)/ε_y ),   x ∈ ℰ_y.    (21)

The function p_y(x) depends on the matrix B; a possible choice is B = (U^T A^T A U)^{1/2}. We shall see that the optimal algorithm is independent of the selection of the matrix B. For a wide class of weight functions the same holds true for the optimal error.

Similarly to p_y, we define the function q_x by

q_x(y) = ( |det A| / ε^m ) ṽ( A(y − Ux)/ε ),   y ∈ ℰ_x,

where ṽ : K_m → R is a non-negative integrable function with

∫_{K_m} ṽ(s) ds = 1.
The choice of weight functions is not an easy task. Selecting the weights provides in fact additional a priori information about the problem, which should reflect our knowledge about it. A proper weight selection allows us to include relevant a priori information into the problem formulation. For a discussion of the problem of measure selection in a general average setting one is referred to [Citation12], p. 68.

We shall see that in many interesting cases the optimal algorithm is independent of the weights. The error of the optimal algorithm is weight-dependent, and the dependence will be explicitly shown in terms of certain parameters that are easy to compute for any selection of the weights.

We now restrict ourselves to isotropic weight functions; for a discussion of the more general case one is referred to [Citation8]. We assume that the basic weight functions w̃ and ṽ take constant values on spheres in the respective spaces. Letting w and v be real non-negative continuous functions on [0, 1], we define

w̃(t) = c_n w(||t||),  t ∈ K_n,   and   ṽ(s) = c_m v(||s||),  s ∈ K_m,    (23)

where c_n and c_m are normalizing constants ensuring that w̃ and ṽ integrate to one over K_n and K_m, respectively. First of all, it is possible to show [Citation8] that, similarly to the case of uniform distributions, the optimal algorithm φ_opt is given by the projection onto ℳ of the least squares approximation,

φ_opt(y) = P_ℳ( c(ℰ_y) ),

provided that the projection is well defined. (This holds for any basic weight function that is symmetric with respect to the origin.) Hence, φ_opt does not depend on the weights. The following result shown in [Citation8] gives explicit formulas for the errors of the optimal algorithm, and explains the influence of the weights.

Let α_n and β_m be parameters depending on the weights, given by

α_n = ( (n+2)/n ) ( ∫_0^1 w(t) t^{n+1} dt ) / ( ∫_0^1 w(t) t^{n−1} dt ).

A similar formula describes the number β_m, with n replaced by m and w replaced by v. The following theorem holds.

Theorem 5  For weight functions given by Equation (23), we have

e^avg_{p_y}(φ_opt, y) = ( dist(c(ℰ_y), ℳ)² + α_n (ε_y² / (n+2)) tr( (B^T B)^{−1} ) )^{1/2}    (25)

and

e^avg_{q_x}(φ_opt, x) = ( dist(x, ℳ)² + β_m (ε² / (m+2)) tr( M^T (U^T A^T A U)^{−1} M ) )^{1/2}.    (26)
The errors of the optimal algorithm depend on the weights through the parameters α_n and β_m, which can be computed (or estimated) for each particular choice of the weights. Bounds on these parameters for three classes of weight functions (continuous, differentiable and truncated Gaussian functions) can be found in [Citation8].
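As an illustration of how the weights enter only through such integral parameters, the following short sketch evaluates α_n numerically for two assumed radial profiles: the uniform profile w ≡ 1 (which gives α_n = 1) and a truncated-Gaussian-like profile.

```python
import numpy as np

# Numerical sketch: the weight parameter alpha_n for a radial profile w on [0, 1].
# The profiles below are illustrative; w = 1 recovers the uniform case alpha_n = 1.

def alpha(n: int, w, npts: int = 200_001) -> float:
    t = np.linspace(0.0, 1.0, npts)
    wt = w(t)
    return (n + 2) / n * np.trapz(wt * t ** (n + 1), t) / np.trapz(wt * t ** (n - 1), t)

uniform = lambda t: np.ones_like(t)
gauss = lambda t: np.exp(-8.0 * t ** 2)     # truncated-Gaussian-like profile

for n in (2, 5, 10):
    print(n, round(alpha(n, uniform), 4), round(alpha(n, gauss), 4))
```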

The case of uniform distributions over ℰ_y and ℰ_x corresponds to w̃ ≡ 1/|K_n| and ṽ ≡ 1/|K_m|, that is, to w(t) ≡ 1 and v(t) ≡ 1. Hence, in this case α_n = 1 and β_m = 1, and Equation (25) and Equation (26) agree with Equation (17) and Equation (18).

Note that for isotropic weight functions the error depends on B only through the matrix B^T B = U^T A^T A U, and is thus independent of the choice of the matrix B in Equation (21).

Following [Citation8], we illustrate the results given above for an exemplary problem.

Example  Let A = I_m, where I_m is the m × m identity matrix, i.e., we consider the standard Euclidean norm in the space Y. Let information be given by k repeated measurements of all components of the system x. That is, the information matrix is given by U = (I_n, …, I_n)^T, with I_n appearing k times, and exact information is the m × 1 vector Ux = [x^T, …, x^T]^T, with x appearing k times. The total number of measurements is m = k·n.

Assume that we are able to measure the components of x in such a way that the result obtained, an m × 1 vector y = [(y^1)^T, …, (y^k)^T]^T with k vector components y^i ∈ R^n, has mean squared error at most ε²/k,

(1/k) Σ_{i=1}^k ||y^i − x||² ⩽ ε²/k,    (27)

which is exactly the condition ||Ux − y||_Y ⩽ ε. The uncertainty set ℰ_y thus has the form

ℰ_y = { x ∈ R^n : Σ_{i=1}^k ||y^i − x||² ⩽ ε² }

(so that it is a Euclidean ball in R^n). One can easily verify that U^T U = k I_n, that one may take B = √k I_n, and that then b = √k ȳ with ȳ = (1/k) Σ_{i=1}^k y^i. Hence, ℰ_y can be written as

ℰ_y = { x ∈ R^n : ||x − ȳ|| ⩽ ε_y/√k },

where

ε_y² = ε² − Σ_{i=1}^k ||y^i − ȳ||².

The optimal algorithm is given by the projection onto ℳ of the arithmetic mean of the y^i,

φ_opt(y) = P_ℳ( ȳ ) = P_ℳ( (1/k) Σ_{i=1}^k y^i ).
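A small Python check of this reduction, with assumed measurement data: for the stacked information matrix U, the least squares estimate c(ℰ_y) is just the arithmetic mean of the measured vectors.

```python
import numpy as np

# Sketch (assumed data): with U = (I_n, ..., I_n)^T stacked k times and A = I_m,
# the least squares estimate c(E_y) equals the arithmetic mean of the y^i.

rng = np.random.default_rng(4)
n, k = 3, 5
U = np.vstack([np.eye(n)] * k)                 # so that U^T U = k I_n
Y = rng.standard_normal((k, n))                # the k measured vectors y^i
y = Y.reshape(-1)                              # stacked measurement vector

c = np.linalg.solve(U.T @ U, U.T @ y)          # least squares estimate c(E_y)
print(np.allclose(U.T @ U, k * np.eye(n)))     # True
print(np.allclose(c, Y.mean(axis=0)))          # True: c(E_y) is the mean of the y^i
```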
We show what the formula Equation (26) looks like for a particular choice of weight functions and a subspace ℳ. Let n ⩾ 2, let the weights be defined by radial profiles w and v as in Equation (23), and let ℳ = { x ∈ R^n : x_3 = … = x_n = 0 }. Then M = [e^1, e^2], the matrix formed by the first two columns of the n × n identity matrix, and

tr( M^T (U^T U)^{−1} M ) = 2/k.

Since dist(x, ℳ)² = Σ_{i=3}^n x_i² (it is equal to zero if n = 2), and m = k·n, the error of φ_opt can be expressed as

e^avg_{q_x}(φ_opt, x) = ( Σ_{i=3}^n x_i² + ( 2ε² / (k(kn+2)) ) h(n,k) )^{1/2},    (32)

where h(n,k) denotes the weight-dependent parameter β_m for m = k·n (h(n,k) ≡ 1 in the uniform case). The formula Equation (32) shows the dependence of the average error on the accuracy ε and on k. The dependence of h(n,k) on k is not crucial. For a fixed n, when ε goes to zero or k goes to infinity, the error tends to the unmodelled dynamics error dist(x, ℳ), whose appearance cannot be avoided since the identification is conditional.

For further illustration, we specify the example above for a simple problem with exemplary data. We wish to identify a system x depending on three parameters, x = [x_1, x_2, x_3]^T (n = 3). We measure each component of x three times (k = 3) in such a way that Equation (27) holds. The least squares approximation is then the arithmetic mean ȳ of the three measured vectors y^1, y^2, y^3. For the subspace ℳ defined as above, the optimal (on the average) approximation φ_opt(y) is obtained by setting the third component of ȳ to zero, and its y-local error is given by Equation (25). For an exemplary system x consistent with the measurements y^i, given by x = [5, −3, 0]^T, the x-local error Equation (26) contains no unmodelled dynamics term (since x is in ℳ, dist(x, ℳ) = 0), while for the data used in [Citation8] the distance ||x − φ_opt||_2 equals 2.881·10^{−3}; as we see, the actual distance is here greater than the average distance.
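The measured vectors of this example are not reproduced above; the following sketch therefore uses assumed illustrative measurements (not the data of [Citation8]) only to show the computational steps: averaging the three measurements and zeroing the third component.

```python
import numpy as np

# Illustrative sketch with assumed measurements (NOT the data of [Citation8]):
# n = 3 parameters, each measured k = 3 times; M zeroes the third component.

Y = np.array([[5.001, -3.004,  0.002],
              [4.998, -2.995, -0.003],
              [5.004, -3.002,  0.004]])         # assumed measurements y^1, y^2, y^3

y_bar = Y.mean(axis=0)                          # least squares approximation
phi = y_bar.copy()
phi[2] = 0.0                                    # optimal approximation: project onto M

x = np.array([5.0, -3.0, 0.0])                  # exemplary system, x lies in M
print("phi_opt        :", phi)
print("||x - phi||_2  :", np.linalg.norm(x - phi))
```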

Let us finally note that the formula Equation (32) suggests an optimization problem concerning the experiment design. For a given positive δ, we wish to reduce the second term in Equation (32) to be of order δ² (neglecting the influence of h(n,k)). In order to achieve this, we can perform, for any k, k (vector) measurements with accuracy ε = ε(k) of order δ √(k(kn+2)). In reasonable models of experimental cost, the cost should increase with the increasing number of experiments, k, and it should decrease if we relax the accuracy, i.e., if we increase ε(k) (this is the case when k grows). The question is to find an optimal k that minimizes the cost.

6. Conclusions

We have considered the performance of conditional estimators in set membership identification in the worst-case and average-case settings. Optimal estimators in the worst-case setting are in general difficult to compute, and suboptimal estimators such as the projection estimator are often considered. We have shown that the error of the projection algorithm never exceeds 2/√3 times the error of the optimal algorithm. In the average-case setting, the projection algorithm turns out to be optimal. Explicit formulas for its local errors are given, showing the influence of the problem parameters on the minimal identification error. The case of weighted average errors is discussed, and the influence of the weights is explicitly shown.

Acknowledgment

This research was partly supported by local AGH grant No. 10.420.03.

References


  • [1] Chernousko, F.L., 1994. State Estimation for Dynamic Systems. Boca Raton: CRC Press.
  • [2] Garulli, A., Kacewicz, B., Vicino, A. and Zappa, G., 1999. Reliability of Projection Algorithms in Conditional Estimation. Journal of Optimization Theory and Applications, 101: 1–14.
  • [3] Garulli, A., Vicino, A. and Zappa, G., 1997. Conditional Central Algorithms for Worst-Case Estimation and Filtering. In: Proc. 36th IEEE Conference on Decision and Control, San Diego.
  • [4] Helmicki, A.J., Jacobson, C.A. and Nett, C.N., 1991. Control Oriented System Identification: A Worst-Case/Deterministic Approach in H∞. IEEE Transactions on Automatic Control, 36: 1163–1176.
  • [5] Jacobson, C.A., Nett, C.N. and Partington, J.R., 1992. Worst Case System Identification in l1: Optimal Algorithms and Error Bounds. Systems and Control Letters, 19: 419–424.
  • [6] Kacewicz, B., Milanese, M. and Vicino, A., 1988. Conditionally Optimal Algorithms and Estimation of Reduced Order Models. Journal of Complexity, 4: 73–85.
  • [7] Kacewicz, B., 2000. Optimal Average Case Estimation in Hilbert Norms. Mathematics of Control, Signals and Systems, 13: 347–359.
  • [8] Kacewicz, B., 2003. Weighted Average Errors in Set-Membership Estimation. Mathematics of Control, Signals and Systems, 16: 238–253.
  • [9] Mäkilä, P. and Partington, J., 1999. On Robustness in System Identification. Automatica, 35: 907–916.
  • [10] Milanese, M. and Vicino, A., 1993. Information Based Complexity and Nonparametric Worst-Case System Identification. Journal of Complexity, 9: 427–445.
  • [11] Milanese, M., Norton, J., Piet-Lahanier, H. and Walter, E. (Eds), 1996. Bounding Approaches to System Identification. New York: Plenum Press.
  • [12] Novak, E., 1988. Deterministic and Stochastic Error Bounds in Numerical Analysis. Lecture Notes in Mathematics 1349. Berlin: Springer-Verlag.
  • [13] Walter, E. (Ed.), 1990. Parameter Identification with Error Bound. Special issue of Mathematics and Computers in Simulation, 32(5–6).
