457
Views
1
CrossRef citations to date
0
Altmetric
Original Articles

A short note on SPSA techniques and their use in nonlinear bioprocess identification

, &
Pages 415-422 | Published online: 16 Feb 2007

Abstract

Simultaneous perturbation stochastic approximation (SPSA) is a gradient-based optimization method which has become popular since the 1990s. In contrast with standard numerical procedures, this method requires only a few cost function evaluations to obtain gradient information, and can therefore be advantageously applied when identifying a large number of unknown model parameters, as for instance in neural network models or first-principles models. In this paper, a first-order SPSA algorithm is introduced, which makes use of adaptive gain sequences, gradient smoothing and a step rejection procedure to enhance convergence and stability. The algorithm performance is illustrated with the estimation of the most-likely kinetic parameters and initial conditions of a bioprocess model describing the evolution of batch animal cell cultures.

1. Introduction

Process modelling requires the estimation of several unknown parameters from noisy measurement data. In this connection, gradient-based optimization methods are popular tools for minimizing least-squares or maximum-likelihood cost functions – depending on the assumptions on the measurement noise – measuring the deviation between the model and real system outputs.

Several techniques for computing the gradient of the cost function are available, including finite difference approximations or direct differentiation, which involves the numerical solution of sensitivity equations.

In the above-mentioned techniques, the computational expense required to estimate the current gradient direction is directly proportional to the number of unknown model parameters, which becomes an issue for models involving a large number of parameters. This is typically the case in neural network (NN) modelling, but can also occur in other circumstances, such as the estimation of parameters and initial conditions in first-principles models. Moreover, the derivation of sensitivity equations requires analytic manipulation of the model equation, which is time consuming and prone to errors (note that an important stream of current research and development is dedicated to the automatic generation of such equations using symbolic manipulation).

In contrast to standard finite differences which approximate the gradient by varying all the parameters one at a time, the simultaneous perturbation (SP) approximation of the gradient proposed by Spall Citation1 makes use of a very efficient technique based on a simultaneous (random) perturbation in all the parameters. Hence, one gradient evaluation requires only two evaluations of the cost function. This approach has first been applied to gradient estimation in a first-order stochastic approximation (SA) algorithm Citation1, and more recently to Hessian estimation in an accelerated second-order SPSA algorithm Citation2.

Efficiency, simplicity of implementation and very modest computational costs make first-order SPSA (1SPSA) particularly attractive, even though it suffers from the classical drawback of first-order algorithms, i.e. a slowing down in the convergence as an optimum is approached. Unfortunately, this phenomenon is even more pronounced in the case of SP techniques since the gradient information is more delicate to ‘extract’ from the – usually rather ‘flat’ – neighbourhood of the optimum.

In this study, a variant of the original first-order algorithm is considered which makes use of adaptive gain sequences, gradient smoothing and a step rejection procedure, to enhance convergence and stability. To demonstrate the algorithm efficiency and versatility, attention is focused on the estimation of kinetic parameters (and initial conditions) in a macroscopic model of animal cell cultures. To this end, a maximum-likelihood cost function based on the experimental measurements of biomass, glucose, glutamine and lactate concentrations is minimized.

This paper is organized as follows. Section 2 introduces the basic principle of the first-order SPSA algorithm used throughout this study. In section 3, the algorithm is applied to the maximum-likelihood estimation of kinetic parameters and initial conditions of a bioprocess model from experimental measurements of several macroscopic component concentrations. Direct and cross-validation results demonstrate the good model agreement. Finally, section 4 is devoted to some concluding remarks.

2 A brief overview of SPSA

Consider the problem of minimizing a, possibly noisy, objective function J(θ) with respect to a vector θ of unknown parameters.

1SPSA is given by the following core recursion for the parameter vector θCitation1

in which ak is a non-negative scalar gain coefficient, and is an approximation of the criterion gradient obtained by varying all the elements of simultaneously, i.e.
where ck is a positive scalar and Δ k  = (Δ k1, Δ k2, …, Δ kp ) T with symmetrically Bernouilli distributed random variables {Δ ki }.

In its original formulation Citation1, 1SPSA makes use of decaying gain sequences {ak } and {ck } in the form

in order to ensure formal convergence. In these expressions, a, A and c denote constant parameters. Finite sample experiments show, however, that the algorithm may get stuck somewhere in the parameter space if the criterion value becomes significantly worse (due to a poor current gradient approximation, a non-convex optimization problem, etc.) and the gain sequences are too small to recover from this situation.

In order to enhance convergence and stability, the use of an adaptive gain sequence for parameter updating is considered, i.e.

In addition to gain attenuation when the value of the criterion becomes worse, ‘blocking’ mechanisms Citation2 are also applied, i.e. the current step is rejected and, starting from the previous parameter estimate, a new step is accomplished (with a new gradient evaluation and a reduced updating gain). The parameter β in (4) represents the permissible increase in the criterion, before step rejection and gain attenuation occur.

A constant gain sequence ck  = c could be used for gradient approximation, the value of c being selected so as to overcome the influence of (numerical or experimental) noise. In the optimum neighbourhood, a decaying sequence in the form (3) is required to evaluate the gradient with enough accuracy and avoid an amplification of the ‘slowing down’ effect mentioned in the introduction.

Finally, a gradient smoothing (GS) procedure is implemented, i.e. gradient approximations are averaged across iterations in the following way

where, starting with an initial value of ρ = ρ0, ρ k is decreased in a way similar to (4) when step rejection occurs (i.e. ρ k  = μρk−1 with μ ≤ 1) and is reset to its initial value after a successful step.

The use of these numerical artifices, i.e. adaptive gain sequences, step rejection procedure and gradient smoothing, significantly improves the effective practical performance of the algorithm, which, in the following, is denoted ‘adaptive 1SP-GS’.

Inequality constraints can also be taken into account by a projection algorithm introduced in Sadegh Citation3: the current parameter estimate is projected onto a closed set included in the admissible region in such a way that no function evaluation is required outside this latter region. In this study, bound constraints (e.g. positivity constraints) are handled in this way.

3 Modelling batch CHO cell cultures

In this experimental study, rare and asynchronous measurements of biomass (X), glucose (G), glutamine (Gln) and lactate (L) concentrations are available. The measurement equation is given by

where ξ(ti ) = [X(ti ) G(ti ) Gln(ti ) L(ti )] T , y(ti ) and ϵ(ti ) are the state, measurement and noise vectors at time ti , respectively. The measurement errors are assumed to be normally distributed, white noises with zero mean and variance matrix Q(ti ).

Data are collected from seven batch experiments corresponding to different initial glucose and glutamine concentrations. Five of these experiments are used for parameter estimation, the two remaining ones being used for cross-validation tests.

The parameter estimation procedure involves the minimization of a Gauss-Markov criterion

where yi , Qi and are the measurement vector, the measurement error covariance matrix and the state estimate obtained by integration of the model equations with the parameter vector θ.

The model equations are derived from mass balances and are given by

where the reaction term K φ(ξ) includes the reaction pseudo-stoichiometry matrix K defining the corresponding macroscopic reaction scheme and the kinetics φ(ξ).

The pseudo-stoichiometry can be estimated independently of the reaction kinetics by using a state transformation originally introduced in Bastin and Dochain Citation4, which is based on the following ideas.

If rank K = M, where M is the number of independent reactions, then there always exists a partition

where K a ∈ℜ M×M is full rank, i.e. rank Ka = M.

Accordingly, the partition of ξ is given by

Given such a partition of K, the following matrix equation
(where O ∈ ℜ(N − MM is a null matrix) has always a unique solution C ∈ℜ(N − MM .

Using this solution, an auxiliary vector z∈ℜ N − M can be defined as

whose dynamics is independent of the kinetics

Based on Equationequations (12) and Equation(13), a maximum-likelihood estimation Ĉ of the matrix C can be computed Citation5, and estimations [Kcirc]a and [Kcirc]b of the matrices Ka and Kb can be deduced from Equationequation (11). A systematic procedure for selecting the most-likely C-identifiable reaction scheme has been proposed in Bogaerts and Vande Wouwer Citation6.

This procedure can be applied to our case study with ξ a  = [X G] T , ξ b  = [G ln L] T , and dz(t)/dt = 0 (from Equationequation (13) with D = 0). Then, the following macroscopic reaction scheme is derived.

Growth:

Maintenance:
where ν Gln, ν X and ν L are pseudo-stoechiometry coefficients. The symbol →^ means that the growth reaction is auto-catalysed by X. The presence of ν X X in both sides of the maintenance reaction means that X catalyses this latter reaction.

The estimated pseudo-stoichiometry matrix is then given by

The growth rate ϕg and the maintenance rate ϕm appearing in Equationequations (14) and Equation(15) are described by a general kinetic model structure proposed in Bogaerts and Hanus Citation5.

The mass Equationequations (8) can therefore be written in a more explicit form

The eight kinetic coefficients in Equationequations (17) and Equation(18) and 20 initial concentrations appearing in Equationequations (19) – Equation Equation Equation(22) for the five batch experiments under consideration (initial concentrations are corrupted by measurement noise and their most-likely values have to be identified) are estimated by minimizing the cost function (7) with the adaptive 1SP-GS algorithm.

The tuning parameters in Equationequations (1) – Equation Equation Equation Equation(5) are selected as follows: c = 10−6, γ = 1.5 (a decaying sequence ck is used for gradient evaluation), a 0 = 10−8, η = 1.01, μ = 0.9, β = 0 (no relative increase in the criterion is allowed), ρ0 = 0.99. Starting with the measured initial concentrations (which are affected by measurement errors) and an initial guess for the kinetic parameters corresponding to a cost function Jml  = 65760, the minimization (7) leads to a cost function Jml  = 308 in 50 000 iterations. This number of iterations might appear quite large at first sight, but the computational cost is very modest as each iteration only requires two cost function evaluations (each of these evaluations involves five model simulations corresponding to the five experimental batches used in this identification phase). On the other hand, standard centred finite difference approximations would require 56 cost function evaluations per iteration!

The parameter estimates are listed in . compares the measurement data of one of the five experiments used in the parameter identification procedure with the model prediction (direct validation), whereas shows the same kind of comparison with the measurement data of one of the remaining two experiments (cross-validation). In these graphs, the circled points are the measured data and the bars represent the 99% confidence intervals. The solid lines are the concentration trajectories predicted by the identified model.

Figure 1. Direct validation.

Figure 1. Direct validation.

Figure 2. Cross-validation.

Figure 2. Cross-validation.

Table 1. Parameter estimates.

These figures demonstrate the excellent model agreement.

4 Conclusion

The simultaneous perturbation approach developed by Spall Citation1,Citation2 is a very powerful technique, which allows an approximation of the gradient of the objective function to be computed by effecting simultaneous random perturbations in all the parameters. Therefore, this approach is particularly well-suited to problems involving a relatively large number of design parameters. In this study, a first-order SP algorithm is described and applied to the maximum-likelihood estimation of the kinetic parameters and initial conditions of a bioprocess model from experimental measurements of a few macroscopic components. These applications, as well as previous studies by the authors Citation7, demonstrate the usefulness of the proposed SP-algorithm.

Acknowledgements

The authors are very grateful to Professor J. Wérenne (Université Libre de Bruxelles, Departement of Animal Cell Biotechnology) and Mr M. Cherlet for providing them with the measurement data relating to the CHO-animal cell cultures.

References

  • Spall , J. C. 1992 . Multivariate stochastic approximation using a simultaneous perturbation gradient approximation . IEEE Transactions Automatic Control , 37 : 332 – 341 .
  • Spall , J. C. 2000 . Adaptive stochastic approximation by the simultaneous perturbation method . IEEE Transactions Automatic Control , 45 : 1839 – 1853 .
  • Sadegh , P. 1997 . Constrained optimization via stochastic approximation with a simultaneous perturbation gradient approximation . Automatica , 33 : 889 – 892 .
  • Bastin , G. and Dochain , D. 1990 . On-line Estimation and Adaptive Control of Bioreactors , Amsterdam : Elsevier .
  • Bogaerts , Ph. and Hanus , R. 2000 . Focus on Biotechnology, Vol. IV, Engineering and Manufacturing for Biotechnology , Edited by: Thonart , P h. and Hofman , M. Dordrecht, , The Netherlands : Kluwer Academic Publishers .
  • Bogaerts , Ph. and Vande Wouwer , A. 2001 . “ Systematic generation of identifiable macroscopic reaction schemes ” . In Proceedings IFAC CAB8 Quebec, , Canada
  • Vande Wouwer , A. , Renotte , C. and Remy , M. 2003 . Application of stochastic approximation techniques in neural modelling and control . International Journal of Systems Science , 34 : 851 – 863 .

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.