
Optimal design of simultaneous source encoding

Pages 780-797 | Received 08 Jan 2013, Accepted 22 Sep 2013, Published online: 18 Jul 2014

Abstract

A broad range of parameter estimation problems involve the collection of an excessively large number of observations $N$. Typically, each such observation involves excitation of the domain through injection of energy at some predefined sites and recording of the response of the domain at another set of locations. It has been observed that similar results can often be obtained by considering a far smaller number $K$ of multiple linear superpositions of experiments, with $K \ll N$. This allows the construction of the solution to the inverse problem in time $O(K)$ instead of $O(N)$. Given these considerations, it should not be necessary to perform all $N$ experiments, but only a much smaller number $K$ of experiments with simultaneous sources superposed with certain weights. Devising such a procedure results in a drastic reduction in acquisition time. The question we rigorously investigate in this work is: what are the optimal weights? We formulate the problem as an optimal experimental design problem and show that by leveraging techniques from this field an answer is readily available. Designing optimal experiments requires a statistical framework, and the framework one chooses to work with plays a major role in the selection of the weights.

1 Introduction

We consider discrete parameter estimation problems with data vectors $d_j$ modelled as

$$ d_j = A_j m + \epsilon_j \qquad (j = 1, \dots, N), \tag{1} $$

where $A_j$ is an $s \times n$ linear observation matrix that corresponds, for example, to the $j$th configuration of an experiment, $m \in \mathbb{R}^n$ is the parameter vector to be recovered and the $\epsilon_j$ are zero-mean independent random vectors with covariance matrix $\sigma^2 I$. In addition, we assume that there is a large set of observation matrices $A_j$, which is the case of interest in many geophysical and medical applications such as seismic and electromagnetic (EM) exploration, diffraction and electrical impedance tomography.[1–3]

The linear systems in (1) can be aggregated into a single large system by defining

$$ \hat{A} \equiv \begin{pmatrix} A_1 \\ \vdots \\ A_N \end{pmatrix} \quad \text{and} \quad \hat{d} \equiv \begin{pmatrix} d_1 \\ \vdots \\ d_N \end{pmatrix}. \tag{2} $$

In the well-posed case where $\hat{A}^\top \hat{A}$ has a stable inverse, the least-squares (maximum likelihood if the $\epsilon_j$ are Gaussian) estimate $\hat{m}$ of $m$ is

$$ \hat{m} = (\hat{A}^\top \hat{A})^{-1} \hat{A}^\top \hat{d}. \tag{3} $$

When $\hat{A}^\top \hat{A}$ does not have a stable inverse, a penalized least-squares estimate of the form

$$ \hat{m} = (\hat{A}^\top \hat{A} + \lambda^2 M)^{-1} \hat{A}^\top \hat{d} \tag{4} $$

is often used. Here $M$ is usually a symmetric positive semi-definite matrix and $\lambda$ is a positive regularization parameter. This choice will simplify the analysis later on, although the proposed approach can be generalized to regularizers of other functional forms.
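To make the estimates (3) and (4) concrete, the following minimal numpy sketch builds the stacked system (2) for small synthetic matrices and computes both estimates; the sizes, the seed and the choice $M = I$ are illustrative assumptions, not values from the paper.

```python
# Sketch of the stacked least-squares estimates (3) and (4) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
s, n, N = 5, 8, 20                       # rows per experiment, parameters, experiments
A_hat = rng.standard_normal((N * s, n))  # stacked observation matrix, cf. (2)
m_true = rng.standard_normal(n)
d_hat = A_hat @ m_true + 0.01 * rng.standard_normal(N * s)

# Well-posed estimate (3): solve the normal equations.
m_ls = np.linalg.solve(A_hat.T @ A_hat, A_hat.T @ d_hat)

# Penalized estimate (4) with M = I and regularization parameter lam.
lam = 0.1
M = np.eye(n)
m_pen = np.linalg.solve(A_hat.T @ A_hat + lam**2 * M, A_hat.T @ d_hat)
```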

We are particularly interested in large-scale problems where both the number of model parameters $n$ and the number of observations $N$ are large. For instance, in an electromagnetic or a seismic survey, the number of such observations, $N$, can be of the order of $10^4$ or $10^5$. For such problems, solution of the system (4) using direct methods is computationally intractable and instead iterative methods (typically, CGLS or LSQR [4]) are employed. The main computational bottleneck associated with such solvers is the computation of matrix-vector products: when $N$ is very large, the computation of $\hat{A}$ times a vector can be prohibitively expensive by itself. We thus need to devise some data reduction methodology to make computations feasible.

For example, return to the well-posed case and assume the matrices $A_j$ come from a parameter estimation problem where the observed phenomena can be described by a set of differential equations that are linear w.r.t. the source term. Such operators can be written as

$$ A_j = S L^{-1} Q_j, $$

where $S$ and $Q_j$ are sparse selection matrices related to the sensors and the sources, respectively, and $L$ is a discretization of a partial differential equation on a mesh. Such problems arise in seismic and electromagnetic exploration as well as in impedance and diffraction tomography.[1–3] The least-squares estimate of the model $m$ is

$$ \hat{m} = \Big( \sum_j Q_j^\top L^{-\top} S^\top S L^{-1} Q_j \Big)^{-1} \sum_j Q_j^\top L^{-\top} S^\top d_j, \tag{5} $$

which requires multiple applications of $L^{-1}$, thus making the procedure computationally expensive. Alternatively, a far less expensive procedure relies upon a recently developed data reduction technique based on the idea of 'simultaneous sources'.[5–9] Here the estimate is based on a weighted combination of the form $d(w) = \sum_j w_j d_j$. The basic idea is to exploit the savings achieved by simultaneously collecting the data.

For example, if each experiment corresponds to a different source in seismic or electromagnetic exploration, it can be rather expensive both in terms of time and cost to record many different experiments. If, on the other hand, it is possible to 'shoot' simultaneously with all the sources, then the recording time can be significantly shortened, as a collection of simultaneous sources enables direct measurement of linear combinations of the data. That is, for a fixed $w$ one conducts an experiment that measures $d(w)$ directly. This acquisition strategy is of course far more efficient than the conventional process in which each data vector is measured independently and the data are only later combined with the weights $w$.

The linear system corresponding to $d(w)$ is

$$ d(w) = A(w) m + \epsilon(w), \tag{6} $$

where $A(w) = \sum_j w_j A_j$ and $\epsilon(w) = \sum_j w_j \epsilon_j$. We assume the individual source noise levels $\epsilon_j$ are known. The estimates (3) and (4) corresponding to $d(w)$ are

$$ \hat{m}(w) = \big( A(w)^\top A(w) / \|w\|^2 \big)^{-1} A(w)^\top d(w) / \|w\|^2 \tag{7} $$

if $A(w)^\top A(w)$ has a stable inverse, and

$$ \hat{m}(w) = \big( A(w)^\top A(w) / \|w\|^2 + \lambda^2 M \big)^{-1} A(w)^\top d(w) / \|w\|^2 \tag{8} $$

otherwise. Clearly, both solutions depend on $w$ only through its unit-norm normalization $w / \|w\|$.
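As a quick illustration of the reduction (6), the sketch below forms $A(w)$ and $d(w)$ from random synthetic operators and computes the penalized estimate (8); the sizes, seed and the choice $M = I$ are assumptions made only for this example.

```python
# One weighted simultaneous-source experiment replaces N individual ones.
import numpy as np

rng = np.random.default_rng(1)
s, n, N = 5, 8, 20
A = rng.standard_normal((N, s, n))       # A[j] is the j-th observation matrix
m_true = rng.standard_normal(n)
d = A @ m_true + 0.01 * rng.standard_normal((N, s))

w = rng.standard_normal(N)
w /= np.linalg.norm(w)                   # solutions depend only on w / ||w||

A_w = np.einsum('j,jsn->sn', w, A)       # A(w) = sum_j w_j A_j
d_w = w @ d                              # d(w) = sum_j w_j d_j

# Penalized estimate (8) with M = I and ||w|| = 1.
lam = 0.1
m_hat = np.linalg.solve(A_w.T @ A_w + lam**2 * np.eye(n), A_w.T @ d_w)
```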

For example, let us return to the well-posed parameter estimation example with $A_j = S L^{-1} Q_j$. The penalized least-squares estimate (8) is

$$ \hat{m} = \bigg( \Big(\sum_j w_j Q_j\Big)^{\!\top} L^{-\top} S^\top S L^{-1} \Big(\sum_k w_k Q_k\Big) + \lambda^2 M \bigg)^{-1} \Big(\sum_i w_i Q_i\Big)^{\!\top} L^{-\top} S^\top \Big(\sum_m w_m d_m\Big), $$

which requires far fewer applications of $L^{-1}$ than (5). Of course, some information may be lost, as $d(w)$ is not, in general, a sufficient statistic for $m$. We return to this point in Section 2.2, but it should be clear that some information might be sacrificed for the sake of computational or data collection savings.

Although the advantages of using simultaneous sources have been previously explored, the question of selecting appropriate weights $w$ has not received sufficient attention. We are aware of two main approaches that have been utilized so far. In [10], the weights $w_j$ are independent zero-mean random variables of variance one and are used to define the Hutchinson randomized trace estimator.[11] More precisely, let $R$ be the residual matrix defined as

$$ R = \begin{pmatrix} (A_1 m - d_1)^\top \\ \vdots \\ (A_N m - d_N)^\top \end{pmatrix}. $$

Then,

$$ E\|A(w) m - d(w)\|^2 = E(w^\top R R^\top w) = \operatorname{trace}(R R^\top) = \sum_j \|A_j m - d_j\|^2 = \|R\|_F^2, $$

where $\|\cdot\|_F$ denotes the Frobenius norm. If $w^{(1)}, \dots, w^{(b)}$ are independent random weight vectors, then by the law of large numbers

$$ \frac{1}{b} \sum_{k=1}^{b} \big\| A(w^{(k)}) m - d(w^{(k)}) \big\|^2 \longrightarrow \|R\|_F^2 $$

as $b \to \infty$. Therefore, the finite sum on the left-hand side can be used to approximate $\|R\|_F^2$. It is easy to see that the uniform distribution on $\{-1, 1\}$ minimizes the variance of the trace estimator (see e.g. [12]). Methods to select appropriate values of $b$ are discussed in [12].
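The following short sketch illustrates the Hutchinson estimator underlying this choice: averages of $w^\top B w$ over random $\pm 1$ vectors approximate $\operatorname{trace}(B)$. The matrix sizes, seed and number of samples are illustrative.

```python
# Hutchinson trace estimation with Rademacher (+/-1) weight vectors.
import numpy as np

rng = np.random.default_rng(2)
N = 200
R = rng.standard_normal((N, 30))          # stand-in residual matrix
B = R @ R.T                               # trace(B) = ||R||_F^2

b = 1000                                  # number of random weight vectors
W = rng.choice([-1.0, 1.0], size=(b, N))  # Rademacher weights minimize the variance
estimate = np.mean(np.einsum('bi,ij,bj->b', W, B, W))

print(estimate, np.trace(B))              # the two numbers should be close
```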

In [13] the Frobenius norm of the residual matrix is replaced by the squared 2-norm, and $w$ is then chosen so that

$$ \|A(w) m - d(w)\|^2 \approx \|R\|_2^2. $$

The paper is structured as follows. In Section 2, we derive the mathematical framework. In Section 3, we discuss efficient numerical techniques for the solution of the weight design problem. In Section 4, we demonstrate the effectiveness of our approach by performing numerical experiments and finally, in Section 5 we summarize the study.

2 Optimal selection of weights

In this section, we consider statistical properties of the estimate $\hat{m}$ to select optimal weights $w$. We also discuss methods to compensate for the information loss and to compare the loss achieved with different choices of weights $w$.

We begin our discussion with the well-posed problem and its solution (7). As can be observed, $\hat{m}$ depends only on $w / \|w\|$; we may therefore assume that $\|w\| = 1$. Since $\hat{m}$ is an unbiased estimator and $\mathrm{Var}(\epsilon(w)) = \sigma^2 I$, it follows that its mean square error (MSE) is

$$ \mathrm{MSE}(\hat{m}) = E\|\hat{m}(w) - m\|^2 = \sigma^2 \operatorname{trace}\big( (A(w)^\top A(w))^{-1} \big). $$

To find an optimal $w$, in the sense of minimizing the MSE, we need to solve the constrained optimization problem

$$ \min_w \operatorname{trace}\big( (A(w)^\top A(w))^{-1} \big) \quad \text{s.t.} \quad \|w\| = 1. \tag{9} $$

Before discussing properties of the solutions, we consider the ill-posed problem.
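For intuition, here is a tiny sketch of (9) for $N = 2$: writing $w = (\cos\phi, \sin\phi)$ makes the unit-norm constraint automatic, so the objective can simply be scanned over the angle (anticipating the parameterization used in Example 3 and Section 3). The matrices and grid are illustrative assumptions.

```python
# Scan the A-optimality objective of (9) over unit-norm weights for N = 2.
import numpy as np

rng = np.random.default_rng(3)
s, n = 6, 4
A1, A2 = rng.standard_normal((s, n)), rng.standard_normal((s, n))

phis = np.linspace(0.0, 2.0 * np.pi, 361)
obj = []
for phi in phis:
    A_w = np.cos(phi) * A1 + np.sin(phi) * A2        # A(w) with ||w|| = 1
    obj.append(np.trace(np.linalg.inv(A_w.T @ A_w)))

print(phis[int(np.argmin(obj))])          # angle of the best weights on the grid
```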

2.1 The ill-posed case

Apart from numerous recent exceptions (see [14–19] and references therein), optimal experimental design of ill-posed problems has been a somewhat under-researched topic. In the case of ill-posed problems, the selection of optimal weights for (8) is more difficult because this estimate is biased and its bias depends on the unknown $m$: $\mathrm{bias}(\hat{m}) = -\lambda^2 C(w)^{-1} M m$, where the inverse Fisher matrix is given by

$$ C(w) = A(w)^\top A(w) + \lambda^2 M $$

(again, we may assume $\|w\| = 1$). The MSE of $\hat{m}$ is then

$$ \mathrm{MSE}(\hat{m}) = E\|\hat{m} - m\|^2 = \|\mathrm{bias}(\hat{m})\|^2 + \operatorname{trace}(\mathrm{Var}(\hat{m})) = \lambda^4 \|C(w)^{-1} M m\|^2 + \sigma^2 \operatorname{trace}\big( A(w) C(w)^{-2} A(w)^\top \big). \tag{10} $$

As in the well-posed case, the goal would be to find weights $w$ that minimize this MSE subject to the constraint $\|w\| = 1$, but this is not possible because $m$ is unknown. We need information on $m$ to control the MSE.

If $m$ is known to be in a convex set $S$, then we could consider a minmax approach where we minimize the worst $\mathrm{MSE}(\hat{m})$ for $m \in S$. A less pessimistic approach is to minimize an average $\mathrm{MSE}(\hat{m})$ over $S$. Of course, other choices are possible and may be useful as long as their interpretation can be rationalized. In this note, we minimize a weighted average of the MSE. We model $m$ as random and since $\hat{m}$ only depends linearly on $m$, it is enough to use prior second-order moment conditions. For example, if $E(m) = 0$ and $\mathrm{Var}(m) = \Sigma_m$, then the Bayes risk $R_{FB}(w)$ under squared error loss of the frequentist estimate $\hat{m}$ defined by the fixed $w$ is

$$ R_{FB}(w) = E(\mathrm{MSE}(\hat{m})) = \lambda^4 \operatorname{trace}\big( C(w)^{-1} M \Sigma_m M C(w)^{-1} \big) + \sigma^2 \operatorname{trace}\big( A(w) C(w)^{-2} A(w)^\top \big), $$

and we solve the optimization problem

$$ \min_w R_{FB}(w) \quad \text{s.t.} \quad \|w\| = 1. \tag{11} $$

Finally, assume the distributions are Gaussian. That is, the conditional distribution of $d(w)$ given $m$ is $N(A(w) m, \sigma^2 I)$ and the prior distribution of $m$ is $N(0, \tau^2 M^{-1})$, with $\tau^2 = \sigma^2 / \lambda^2$. In this case, (8) is the posterior mean (and a maximum a posteriori or MAP estimator) and the posterior covariance matrix is $\mathrm{Var}(m \mid d(w)) = \sigma^2 C(w)^{-1}$. For each fixed $w$ the Bayes risk of $\hat{m}$ is therefore

$$ R_B(w) = \sigma^2 \operatorname{trace}\big( C(w)^{-1} \big). $$

The corresponding optimization problem is

$$ \min_w R_B(w) \quad \text{s.t.} \quad \|w\| = 1. \tag{12} $$
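A direct translation of the Bayes risk objective in (12) into numpy might look as follows; the random test operators, $M = I$ and the unit noise levels are assumptions made only for the sketch.

```python
# Bayes risk R_B(w) = sigma^2 trace(C(w)^{-1}), C(w) = A(w)^T A(w) + lam^2 M.
import numpy as np

def bayes_risk(w, A, M, sigma=1.0, lam=1.0):
    """Evaluate (12) for a weight vector w, normalized to unit norm."""
    w = w / np.linalg.norm(w)
    A_w = np.einsum('j,jsn->sn', w, A)
    C = A_w.T @ A_w + lam**2 * M
    return sigma**2 * np.trace(np.linalg.inv(C))

rng = np.random.default_rng(4)
N, s, n = 10, 6, 12                       # ill-posed: fewer rows than parameters
A = rng.standard_normal((N, s, n))
print(bayes_risk(rng.standard_normal(N), A, np.eye(n)))
```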

Remark

So far we have only considered linear/affine reconstructions of the model, which of course do not include many recovery techniques based, for example, on total variation or sparsity control. However, if the reason for choosing one of these nonlinear procedures is the belief that its optimal Bayes risk is smaller than that of the linear reconstruction, then the optimal Bayes risk of the latter provides an upper bound for the optimal Bayes risk of the nonlinear procedure. Furthermore, if the upper bound is not too conservative, then minimizing the Bayes risk for an affine estimator can be a useful guide for other reconstruction techniques.

2.2 Compensating for the information loss

As we have explained, some information may be lost with the reduction to $d(w)$. If the loss is unacceptable, we may compensate by using more than a single reduction as follows. Let $k$ be the number of independent data reductions and $w^{(\ell)} = (w_1^{(\ell)}, \dots, w_N^{(\ell)})$. Define

$$ d_j^{(\ell)} = A_j m + \epsilon_j^{(\ell)} \qquad (j = 1, \dots, N, \; \ell = 1, \dots, k), $$

where the sequence $\{\epsilon_j^{(\ell)}\}_{j,\ell}$ is assumed to be i.i.d. $N(0, \sigma^2 I)$. The $\ell$th data reduction is $d(w^{(\ell)}) = \sum_j w_j^{(\ell)} d_j^{(\ell)}$. We solve for $m$ using the vector of $k$ reductions $d(W_k)$ and matrix $A(W_k)$ defined as

$$ d(W_k) = \begin{pmatrix} d(w^{(1)}) \\ \vdots \\ d(w^{(k)}) \end{pmatrix} \quad \text{and} \quad A(W_k) = \begin{pmatrix} A(w^{(1)}) \\ \vdots \\ A(w^{(k)}) \end{pmatrix}. $$

The associated Bayes risk is $R_B(W_k) = \sigma^2 \operatorname{trace}(C(W_k)^{-1})$ with $C(W_k) = A(W_k)^\top A(W_k) + \lambda^2 M$. The optimization problem is

$$ \min_{W_k} R_B(W_k) \quad \text{s.t.} \quad \|w^{(1)}\| = \dots = \|w^{(k)}\| = 1. \tag{13} $$

There are some particular cases that may make this optimization easier and provide upper bounds for the optimal solution of (13). These bounds can be useful when we study the loss as a function of $k$. One particular case consists of using prior information about the experiments to classify the matrices $A_j$ into $k$ groups whose members are similar on a realistic set of models $m$. For example, $d(w^{(1)})$ may include only $A_1$ and $A_2$ while $d(w^{(2)})$ uses only $A_3$, $A_4$ and $A_5$, etc. A second simplification consists of assuming $w^{(1)} = \dots = w^{(k)}$ so that there is only one constraint but $k$ independent experiments. The final and simplest approach is to find the optimal $w$ in (12) and then repeat the experiment $k$ times. This leads to the risk

$$ R_B(w, k) = \sigma^2 \operatorname{trace}\big[ (k\, A(w)^\top A(w) + \lambda^2 M)^{-1} \big]. \tag{14} $$
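The multi-reduction risk $R_B(W_k)$ in (13) stacks the $k$ reduced operators before forming $C(W_k)$; a minimal numpy sketch, with illustrative sizes and $M = I$, is given below.

```python
# Bayes risk for k simultaneous-source reductions, cf. (13).
import numpy as np

def bayes_risk_k(W, A, M, sigma=1.0, lam=1.0):
    """W has shape (k, N); each row is normalized to unit norm."""
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    A_stack = np.concatenate([np.einsum('j,jsn->sn', w, A) for w in W])
    C = A_stack.T @ A_stack + lam**2 * M
    return sigma**2 * np.trace(np.linalg.inv(C))

rng = np.random.default_rng(5)
N, s, n, k = 20, 6, 15, 4
A = rng.standard_normal((N, s, n))
print(bayes_risk_k(rng.standard_normal((k, N)), A, np.eye(n)))
```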

2.3 Quantifying the information loss

Although in general the optimization problem does not have a unique minimum, a local minimum may still provide a sufficient reduction of the risk achieved with the reduced data $d(w)$. This does not mean, however, that the data reduction is useful, for much information may be lost. It is therefore important to assess whether the computational savings obtained through the data reduction justify the information loss.

Figure 1. The loss as a function of the number of vectors.

To assess the information loss of the data reduction, we compare its associated risk to that of the complete data. The Bayes risk of the complete (i.e. exhaustive acquisition) problem is

$$ R_0 = \operatorname{trace}\big( (\hat{A}^\top \hat{A} + \lambda^2 M)^{-1} \big), $$

while the risk associated with the reduced data $d(w)$ is

$$ R_B(w) = \operatorname{trace}\big( (A(w)^\top A(w) + \lambda^2 M)^{-1} \big). \tag{15} $$

We quantify the information loss using the risk ratio

$$ \mathrm{Loss}(w) = \frac{R_B(w)}{R_0}. \tag{16} $$

This loss indicates the degradation of the average recovery when using the reduction $d(w)$ instead of the full data. If the loss is much greater than 1, then using a single set of weights may lead to significant deterioration in the recovery. In this case one may want to use a number of weights as explained in Section 2.2. The loss is then

$$ \mathrm{Loss}(W_k) = \frac{R_B(W_k)}{R_0}. $$

To determine an appropriate value of $k$, we may plot the loss as a function of $k$. The idea is to balance the trade-off between the cost of the experiment and the information loss. The following example illustrates this point.

Example 1

We generate $N = 50$ random matrices $A_j$ of size $100 \times 200$ and use $k$ random unit vectors (not optimal) $w^{(1)}, \dots, w^{(k)}$. Figure 1 shows a plot of $R_B(W_k)$ as a function of $k$. The curve has a classical L-shape and provides an upper bound for the optimal risks $R_B(W_k)$. The figure shows that using fewer than 10 experiments will result in a significant loss of information, but using more than 30 may be a waste of resources. Thus, the loss curve is important if we would like to design an effective and yet efficient experiment.
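A sketch in the spirit of this example: the loss (16) for random unit-norm weights, evaluated for several values of $k$. The regularization $\lambda = 1$, $M = I$ and the seed are assumptions for the sketch; the ratio decreases towards 1 as $k$ grows.

```python
# Loss curve of Section 2.3: R_B(W_k) / R_0 for k random unit-norm weights.
import numpy as np

rng = np.random.default_rng(6)
N, s, n, lam = 50, 100, 200, 1.0
A = rng.standard_normal((N, s, n))
M = np.eye(n)

A_full = A.reshape(N * s, n)                         # stacked full-data operator
R0 = np.trace(np.linalg.inv(A_full.T @ A_full + lam**2 * M))

for k in (1, 5, 10, 20, 30, 40):
    W = rng.standard_normal((k, N))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    A_stack = np.concatenate([np.einsum('j,jsn->sn', w, A) for w in W])
    Rk = np.trace(np.linalg.inv(A_stack.T @ A_stack + lam**2 * M))
    print(k, Rk / R0)                                # the L-shaped curve of Figure 1
```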

2.4 Numerical properties of the design function

We now study properties of the weights selected in the fully Bayesian framework; that is, solutions of (12). First, note that neither the constraint $\|w\| = 1$ nor the objective functions $R_B(w)$ and $R_B(W_k)$ are convex, and consequently there is no guarantee of a unique solution. In fact, if $w$ is a solution of (12), then so is $\tilde{w} = -w$. And any permutation of an optimal sequence $w^{(1)}, \dots, w^{(k)}$ is a solution of (13).

Example 2

Consider the following simple $2 \times 2$ well-posed case:

$$ A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. $$

It is easy to see that for $\|w\| = 1$,

$$ \operatorname{trace}\big( (A(w)^\top A(w))^{-1} \big) = \frac{2}{1 - 4 w_1^2 w_2^2} = \frac{1}{2 (w_1^2 - 1/2)^2}, $$

and the optimal solutions of (9) are $w = \pm(1, 0)$. That is, the optimal solution does not use the information provided by $d_2$. To see how much information is lost, we compare the risk of the full least-squares solution and the one based on $d(w)$. The two optimal weights lead to the same estimate $\hat{m}(w) = d_1$ with $\mathrm{MSE}(w) = \sigma^2 \operatorname{trace}(I) = 2\sigma^2$. For the full least-squares solution, we have $\hat{A}^\top \hat{A} = 2I$ and thus

$$ \hat{m}_{ls} = \frac{1}{2} \begin{pmatrix} d_{1,1} + d_{2,1} \\ d_{1,2} - d_{2,2} \end{pmatrix}, $$

with $\mathrm{MSE}(w_{ls}) = \sigma^2$. Hence, in this case the information lost by the reduction to $d(w)$ leads to an MSE that is twice as large as that of the full least-squares estimate. To compensate, we may conduct a single replication with the optimal $w$ to match the MSE obtained without the data reduction.
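A quick numeric check of this example, scanning unit-norm weights $w = (\cos\phi, \sin\phi)$ against the closed-form objective derived above:

```python
# The closed-form objective 2 / (1 - 4 w1^2 w2^2) attains its minimum value 2.
import numpy as np

phis = np.linspace(0.0, np.pi, 361)
w1, w2 = np.cos(phis), np.sin(phis)
with np.errstate(divide='ignore'):                 # objective is +inf at w1 = +-w2
    obj = 2.0 / (1.0 - 4.0 * w1**2 * w2**2)

i = int(np.argmin(obj))
print(obj[i], (w1[i], w2[i]))                      # 2.0 at w = (1, 0)
```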

Example 3

Next, we consider the problem (12) for the cases $N = 2$ and $N = 3$. The matrices $A_i$ are random $7 \times 10$ matrices with i.i.d. $N(0, 1)$ entries, $M = I$ and $\sigma = \lambda = 1$. We parameterize $w$ in terms of the angle $0 \le \phi \le 2\pi$ by $w = (\cos\phi, \sin\phi)^\top$ for $N = 2$, and by spherical coordinates for $N = 3$. In Figure 2, the value of the objective function is plotted for two random realizations of the matrices $A_1$ and $A_2$, and for one realization with $N = 3$. The problem has multiple minima for the first instance, but an essentially unique minimum for the second instance, as its two minima are equivalent (they correspond to $w$ and $-w$). For the case $N = 3$ there are many more local minima at various places on the sphere. Local minima may not be too detrimental to our final goal. The value of the MSE is not as important as long as we achieve a reduction of the risk. For the same reason, even if we do not have a global solution we may still find weights $w$ that substantially decrease the risk relative to what one would have achieved using other, more naive choices.

Figure 2. The objective function as a function of $\phi$ for the weight $w = (\cos\phi, \sin\phi)^\top$, for two realizations of the matrices $A_1$ and $A_2$ from Example 3. We also show the case $N = 3$.

3 Numerical solution

3.1 Direct approach

To solve the problem (12), we enforce the constraint by parameterizing $w$ in terms of $N - 1$ spherical coordinate angles $\phi_i$ on the unit $N$-sphere:

$$ \begin{aligned} w_1 &= \cos(\phi_1) \\ w_2 &= \sin(\phi_1)\cos(\phi_2) \\ w_3 &= \sin(\phi_1)\sin(\phi_2)\cos(\phi_3) \\ &\;\;\vdots \\ w_{N-1} &= \sin(\phi_1)\cdots\sin(\phi_{N-2})\cos(\phi_{N-1}) \\ w_N &= \sin(\phi_1)\cdots\sin(\phi_{N-2})\sin(\phi_{N-1}). \end{aligned} \tag{17} $$

We then use the BFGS method to solve the minimization problem iteratively. This requires the computation of the derivatives of the objective function. Rewriting the objective function as

$$ J(w) = \operatorname{trace}\big( (A(w)^\top A(w) + M)^{-1} \big) = \sum_j e_j^\top (A(w)^\top A(w) + M)^{-1} e_j $$

and setting

$$ z_j = (A(w)^\top A(w) + M)^{-1} e_j, \qquad \text{i.e.} \qquad (A(w)^\top A(w) + M)\, z_j = e_j, $$

we have that

$$ \nabla_w J(w) = \sum_j e_j^\top \nabla_w z_j(w). $$

To evaluate the derivative of $z_j$ we use implicit differentiation:

$$ 0 = \nabla_w \big( (A(w)^\top A(w) + M) z_j \big) = \nabla_w \big( A(w)^\top A(w)\, \underbrace{z_j}_{\text{fixed}} \big) + (A(w)^\top A(w) + M)\, \nabla_w z_j. $$

We further define the matrices

$$ G_F(v) := \nabla_w (A(w) v) = \begin{pmatrix} A_1 v & \cdots & A_N v \end{pmatrix}, \qquad G_T(u) := \nabla_w (A(w)^\top u) = \begin{pmatrix} A_1^\top u & \cdots & A_N^\top u \end{pmatrix}, $$

which leads to the calculation

$$ G_j(w) := \nabla_w \big( A(w)^\top A(w) z_j \big) = A(w)^\top G_F(z_j) + G_T(A(w) z_j), $$

and finally

$$ \nabla_w J = -\sum_j G_j(w)^\top (A(w)^\top A(w) + M)^{-1} e_j. \tag{18} $$

The gradient w.r.t. the angles $\phi$ is then obtained by multiplication by the Jacobian of the transformation (17). We have used the BFGS method [20] to solve each optimization problem, and we fix the termination tolerance of each problem by demanding that the norm of the gradient be reduced to $10^{-2}$ of its initial value. To address the problem of multiple minima, we repeat the optimization procedure for a number of random initial guesses for $w$ and select the solution with the smallest value of the objective function. The method to solve the problem (13), which involves multiple weights, is completely analogous.
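A compact sketch of this direct approach: the map (17) from angles to unit-norm weights, the objective of (12), and multi-start quasi-Newton minimization. For brevity the sketch lets scipy approximate the gradient by finite differences rather than implementing (18); the sizes, seed and number of restarts are assumptions.

```python
# Direct weight design: spherical-angle parameterization (17) plus BFGS.
import numpy as np
from scipy.optimize import minimize

def angles_to_w(phi):
    """Map N-1 spherical angles to a unit-norm weight vector of length N."""
    N = phi.size + 1
    w = np.ones(N)
    for i, p in enumerate(phi):
        w[i] *= np.cos(p)        # w_i picks up cos(phi_i) ...
        w[i + 1:] *= np.sin(p)   # ... later entries accumulate sin factors
    return w

def objective(phi, A, M, lam=1.0):
    w = angles_to_w(phi)
    A_w = np.einsum('j,jsn->sn', w, A)
    return np.trace(np.linalg.inv(A_w.T @ A_w + lam**2 * M))

rng = np.random.default_rng(7)
N, s, n = 10, 7, 10
A = rng.standard_normal((N, s, n))
M = np.eye(n)

# Restart from several random initial angles and keep the best minimizer.
best = min((minimize(objective, rng.uniform(0, np.pi, N - 1),
                     args=(A, M), method='BFGS') for _ in range(5)),
           key=lambda r: r.fun)
print(best.fun, angles_to_w(best.x))
```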

The techniques introduced in the previous sections work well for small-scale problems. However, for large-scale problems, when $A_j$ is large, the computation of the objective function and its derivatives can be rather expensive. Here, we can use stochastic trace estimation [11] and substitute the problems with

$$ \min_w R_{FB}(w; v) = E_v\Big[ \lambda^4\, v^\top \big( C(w)^{-1} M \Sigma_m M C(w)^{-1} \big) v + \sigma^2\, v^\top \big( A(w) C(w)^{-2} A(w)^\top \big) v \Big] \quad \text{s.t.} \quad \|w\| = 1 \tag{19a} $$

and

$$ \min_w R_B(w; v) = \sigma^2\, E_v\Big[ v^\top \big( A(w)^\top A(w) + \lambda^2 M \big)^{-1} v \Big] \quad \text{s.t.} \quad \|w\| = 1, \tag{19b} $$

where the expectation over random vectors $v$ is approximated by sampling.

3.2 Heuristic weight selection

We can interpret minimization of (15) as maximizing the operator $A(w)^\top A(w)$. However, in practice it only needs to be maximized over a set of 'relevant' models $m$. Let us now make the bold step of maximizing over a single reference model $m_0$, which can be, for example, a constant background model or a more sophisticated prior. (Note that we cannot use the data to obtain $m_0$, as we consider experimental design here.) We now obtain

$$ W = \arg\max_W \; m_0^\top A(W)^\top A(W)\, m_0. $$

Now $A(W) m_0$ is just the reduced data that would result from the model $m_0$. Let us define and compute the $m_0$ data matrix $\tilde{d}$ as in (2) but with $d_i$ taken to be a row of $s$ data (so $\tilde{d}$ is $N \times s$), and denote by $W$ the $k \times N$ matrix of weights (with unit-norm rows). We get

$$ W = \arg\max_W \; \operatorname{trace}\big( \tilde{d}^\top W^\top W \tilde{d} \big). $$

Let us now perform a singular value decomposition (SVD), $\tilde{d} = U \Sigma S^\top$. We obtain

$$ W = \arg\max_W \; \| W U \Sigma \|_2^2, $$

and the solution for $W$ (under appropriate constraints) is the submatrix of $U$ corresponding to the truncated SVD, which is equivalent to obtaining the first $k$ principal components of the data matrix $\tilde{d}$. For large-scale problems, the first $k$ left singular vectors and the corresponding singular values can be estimated effectively through Lanczos iteration. In the following sections, we shall refer to this heuristic choice of $W$ as the 'SVD' choice. This heuristic approach works almost as well as the optimized $W$ in the numerical examples below, while avoiding the intricate and expensive optimization process associated with the optimal weights problem. A similar approach using the SVD for faster data processing rather than experimental design was proposed in [21].
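A minimal sketch of this heuristic: simulate the $m_0$ data matrix $\tilde{d}$, take its SVD and use the first $k$ left singular vectors (transposed) as the weight matrix. The random stand-in operators and the constant background $m_0$ are assumptions; for large problems one would use a Lanczos-based routine such as scipy.sparse.linalg.svds instead of the dense SVD.

```python
# 'SVD' weight selection from the reference-model data matrix d_tilde.
import numpy as np

rng = np.random.default_rng(8)
N, s, n, k = 40, 20, 30, 2
A = rng.standard_normal((N, s, n))        # stand-ins for A_j = S L^{-1} Q_j
m0 = np.ones(n)                           # constant background reference model

d_tilde = A @ m0                          # N x s; row j is the m0-data of experiment j
U, S, Vt = np.linalg.svd(d_tilde, full_matrices=False)
W = U[:, :k].T                            # k x N weight matrix

print(np.linalg.norm(W, axis=1))          # columns of U are orthonormal: unit-norm rows
```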

4 Numerical experiments

In this section, we test the methods developed on two-dimensional seismic tomography problems. First, we consider a simple toy model on a square domain as a test, and second we consider a more complicated reconstruction based on the Marmousi model.[22]

We consider a rectangular domain $\Omega$ with the top boundary $\partial\Omega_1$ representing a ground-air interface, while the remaining boundary $\partial\Omega_2$ represents an artificial boundary of the model.

The acoustic pressure field $u$ is determined by the Helmholtz equation with appropriate boundary conditions:

$$ (\nabla \cdot \nabla + \omega^2 m)\, u = q \quad \text{in } \Omega, \tag{20a} $$
$$ u = 0 \quad \text{on } \partial\Omega_1, \tag{20b} $$
$$ (\partial_n + i \omega \sqrt{m})\, u = 0 \quad \text{on } \partial\Omega_2, \tag{20c} $$

where $m(x)$ is the slowness, i.e. $m = 1/c^2$ with $c$ the propagation velocity, and $q(x)$ represents sources of angular frequency $\omega$. The Sommerfeld boundary condition (20c) prevents artificial reflections from the synthetic domain boundary $\partial\Omega_2$. We discretize (20) on a rectangular grid using a five-point stencil for the Laplacian, which is sufficient for numerical experiments, and obtain

$$ A_\omega u = \big( -L_\omega(m) + \omega^2 \operatorname{diag}(m) \big) u = q, $$

where the Laplace operator $L_\omega$ depends upon $m$ and $\omega$ through the Sommerfeld boundary condition. Note that $u$ is a complex-valued vector.
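To make the discretization concrete, here is a minimal scipy sketch assembling a five-point Helmholtz operator on a uniform grid. It imposes homogeneous Dirichlet conditions on the whole boundary and omits the Sommerfeld terms of (20c), so it is a simplification of the operator actually used in the experiments; the grid size, frequency and source location are illustrative.

```python
# Five-point Helmholtz operator A = -L + omega^2 diag(m), where L is the
# (positive) discrete negative Laplacian with Dirichlet boundaries.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def helmholtz(m, h, omega):
    """m: slowness on an (ny, nx) grid; h: grid spacing; returns sparse A."""
    ny, nx = m.shape
    d2 = lambda k: sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k)) / h**2
    L = sp.kron(sp.eye(ny), d2(nx)) + sp.kron(d2(ny), sp.eye(nx))
    return (-L + omega**2 * sp.diags(m.ravel())).tocsr()

ny = nx = 42                               # cf. the 42^2 grid of the toy example
h = 2.0 / (nx - 1)                         # domain [-1, 1]^2
m = np.ones((ny, nx))                      # constant unit slowness
A_w = helmholtz(m, h, 2.0 * np.pi * 0.5)   # f = 0.5

q = np.zeros(ny * nx, dtype=complex)
q[(ny // 2) * nx + nx // 2] = 1.0          # point source near the centre
u = spsolve(A_w.astype(complex), q)        # complex-valued pressure field
```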

Figure 3. Reconstructions with their corresponding relative error $\epsilon_r = \|m - m_{true}\|_2 / \|m_{true}\|_2$.

Figure 4. Reconstructions with a single combined source using various weight choices (left). On the right, the fields generated by the sources are displayed.

The inverse problem is to determine $m$ from measurements of $u$ for several given sources $q_i$, where $i = 1, \dots, N$ labels the experiments, which can involve several frequencies. We assume further that all experiments measure $u$ at the same locations, which are described by the linear sampling operator $S$, i.e. we measure $d_i = S u_i$. The predicted data for a given slowness model $m$ is $F_i(m) = S A_\omega^{-1} q_i$ (suppressing the implicit $\omega$ dependence) and the least-squares error is

$$ \phi_e(m) = \frac{1}{2} \sum_{i=1}^{N} (F_i - d_i)^* (F_i - d_i). \tag{21} $$

We add a Tikhonov regularization term to $\phi_e$ and solve the inverse problem as

$$ m^\star = \arg\min_m \; \phi_e + \frac{1}{2} \lambda^2 \|m\|_\Delta^2, \tag{22} $$

where $\|\cdot\|_\Delta$ denotes the Laplacian-weighted $\ell_2$ norm.

We can apply the formalism developed above and consider, instead of (21),

$$ \hat{\phi}_e(m) = \frac{1}{2} \sum_{k=1}^{K} \Big( \sum_{i=1}^{N} w_i^{(k)} (F_i - d_i) \Big)^{\!*} \Big( \sum_{j=1}^{N} w_j^{(k)} (F_j - d_j) \Big) \tag{23} $$

for various choices of weights. The Bayes risk function (13) is defined by linearization around a reference model $m_0$. Because we are dealing with complex variables, the weight vectors are now complex valued and satisfy $w^* w = 1$, and there is an additional complex phase factor $e^{i\rho_k}$ for each component in (17).
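As a small sketch of (23) with complex weights, the snippet below combines synthetic complex residuals $F_i - d_i$ with $K$ unit-norm rows of a complex weight matrix; the sizes and random residuals are illustrative assumptions.

```python
# Weighted misfit (23) with complex-valued weights and residuals.
import numpy as np

rng = np.random.default_rng(9)
N, K, s = 40, 2, 39
res = rng.standard_normal((N, s)) + 1j * rng.standard_normal((N, s))  # F_i - d_i

W = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
W /= np.linalg.norm(W, axis=1, keepdims=True)      # rows satisfy w^* w = 1

r = W @ res                                        # K combined residual vectors
phi_hat = 0.5 * np.sum(np.abs(r)**2)               # misfit (23)
print(phi_hat)
```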

4.1 A simple toy model

For the first problem, we take $\Omega$ to be the square domain $[-1, 1]^2$, which we discretize by a uniform $42^2$ grid. Absorbing boundary conditions are assumed except at the top, where we impose Dirichlet boundary conditions. Sources are placed on the left boundary and 20 uniformly spaced detectors are placed on the right. In Figure 3(a), we depict the synthetic velocity model which we use to compute simulated data, to which we add 2% Gaussian noise. Sources of $f = 0.5$ (i.e. one wavelength in the domain) are placed on the left in two configurations. First, we consider $N = 2$ experiments with a source at $1/3$ depth and at $2/3$ depth. Second, one source is placed in the middle. The inversion is performed as described in Section 3 and the regularization parameter $\lambda$ was chosen by the Morozov discrepancy principle.[23] We solved the optimization problem (22) using limited-memory BFGS. We depict the resulting reconstructions in Figure 3(b) and (c), along with a relative error measure defined as

$$ \epsilon_r = \frac{\|m - m_{true}\|_2}{\|m_{true}\|_2}. \tag{24} $$

It is clear that for $N = 2$ we capture the basic structure of the model, whereas $N = 1$ is insufficient.

Figure 5. Reconstructions of the Marmousi velocity model using only the low-frequency $f = 0.2$ data, with all sources and with 2 simultaneous sources. We also indicate the relative reconstruction error $\epsilon_r = \|m_{rec} - m_{true}\|_2 / \|m_{true}\|_2$.

Figure 6. Fields for the 2 simultaneous sources for the case $f = 0.2$ using the various weight selection methods. Notice how the good weights cover the domain more or less uniformly.

Figure 7. Reconstructions of the Marmousi velocity model using the protocol described. We compare the final result using all the data, and a simultaneous sources protocol with optimal, SVD and random weights.

Next, we consider using a single simultaneous source obtained by superposition of the two sources with a unit-norm weight vector. In Figure 4, we show reconstructions using various methods for selecting the weights (left column). The right column shows the absolute value of the pressure field $u$ generated by the corresponding simultaneous source. We computed the optimal weight vector as described in Section 3 using 32 different random initial guesses for $w$ and selecting the one with the lowest risk. Second, we used the SVD method described in Section 3.2, and finally we generated three random phase weight vectors. It is evident that the optimal weights yield the best reconstruction, both visually and in terms of the relative error defined in (24). These optimal results were followed closely by those of the SVD method of Section 3.2. The random weights offered a reasonable reconstruction in (g), but not in (e) and (i). Inspecting the fields in the rightmost column, it seems that the optimal choice has selected the source combination that provides the best possible domain coverage. The source of the SVD-based approach did so to a lesser extent, and the random choices essentially cover the domain randomly, as expected.

4.2 Marmousi model

As a second numerical experiment, we consider the true Marmousi velocity model, depicted in Figure 5(f). The domain is 3 km by 10 km and the velocity is given in km s$^{-1}$. We use a $208 \times 60$ grid for the discretization. Forty emitters are placed at uniform distances on the surface and 39 receivers are placed in between the emitters. We generate data for five distinct frequencies $f = 0.2, 1.5, 3, 4.5, 6$ (in Hz), with a total of $5 \times 40 = 200$ experiments. We note that the low frequencies are needed to avoid local minima, but are not quite realistic. We added 3% Gaussian noise to the synthetic data. Optimal weights were precomputed before the inversion by linearization around a constant $m_0 = 2$ km s$^{-1}$ background model, which was also used for the SVD method of weight selection.

First, we present the reconstructions for a single frequency of 0.2 Hz using all 40 experiments and with $K = 2$ weight vectors, each a combination of the 40 sources. For the latter, we show the results for the optimal weights, the SVD-based weight selection, and random phase weights in two realizations. It is clear from Figure 5 that the random weight selection performs much worse than the other methods. In this case the SVD reconstruction is even better than the one using the optimal weights.

In Figure 6, we depict the resulting pressure fields from the two simultaneous sources. We observe again that the optimal and SVD weights result in a more or less uniform and balanced coverage of the domain, whereas the random choices do not share this virtue.

To obtain more accurate reconstructions, higher frequencies must be included. Because of the nonconvexity of the problem, we perform a frequency continuation by which we first solve (to full convergence) for the lowest frequency, then use the result as the initial guess for the next stage where we add an additional frequency, etc. Because the accuracy of the reconstruction increases as more and more frequencies are included, we can expect to require more simultaneous sources as well. In this example, we use $K = 2, 6, 12, 20, 30$ simultaneous sources corresponding to the frequencies $f = 0.2, 1.5, 3, 4.5, 6$, again for the aforementioned three methods of weight selection. In Figure 7, we depict the resulting reconstructions. Remarkably, the reconstructions with the SVD or optimal weights are visually no worse than the reconstruction using all the data, whereas the random weights do not offer comparable results. We note that the advantage of using simultaneous sources decreases with higher frequencies.

5 Summary and conclusions

We have introduced a novel methodology for optimal experimental design of acquisitions involving simultaneous sources. Based on an optimal Bayesian risk statistical framework and the assumption of a linear inverse problem, we have formulated an optimality criterion for the selection of the weights with which individual observations should be combined. We derived a heuristic guess for the optimal weights based upon the SVD, and we performed numerical experiments to test the efficacy of the method. In all cases, it was found that an informed choice of the weights generally outperforms a random weight choice. This is because our statistical framework incorporates information about the domain under consideration and the experimental setup.

Even though for most realistic applications the numerical models lead to nonlinear inverse problems, weight selection based on a linearized version of the model was still found to perform quite well. Development of a computationally viable optimal simultaneous source experimental design that accounts for the nonlinear nature of the observation model is a subject for future work.

In this study, Gaussian assumptions were made for the noise model and Tikhonov regularization was considered for the ill-posed case. These choices are by no means exclusive and, in principle, the derivation of optimal experimental designs for simultaneous sources can be performed for other noise models as well as for other forms of regularization.

So far, the incorporation of constraints has not been considered (other than those that can be introduced in the form of exact penalties). Future research will generalize the proposed formulation to account for such constraints.

Additional information

Funding

This work was supported in part by IBM Research and MITACS Open Collaborative Research project for Design in Inversion http://ocrdesign.wix.com/home.

References

1. Arridge SR. Topical review: optical tomography in medical imaging. Inverse Probl. 1999;15:41–93.
2. Casanova R, Silva A, Borges AR. A quantitative algorithm for parameter estimation in magnetic induction tomography. Meas. Sci. Technol. 2004;15:1412–1419.
3. Dorn O. A shape reconstruction method for electromagnetic tomography using adjoint fields and level sets. Inverse Probl. 2000;16:1119–1156.
4. Hansen PC. Rank-deficient and discrete ill-posed problems. Philadelphia (PA): SIAM; 1997.
5. Beasley CJ. A new look at marine simultaneous sources. Leading Edge. 2008;27:914–917.
6. Romero LA, Ghiglia DC, Ober CC, Morton SA. Phase encoding of shot records in prestack migration. Geophysics. 2000;65:426–436.
7. Herrmann FJ, Erlangga YA, Lin TTY. Compressive simultaneous full-waveform simulation. Geophysics. 2009;74:A35–A40.
8. Morton SA, Ober CC. Faster shot-record depth migrations using phase encoding. Vol. 17, SEG technical program expanded abstracts. New Orleans (LA): SEG; 1998. p. 1131–1134.
9. Romberg J, Neelamani R, Krohn C, Krebs J, Deffenbaugh M, Anderson J. Efficient seismic forward modeling and acquisition using simultaneous random sources and sparsity. Geophysics. 2010;75:WB15–WB27.
10. Haber E, Chung M, Herrmann F. Solving PDE constrained optimization with multiple right hand sides. SIAM J. Optim. 2012;22:739–757.
11. Hutchinson MF. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Commun. Statist. Simul. 1990;19:433–450.
12. Tenorio L, Andersson F, de Hoop M, Ma P. Data analysis tools for uncertainty quantification of inverse problems. Inverse Probl. 2011;27:045001.
13. Symes B. Source synthesis for waveform inversion. SEG technical program expanded abstracts. Denver (CO): SEG; 2010. p. 1018–1022.
14. Haber E, Horesh L, Tenorio L. Numerical methods for experimental design of large-scale linear ill-posed inverse problems. Inverse Probl. 2008;24:055012.
15. Coles DA, Morgan FD. A method of fast, sequential experimental design for linearized geophysical inverse problems. Geophys. J. Int. 2009;178:145–158.
16. Horesh L, Haber E, Tenorio L. Optimal experimental design for the large-scale nonlinear ill-posed problem of impedance imaging. In: Computational methods for large-scale inverse problems and quantification of uncertainty. Wiley; 2009.
17. Maurer H, Curtis A, Boerner DE. Recent advances in optimized geophysical survey design. Geophysics. 2010;75:75A177–75A194.
18. Haber E, Horesh L, Tenorio L. Numerical methods for the design of large-scale nonlinear discrete ill-posed inverse problems. Inverse Probl. 2010;26:025002.
19. Lahmer T. Optimal experimental design for nonlinear ill-posed problems applied to gravity dams. Inverse Probl. 2011;27:125005.
20. Nocedal J, Wright S. Numerical optimization. New York (NY): Springer; 1999.
21. Habashy TM, Abubakar A, Belani A, Pan G. Full-waveform seismic inversion using the source-receiver compression approach. In: 2010 SEG Annual Meeting; 2010 October 17–22; Denver (CO): Society of Exploration Geophysicists; 2010.
22. Bourgeois A, Bourget M, Lailly P, Poulet M, Ricarte P, Versteeg R. Marmousi data and model. In: The Marmousi experience. Copenhagen: EAGE; 1991. p. 5–9.
23. Morozov VA. The discrepancy principle for solving operator equations by the regularization method. Zh. Vychisl. Mat. Mat. Fiz. 1968;8:295–309.
