
Optimal design of simultaneous source encoding

Pages 780-797 | Received 08 Jan 2013, Accepted 22 Sep 2013, Published online: 18 Jul 2014

Abstract

A broad range of parameter estimation problems involve the collection of an excessively large number of observations $N$. Typically, each such observation involves excitation of the domain through injection of energy at some predefined sites and recording of the response of the domain at another set of locations. It has been observed that similar results can often be obtained by considering a far smaller number $K$ of multiple linear superpositions of experiments, with $K \ll N$. This allows the construction of the solution to the inverse problem in time $O(K)$ instead of $O(N)$. Given these considerations, it should not be necessary to perform all $N$ experiments, but only a much smaller number $K$ of experiments with simultaneous sources superposed with certain weights. Devising such a procedure results in a drastic reduction in acquisition time. The question we rigorously investigate in this work is: what are the optimal weights? We formulate the problem as an optimal experimental design problem and show that by leveraging techniques from this field an answer is readily available. Designing optimal experiments requires a statistical framework, and the framework one chooses to work with plays a major role in the selection of the weights.

1 Introduction

We consider discrete parameter estimation problems with data vectors $d_j$ modelled as

$$ d_j = A_j m + \epsilon_j \qquad (j = 1, \dots, N), \tag{1} $$

where $A_j$ is an $s \times n$ linear observation matrix that corresponds, for example, to the $j$th configuration of an experiment, $m \in \mathbb{R}^n$ is the parameter vector to be recovered and the $\epsilon_j$ are zero-mean independent random vectors with covariance matrix $\sigma^2 I$. In addition, we assume that there is a large set of observation matrices $A_j$, which is the case of interest in many geophysical and medical applications such as seismic and electromagnetic (EM) exploration, diffraction and electrical impedance tomography.[1–3]

The linear systems in (1) can be aggregated into a single large system by defining

$$ \hat{A} \equiv \begin{pmatrix} A_1 \\ \vdots \\ A_N \end{pmatrix} \quad \text{and} \quad \hat{d} \equiv \begin{pmatrix} d_1 \\ \vdots \\ d_N \end{pmatrix}. \tag{2} $$

In the well-posed case where $\hat{A}^\top \hat{A}$ has a stable inverse, the least-squares (maximum likelihood if the $\epsilon_j$ are Gaussian) estimate $\hat{m}$ of $m$ is

$$ \hat{m} = (\hat{A}^\top \hat{A})^{-1} \hat{A}^\top \hat{d}. \tag{3} $$

When $\hat{A}^\top \hat{A}$ does not have a stable inverse, a penalized least-squares estimate of the form

$$ \hat{m} = (\hat{A}^\top \hat{A} + \lambda^2 M)^{-1} \hat{A}^\top \hat{d} \tag{4} $$

is often used. Here $M$ is usually a symmetric positive semi-definite matrix and $\lambda$ is a positive regularization parameter. This choice will simplify the analysis later on, although the proposed approach can be generalized to regularizers of other functional forms.
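To make the estimates (3) and (4) concrete, the following minimal numpy sketch builds the stacked system (2) for small synthetic matrices and computes both estimates; the sizes, the seed and the choice $M = I$ are illustrative assumptions, not values from the paper.

```python
# Sketch of the stacked least-squares estimates (3) and (4) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
s, n, N = 5, 8, 20                       # rows per experiment, parameters, experiments
A_hat = rng.standard_normal((N * s, n))  # stacked observation matrix, cf. (2)
m_true = rng.standard_normal(n)
d_hat = A_hat @ m_true + 0.01 * rng.standard_normal(N * s)

# Well-posed estimate (3): solve the normal equations.
m_ls = np.linalg.solve(A_hat.T @ A_hat, A_hat.T @ d_hat)

# Penalized estimate (4) with M = I and regularization parameter lam.
lam = 0.1
M = np.eye(n)
m_pen = np.linalg.solve(A_hat.T @ A_hat + lam**2 * M, A_hat.T @ d_hat)
```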

We are particularly interested in large-scale problems where both the number of model parameters $n$ and the number of observations $N$ are large. For instance, in an electromagnetic or a seismic survey, the number of such observations, $N$, can be of the order of $10^4$ or $10^5$. For such problems, solution of the system (4) using direct methods is computationally intractable and instead iterative methods (typically, CGLS or LSQR [4]) are employed. The main computational bottleneck associated with such solvers is the computation of matrix-vector products: when $N$ is very large, the computation of $\hat{A}$ times a vector can be prohibitively expensive by itself. We thus need to devise some data reduction methodology to make computations feasible.

For example, return to the well-posed case and assume the matrices $A_j$ come from a parameter estimation problem where the observed phenomena can be described by a set of differential equations that are linear w.r.t. the source term. Such operators can be written as

$$ A_j = S L^{-1} Q_j, $$

where $S$ and $Q_j$ are sparse selection matrices related to the sensors and the sources, respectively, and $L$ is a discretization of a partial differential equation on a mesh. Such problems arise in seismic and electromagnetic exploration as well as in impedance and diffraction tomography.[1–3] The least-squares estimate of the model $m$ is

$$ \hat{m} = \Big( \sum_j Q_j^\top L^{-\top} S^\top S L^{-1} Q_j \Big)^{-1} \sum_j Q_j^\top L^{-\top} S^\top d_j, \tag{5} $$

which requires multiple applications of $L^{-1}$, thus making the procedure computationally expensive. Alternatively, a far less expensive procedure relies upon a recently developed data reduction technique based on the idea of 'simultaneous sources'.[5–9] Here the estimate is based on a weighted combination of the form $d(w) = \sum_j w_j d_j$. The basic idea is to exploit the savings achieved by simultaneously collecting the data.

For example, if each experiment corresponds to a different source in seismic or electromagnetic exploration, it can be rather expensive both in terms of time and cost to record many different experiments. If, on the other hand, it is possible to 'shoot' simultaneously with all the sources, then the recording time can be significantly shortened, as a collection of simultaneous sources enables direct measurement of linear combinations of the data. That is, for a fixed $w$ one conducts an experiment that measures $d(w)$ directly. This acquisition strategy is of course far more efficient than the conventional process in which each data vector is measured independently and the data are only later combined with the weights $w$.

The linear system corresponding to $d(w)$ is

$$ d(w) = A(w) m + \epsilon(w), \tag{6} $$

where $A(w) = \sum_j w_j A_j$ and $\epsilon(w) = \sum_j w_j \epsilon_j$. We assume the individual source noise levels $\epsilon_j$ are known. The estimates (3) and (4) corresponding to $d(w)$ are

$$ \hat{m}(w) = \big( A(w)^\top A(w) / \|w\|^2 \big)^{-1} A(w)^\top d(w) / \|w\|^2 \tag{7} $$

if $A(w)^\top A(w)$ has a stable inverse, and

$$ \hat{m}(w) = \big( A(w)^\top A(w) / \|w\|^2 + \lambda^2 M \big)^{-1} A(w)^\top d(w) / \|w\|^2 \tag{8} $$

otherwise. Clearly, both solutions depend on $w$ only through its unit-norm normalization $w / \|w\|$.
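As a quick illustration of the reduction (6), the sketch below forms $A(w)$ and $d(w)$ from random synthetic operators and computes the penalized estimate (8); the sizes, seed and the choice $M = I$ are assumptions made only for this example.

```python
# One weighted simultaneous-source experiment replaces N individual ones.
import numpy as np

rng = np.random.default_rng(1)
s, n, N = 5, 8, 20
A = rng.standard_normal((N, s, n))       # A[j] is the j-th observation matrix
m_true = rng.standard_normal(n)
d = A @ m_true + 0.01 * rng.standard_normal((N, s))

w = rng.standard_normal(N)
w /= np.linalg.norm(w)                   # solutions depend only on w / ||w||

A_w = np.einsum('j,jsn->sn', w, A)       # A(w) = sum_j w_j A_j
d_w = w @ d                              # d(w) = sum_j w_j d_j

# Penalized estimate (8) with M = I and ||w|| = 1.
lam = 0.1
m_hat = np.linalg.solve(A_w.T @ A_w + lam**2 * np.eye(n), A_w.T @ d_w)
```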

For example, let us return to the well-posed parameter estimation example with $A_j = S L^{-1} Q_j$. The penalized least-squares estimate (8) is

$$ \hat{m} = \bigg( \Big(\sum_j w_j Q_j\Big)^{\!\top} L^{-\top} S^\top S L^{-1} \Big(\sum_k w_k Q_k\Big) + \lambda^2 M \bigg)^{-1} \Big(\sum_i w_i Q_i\Big)^{\!\top} L^{-\top} S^\top \Big(\sum_m w_m d_m\Big), $$

which requires far fewer applications of $L^{-1}$ than (5). Of course, some information may be lost, as $d(w)$ is not, in general, a sufficient statistic for $m$. We return to this point in Section 2.2, but it should be clear that some information might be sacrificed for the sake of computational or data collection savings.

Although the advantages of using simultaneous sources have been previously explored, the question of selecting appropriate weights $w$ has not received sufficient attention. We are aware of two main approaches that have been utilized so far. In [10], the weights $w_j$ are independent zero-mean random variables of variance one and are used to define the Hutchinson randomized trace estimator.[11] More precisely, let $R$ be the residual matrix defined as

$$ R = \begin{pmatrix} (A_1 m - d_1)^\top \\ \vdots \\ (A_N m - d_N)^\top \end{pmatrix}. $$

Then,

$$ E\|A(w) m - d(w)\|^2 = E(w^\top R R^\top w) = \operatorname{trace}(R R^\top) = \sum_j \|A_j m - d_j\|^2 = \|R\|_F^2, $$

where $\|\cdot\|_F$ denotes the Frobenius norm. If $w^{(1)}, \dots, w^{(b)}$ are independent random weight vectors, then by the law of large numbers

$$ \frac{1}{b} \sum_{k=1}^{b} \big\| A(w^{(k)}) m - d(w^{(k)}) \big\|^2 \longrightarrow \|R\|_F^2 $$

as $b \to \infty$. Therefore, the finite sum on the left-hand side can be used to approximate $\|R\|_F^2$. It is easy to see that the uniform distribution on $\{-1, 1\}$ minimizes the variance of the trace estimator (see e.g. [12]). Methods to select appropriate values of $b$ are discussed in [12].
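The following short sketch illustrates the Hutchinson estimator underlying this choice: averages of $w^\top B w$ over random $\pm 1$ vectors approximate $\operatorname{trace}(B)$. The matrix sizes, seed and number of samples are illustrative.

```python
# Hutchinson trace estimation with Rademacher (+/-1) weight vectors.
import numpy as np

rng = np.random.default_rng(2)
N = 200
R = rng.standard_normal((N, 30))          # stand-in residual matrix
B = R @ R.T                               # trace(B) = ||R||_F^2

b = 1000                                  # number of random weight vectors
W = rng.choice([-1.0, 1.0], size=(b, N))  # Rademacher weights minimize the variance
estimate = np.mean(np.einsum('bi,ij,bj->b', W, B, W))

print(estimate, np.trace(B))              # the two numbers should be close
```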

In [13] the Frobenius norm of the residual matrix is replaced by the squared 2-norm, and $w$ is then chosen so that

$$ \|A(w) m - d(w)\|^2 \approx \|R\|_2^2. $$

The paper is structured as follows. In Section 2, we derive the mathematical framework. In Section 3, we discuss efficient numerical techniques for the solution of the weight design problem. In Section 4, we demonstrate the effectiveness of our approach by performing numerical experiments and finally, in Section 5 we summarize the study.

2 Optimal selection of weights

In this section, we consider statistical properties of the estimate $\hat{m}$ to select optimal weights $w$. We also discuss methods to compensate for the information loss and to compare the loss achieved with different choices of weights $w$.

We begin our discussion with the well-posed problem and its solution (7). As can be observed, $\hat{m}$ depends only on $w / \|w\|$; we may therefore assume that $\|w\| = 1$. Since $\hat{m}$ is an unbiased estimator and $\mathrm{Var}(\epsilon(w)) = \sigma^2 I$, it follows that its mean square error (MSE) is

$$ \mathrm{MSE}(\hat{m}) = E\|\hat{m}(w) - m\|^2 = \sigma^2 \operatorname{trace}\big( (A(w)^\top A(w))^{-1} \big). $$

To find an optimal $w$, in the sense of minimizing the MSE, we need to solve the constrained optimization problem

$$ \min_w \operatorname{trace}\big( (A(w)^\top A(w))^{-1} \big) \quad \text{s.t.} \quad \|w\| = 1. \tag{9} $$

Before discussing properties of the solutions, we consider the ill-posed problem.
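For intuition, here is a tiny sketch of (9) for $N = 2$: writing $w = (\cos\phi, \sin\phi)$ makes the unit-norm constraint automatic, so the objective can simply be scanned over the angle (anticipating the parameterization used in Example 3 and Section 3). The matrices and grid are illustrative assumptions.

```python
# Scan the A-optimality objective of (9) over unit-norm weights for N = 2.
import numpy as np

rng = np.random.default_rng(3)
s, n = 6, 4
A1, A2 = rng.standard_normal((s, n)), rng.standard_normal((s, n))

phis = np.linspace(0.0, 2.0 * np.pi, 361)
obj = []
for phi in phis:
    A_w = np.cos(phi) * A1 + np.sin(phi) * A2        # A(w) with ||w|| = 1
    obj.append(np.trace(np.linalg.inv(A_w.T @ A_w)))

print(phis[int(np.argmin(obj))])          # angle of the best weights on the grid
```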

2.1 The ill-posed case

Apart from numerous recent exceptions (see [14–19] and references therein), optimal experimental design of ill-posed problems has been a somewhat under-researched topic. In the case of ill-posed problems, the selection of optimal weights for (8) is more difficult because this estimate is biased and its bias depends on the unknown $m$: $\mathrm{bias}(\hat{m}) = -\lambda^2 C(w)^{-1} M m$, where the inverse Fisher matrix is given by

$$ C(w) = A(w)^\top A(w) + \lambda^2 M $$

(again, we may assume $\|w\| = 1$). The MSE of $\hat{m}$ is then

$$ \mathrm{MSE}(\hat{m}) = E\|\hat{m} - m\|^2 = \|\mathrm{bias}(\hat{m})\|^2 + \operatorname{trace}(\mathrm{Var}(\hat{m})) = \lambda^4 \|C(w)^{-1} M m\|^2 + \sigma^2 \operatorname{trace}\big( A(w) C(w)^{-2} A(w)^\top \big). \tag{10} $$

As in the well-posed case, the goal would be to find weights $w$ that minimize this MSE subject to the constraint $\|w\| = 1$, but this is not possible because $m$ is unknown. We need information on $m$ to control the MSE.

If $m$ is known to be in a convex set $S$, then we could consider a minmax approach where we minimize the worst $\mathrm{MSE}(\hat{m})$ for $m \in S$. A less pessimistic approach is to minimize an average $\mathrm{MSE}(\hat{m})$ over $S$. Of course, other choices are possible and may be useful as long as their interpretation can be rationalized. In this note, we minimize a weighted average of the MSE. We model $m$ as random and since $\hat{m}$ only depends linearly on $m$, it is enough to use prior second-order moment conditions. For example, if $E(m) = 0$ and $\mathrm{Var}(m) = \Sigma_m$, then the Bayes risk $R_{FB}(w)$ under squared error loss of the frequentist estimate $\hat{m}$ defined by the fixed $w$ is

$$ R_{FB}(w) = E(\mathrm{MSE}(\hat{m})) = \lambda^4 \operatorname{trace}\big( C(w)^{-1} M \Sigma_m M C(w)^{-1} \big) + \sigma^2 \operatorname{trace}\big( A(w) C(w)^{-2} A(w)^\top \big), $$

and we solve the optimization problem

$$ \min_w R_{FB}(w) \quad \text{s.t.} \quad \|w\| = 1. \tag{11} $$

Finally, assume the distributions are Gaussian. That is, the conditional distribution of $d(w)$ given $m$ is $N(A(w) m, \sigma^2 I)$ and the prior distribution of $m$ is $N(0, \tau^2 M^{-1})$, with $\tau^2 = \sigma^2 / \lambda^2$. In this case, (8) is the posterior mean (and a maximum a posteriori or MAP estimator) and the posterior covariance matrix is $\mathrm{Var}(m \mid d(w)) = \sigma^2 C(w)^{-1}$. For each fixed $w$ the Bayes risk of $\hat{m}$ is therefore

$$ R_B(w) = \sigma^2 \operatorname{trace}\big( C(w)^{-1} \big). $$

The corresponding optimization problem is

$$ \min_w R_B(w) \quad \text{s.t.} \quad \|w\| = 1. \tag{12} $$
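A direct translation of the Bayes risk objective in (12) into numpy might look as follows; the random test operators, $M = I$ and the unit noise levels are assumptions made only for the sketch.

```python
# Bayes risk R_B(w) = sigma^2 trace(C(w)^{-1}), C(w) = A(w)^T A(w) + lam^2 M.
import numpy as np

def bayes_risk(w, A, M, sigma=1.0, lam=1.0):
    """Evaluate (12) for a weight vector w, normalized to unit norm."""
    w = w / np.linalg.norm(w)
    A_w = np.einsum('j,jsn->sn', w, A)
    C = A_w.T @ A_w + lam**2 * M
    return sigma**2 * np.trace(np.linalg.inv(C))

rng = np.random.default_rng(4)
N, s, n = 10, 6, 12                       # ill-posed: fewer rows than parameters
A = rng.standard_normal((N, s, n))
print(bayes_risk(rng.standard_normal(N), A, np.eye(n)))
```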

Remark

So far we have only considered linear/affine reconstructions of the model, which of course do not include many recovery techniques based, for example, on total variation or sparsity control. However, if the reason for choosing one of these nonlinear procedures is the belief that its optimal Bayes risk is smaller than that of the linear reconstruction, then the optimal Bayes risk of the latter provides an upper bound for the optimal Bayes risk of the nonlinear procedure. Furthermore, if the upper bound is not too conservative, then minimizing the Bayes risk for an affine estimator can be a useful guide for other reconstruction techniques.

2.2 Compensating for the information loss

As we have explained, some information may be lost with the reduction to $d(w)$. If the loss is unacceptable, we may compensate by using more than a single reduction as follows. Let $k$ be the number of independent data reductions and $w^{(\ell)} = (w_1^{(\ell)}, \dots, w_N^{(\ell)})$. Define

$$ d_j^{(\ell)} = A_j m + \epsilon_j^{(\ell)} \qquad (j = 1, \dots, N, \; \ell = 1, \dots, k), $$

where the sequence $\{\epsilon_j^{(\ell)}\}_{j,\ell}$ is assumed to be i.i.d. $N(0, \sigma^2 I)$. The $\ell$th data reduction is $d(w^{(\ell)}) = \sum_j w_j^{(\ell)} d_j^{(\ell)}$. We solve for $m$ using the vector of $k$ reductions $d(W_k)$ and matrix $A(W_k)$ defined as

$$ d(W_k) = \begin{pmatrix} d(w^{(1)}) \\ \vdots \\ d(w^{(k)}) \end{pmatrix} \quad \text{and} \quad A(W_k) = \begin{pmatrix} A(w^{(1)}) \\ \vdots \\ A(w^{(k)}) \end{pmatrix}. $$

The associated Bayes risk is $R_B(W_k) = \sigma^2 \operatorname{trace}(C(W_k)^{-1})$ with $C(W_k) = A(W_k)^\top A(W_k) + \lambda^2 M$. The optimization problem is

$$ \min_{W_k} R_B(W_k) \quad \text{s.t.} \quad \|w^{(1)}\| = \dots = \|w^{(k)}\| = 1. \tag{13} $$

There are some particular cases that may make this optimization easier and provide upper bounds for the optimal solution of (13). These bounds can be useful when we study the loss as a function of $k$. One particular case consists of using prior information about the experiments to classify the matrices $A_j$ into $k$ groups whose members are similar on a realistic set of models $m$. For example, $d(w^{(1)})$ may include only $A_1$ and $A_2$ while $d(w^{(2)})$ uses only $A_3$, $A_4$ and $A_5$, etc. A second simplification consists of assuming $w^{(1)} = \dots = w^{(k)}$ so that there is only one constraint but $k$ independent experiments. The final and simplest approach is to find the optimal $w$ in (12) and then repeat the experiment $k$ times. This leads to the risk

$$ R_B(w, k) = \sigma^2 \operatorname{trace}\big[ (k\, A(w)^\top A(w) + \lambda^2 M)^{-1} \big]. \tag{14} $$
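The multi-reduction risk $R_B(W_k)$ in (13) stacks the $k$ reduced operators before forming $C(W_k)$; a minimal numpy sketch, with illustrative sizes and $M = I$, is given below.

```python
# Bayes risk for k simultaneous-source reductions, cf. (13).
import numpy as np

def bayes_risk_k(W, A, M, sigma=1.0, lam=1.0):
    """W has shape (k, N); each row is normalized to unit norm."""
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    A_stack = np.concatenate([np.einsum('j,jsn->sn', w, A) for w in W])
    C = A_stack.T @ A_stack + lam**2 * M
    return sigma**2 * np.trace(np.linalg.inv(C))

rng = np.random.default_rng(5)
N, s, n, k = 20, 6, 15, 4
A = rng.standard_normal((N, s, n))
print(bayes_risk_k(rng.standard_normal((k, N)), A, np.eye(n)))
```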

2.3 Quantifying the information loss

Although in general the optimization problem does not have a unique minimum, a local minimum may still provide a sufficient reduction of the risk achieved with the reduced data $d(w)$. This does not mean, however, that the data reduction is useful, for much information may be lost. It is therefore important to assess whether the computational savings obtained through the data reduction justify the information loss.

Figure 1. The loss as a function of the number of vectors.

To assess the information loss of the data reduction, we compare its associated risk to that of the complete data. The Bayes risk of the complete (i.e. exhaustive acquisition) problem is

$$ R_0 = \operatorname{trace}\big( (\hat{A}^\top \hat{A} + \lambda^2 M)^{-1} \big), $$

while the risk associated with the reduced data $d(w)$ is

$$ R_B(w) = \operatorname{trace}\big( (A(w)^\top A(w) + \lambda^2 M)^{-1} \big). \tag{15} $$

We quantify the information loss using the risk ratio

$$ \mathrm{Loss}(w) = \frac{R_B(w)}{R_0}. \tag{16} $$

This loss indicates the degradation of the average recovery when using the reduction $d(w)$ instead of the full data. If the loss is much greater than 1, then using a single set of weights may lead to significant deterioration in the recovery. In this case one may want to use a number of weights as explained in Section 2.2. The loss is then

$$ \mathrm{Loss}(W_k) = \frac{R_B(W_k)}{R_0}. $$

To determine an appropriate value of $k$, we may plot the loss as a function of $k$. The idea is to balance the trade-off between the cost of the experiment and the information loss. The following example illustrates this point.

Example 1

We generate $N = 50$ random matrices $A_j$ of size $100 \times 200$ and use $k$ random unit vectors (not optimal) $w^{(1)}, \dots, w^{(k)}$. Figure 1 shows a plot of $R_B(W_k)$ as a function of $k$. The curve has a classical L-shape and provides an upper bound for the optimal risks $R_B(W_k)$. The figure shows that using fewer than 10 experiments will result in a significant loss of information, but using more than 30 may be a waste of resources. Thus, the loss curve is important if we would like to design an effective and yet efficient experiment.
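A sketch in the spirit of this example: the loss (16) for random unit-norm weights, evaluated for several values of $k$. The regularization $\lambda = 1$, $M = I$ and the seed are assumptions for the sketch; the ratio decreases towards 1 as $k$ grows.

```python
# Loss curve of Section 2.3: R_B(W_k) / R_0 for k random unit-norm weights.
import numpy as np

rng = np.random.default_rng(6)
N, s, n, lam = 50, 100, 200, 1.0
A = rng.standard_normal((N, s, n))
M = np.eye(n)

A_full = A.reshape(N * s, n)                         # stacked full-data operator
R0 = np.trace(np.linalg.inv(A_full.T @ A_full + lam**2 * M))

for k in (1, 5, 10, 20, 30, 40):
    W = rng.standard_normal((k, N))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    A_stack = np.concatenate([np.einsum('j,jsn->sn', w, A) for w in W])
    Rk = np.trace(np.linalg.inv(A_stack.T @ A_stack + lam**2 * M))
    print(k, Rk / R0)                                # the L-shaped curve of Figure 1
```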

2.4 Numerical properties of the design function

We now study properties of the weights selected in the fully Bayesian framework; that is, solutions of (12). First, note that neither the constraint $\|w\| = 1$ nor the objective functions $R_B(w)$ and $R_B(W_k)$ are convex, and consequently there is no guarantee of a unique solution. In fact, if $w$ is a solution of (12), then so is $\tilde{w} = -w$. And any permutation of an optimal sequence $w^{(1)}, \dots, w^{(k)}$ is a solution of (13).

Example 2

Consider the following simple $2 \times 2$ well-posed case:

$$ A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. $$

It is easy to see that for $\|w\| = 1$,

$$ \operatorname{trace}\big( (A(w)^\top A(w))^{-1} \big) = \frac{2}{1 - 4 w_1^2 w_2^2} = \frac{1}{2 (w_1^2 - 1/2)^2}, $$

and the optimal solutions of (9) are $w = \pm(1, 0)$. That is, the optimal solution does not use the information provided by $d_2$. To see how much information is lost, we compare the risk of the full least-squares solution and the one based on $d(w)$. The two optimal weights lead to the same estimate $\hat{m}(w) = d_1$ with $\mathrm{MSE}(w) = \sigma^2 \operatorname{trace}(I) = 2\sigma^2$. For the full least-squares solution, we have $\hat{A}^\top \hat{A} = 2I$ and thus

$$ \hat{m}_{ls} = \frac{1}{2} \begin{pmatrix} d_{1,1} + d_{2,1} \\ d_{1,2} - d_{2,2} \end{pmatrix}, $$

with $\mathrm{MSE}(w_{ls}) = \sigma^2$. Hence, in this case the information lost by the reduction to $d(w)$ leads to an MSE that is twice as large as that of the full least-squares estimate. To compensate, we may conduct a single replication with the optimal $w$ to match the MSE obtained without the data reduction.
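A quick numeric check of this example, scanning unit-norm weights $w = (\cos\phi, \sin\phi)$ against the closed-form objective derived above:

```python
# The closed-form objective 2 / (1 - 4 w1^2 w2^2) attains its minimum value 2.
import numpy as np

phis = np.linspace(0.0, np.pi, 361)
w1, w2 = np.cos(phis), np.sin(phis)
with np.errstate(divide='ignore'):                 # objective is +inf at w1 = +-w2
    obj = 2.0 / (1.0 - 4.0 * w1**2 * w2**2)

i = int(np.argmin(obj))
print(obj[i], (w1[i], w2[i]))                      # 2.0 at w = (1, 0)
```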

Example 3

Next, we consider the problem (12) for the cases $N = 2$ and $N = 3$. The matrices $A_i$ are random $7 \times 10$ matrices with i.i.d. $N(0, 1)$ entries, $M = I$ and $\sigma = \lambda = 1$. We parameterize $w$ in terms of the angle $0 \le \phi \le 2\pi$ by $w = (\cos\phi, \sin\phi)^\top$ for $N = 2$, and by spherical coordinates for $N = 3$. In Figure 2, the value of the objective function is plotted for two random realizations of the matrices $A_1$ and $A_2$, and for one realization with $N = 3$. The problem has multiple minima for the first instance, but an essentially unique minimum for the second instance, as its two minima are equivalent (they correspond to $w$ and $-w$). For the case $N = 3$ there are many more local minima at various places on the sphere. Local minima may not be too detrimental to our final goal. The value of the MSE is not as important as long as we achieve a reduction of the risk. For the same reason, even if we do not have a global solution we may still find weights $w$ that substantially decrease the risk relative to what one would have achieved using other, more naive choices.

Figure 2. The objective function as a function of $\phi$ for the weight $w = (\cos\phi, \sin\phi)^\top$, for two realizations of the matrices $A_1$ and $A_2$ from Example 3. We also show the case $N = 3$.

3 Numerical solution

3.1 Direct approach

To solve the problem (12), we enforce the constraint by parameterizing $w$ in terms of $N - 1$ spherical coordinate angles $\phi_i$ on the unit $N$-sphere:

$$ \begin{aligned} w_1 &= \cos(\phi_1) \\ w_2 &= \sin(\phi_1)\cos(\phi_2) \\ w_3 &= \sin(\phi_1)\sin(\phi_2)\cos(\phi_3) \\ &\;\;\vdots \\ w_{N-1} &= \sin(\phi_1)\cdots\sin(\phi_{N-2})\cos(\phi_{N-1}) \\ w_N &= \sin(\phi_1)\cdots\sin(\phi_{N-2})\sin(\phi_{N-1}). \end{aligned} \tag{17} $$

We then use the BFGS method to solve the minimization problem iteratively. This requires the computation of the derivatives of the objective function. Rewriting the objective function as

$$ J(w) = \operatorname{trace}\big( (A(w)^\top A(w) + M)^{-1} \big) = \sum_j e_j^\top (A(w)^\top A(w) + M)^{-1} e_j $$

and setting

$$ z_j = (A(w)^\top A(w) + M)^{-1} e_j, \qquad \text{i.e.} \qquad (A(w)^\top A(w) + M)\, z_j = e_j, $$

we have that

$$ \nabla_w J(w) = \sum_j e_j^\top \nabla_w z_j(w). $$

To evaluate the derivative of $z_j$ we use implicit differentiation:

$$ 0 = \nabla_w \big( (A(w)^\top A(w) + M) z_j \big) = \nabla_w \big( A(w)^\top A(w)\, \underbrace{z_j}_{\text{fixed}} \big) + (A(w)^\top A(w) + M)\, \nabla_w z_j. $$

We further define the matrices

$$ G_F(v) := \nabla_w (A(w) v) = \begin{pmatrix} A_1 v & \cdots & A_N v \end{pmatrix}, \qquad G_T(u) := \nabla_w (A(w)^\top u) = \begin{pmatrix} A_1^\top u & \cdots & A_N^\top u \end{pmatrix}, $$

which leads to the calculation

$$ G_j(w) := \nabla_w \big( A(w)^\top A(w) z_j \big) = A(w)^\top G_F(z_j) + G_T(A(w) z_j), $$

and finally

$$ \nabla_w J = -\sum_j G_j(w)^\top (A(w)^\top A(w) + M)^{-1} e_j. \tag{18} $$

The gradient w.r.t. the angles $\phi$ is then obtained by multiplication by the Jacobian of the transformation (17). We have used the BFGS method [20] to solve each optimization problem, and we fix the termination tolerance of each problem by demanding that the norm of the gradient be reduced to $10^{-2}$ of its initial value. To address the problem of multiple minima, we repeat the optimization procedure for a number of random initial guesses for $w$ and select the solution with the smallest value of the objective function. The method to solve the problem (13), which involves multiple weights, is completely analogous.
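A compact sketch of this direct approach: the map (17) from angles to unit-norm weights, the objective of (12), and multi-start quasi-Newton minimization. For brevity the sketch lets scipy approximate the gradient by finite differences rather than implementing (18); the sizes, seed and number of restarts are assumptions.

```python
# Direct weight design: spherical-angle parameterization (17) plus BFGS.
import numpy as np
from scipy.optimize import minimize

def angles_to_w(phi):
    """Map N-1 spherical angles to a unit-norm weight vector of length N."""
    N = phi.size + 1
    w = np.ones(N)
    for i, p in enumerate(phi):
        w[i] *= np.cos(p)        # w_i picks up cos(phi_i) ...
        w[i + 1:] *= np.sin(p)   # ... later entries accumulate sin factors
    return w

def objective(phi, A, M, lam=1.0):
    w = angles_to_w(phi)
    A_w = np.einsum('j,jsn->sn', w, A)
    return np.trace(np.linalg.inv(A_w.T @ A_w + lam**2 * M))

rng = np.random.default_rng(7)
N, s, n = 10, 7, 10
A = rng.standard_normal((N, s, n))
M = np.eye(n)

# Restart from several random initial angles and keep the best minimizer.
best = min((minimize(objective, rng.uniform(0, np.pi, N - 1),
                     args=(A, M), method='BFGS') for _ in range(5)),
           key=lambda r: r.fun)
print(best.fun, angles_to_w(best.x))
```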

The techniques introduced in the previous sections work well for small-scale problems. However, for large-scale problems, when $A_j$ is large, the computation of the objective function and its derivatives can be rather expensive. Here, we can use stochastic trace estimation [11] and substitute the problems with

$$ \min_w R_{FB}(w; v) = E_v\Big[ \lambda^4\, v^\top \big( C(w)^{-1} M \Sigma_m M C(w)^{-1} \big) v + \sigma^2\, v^\top \big( A(w) C(w)^{-2} A(w)^\top \big) v \Big] \quad \text{s.t.} \quad \|w\| = 1 \tag{19a} $$

and

$$ \min_w R_B(w; v) = \sigma^2\, E_v\Big[ v^\top \big( A(w)^\top A(w) + \lambda^2 M \big)^{-1} v \Big] \quad \text{s.t.} \quad \|w\| = 1, \tag{19b} $$

where the expectation over random vectors $v$ is approximated by sampling.

3.2 Heuristic weight selection

We can interpret minimization of (15) as maximizing the operator $A(w)^\top A(w)$. However, in practice it only needs to be maximized over a set of 'relevant' models $m$. Let us now make the bold step of maximizing over a single reference model $m_0$, which can be, for example, a constant background model or a more sophisticated prior. (Note that we cannot use the data to obtain $m_0$, as we consider experimental design here.) We now obtain

$$ W = \arg\max_W \; m_0^\top A(W)^\top A(W)\, m_0. $$

Now $A(W) m_0$ is just the reduced data that would result from the model $m_0$. Let us define and compute the $m_0$ data matrix $\tilde{d}$ as in (2) but with $d_i$ taken to be a row of $s$ data (so $\tilde{d}$ is $N \times s$), and denote by $W$ the $k \times N$ matrix of weights (with unit-norm rows). We get

$$ W = \arg\max_W \; \operatorname{trace}\big( \tilde{d}^\top W^\top W \tilde{d} \big). $$

Let us now perform a singular value decomposition (SVD), $\tilde{d} = U \Sigma S^\top$. We obtain

$$ W = \arg\max_W \; \| W U \Sigma \|_2^2, $$

and the solution for $W$ (under appropriate constraints) is the submatrix of $U$ corresponding to the truncated SVD, which is equivalent to obtaining the first $k$ principal components of the data matrix $\tilde{d}$. For large-scale problems, the first $k$ left singular vectors and the corresponding singular values can be estimated effectively through Lanczos iteration. In the following sections, we shall refer to this heuristic choice of $W$ as the 'SVD' choice. This heuristic approach works almost as well as the optimized $W$ in the numerical examples below, while avoiding the intricate and expensive optimization process associated with the optimal weights problem. A similar approach using the SVD for faster data processing rather than experimental design was proposed in [21].
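A minimal sketch of this heuristic: simulate the $m_0$ data matrix $\tilde{d}$, take its SVD and use the first $k$ left singular vectors (transposed) as the weight matrix. The random stand-in operators and the constant background $m_0$ are assumptions; for large problems one would use a Lanczos-based routine such as scipy.sparse.linalg.svds instead of the dense SVD.

```python
# 'SVD' weight selection from the reference-model data matrix d_tilde.
import numpy as np

rng = np.random.default_rng(8)
N, s, n, k = 40, 20, 30, 2
A = rng.standard_normal((N, s, n))        # stand-ins for A_j = S L^{-1} Q_j
m0 = np.ones(n)                           # constant background reference model

d_tilde = A @ m0                          # N x s; row j is the m0-data of experiment j
U, S, Vt = np.linalg.svd(d_tilde, full_matrices=False)
W = U[:, :k].T                            # k x N weight matrix

print(np.linalg.norm(W, axis=1))          # columns of U are orthonormal: unit-norm rows
```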

4 Numerical experiments

In this section, we test the methods developed on two-dimensional seismic tomography problems. First, we consider a simple toy model on a square domain as a test, and second we consider a more complicated reconstruction based on the Marmousi model.[22]

We consider a rectangular domain $\Omega$ with the top boundary $\partial\Omega_1$ representing a ground-air interface, while the remaining boundary $\partial\Omega_2$ represents an artificial boundary of the model.

The acoustic pressure field $u$ is determined by the Helmholtz equation with appropriate boundary conditions:

$$ (\nabla \cdot \nabla + \omega^2 m)\, u = q \quad \text{in } \Omega, \tag{20a} $$
$$ u = 0 \quad \text{on } \partial\Omega_1, \tag{20b} $$
$$ (\partial_n + i \omega \sqrt{m})\, u = 0 \quad \text{on } \partial\Omega_2, \tag{20c} $$

where $m(x)$ is the slowness, i.e. $m = 1/c^2$ with $c$ the propagation velocity, and $q(x)$ represents sources of angular frequency $\omega$. The Sommerfeld boundary condition (20c) prevents artificial reflections from the synthetic domain boundary $\partial\Omega_2$. We discretize (20) on a rectangular grid using a five-point stencil for the Laplacian, which is sufficient for numerical experiments, and obtain

$$ A_\omega u = \big( -L_\omega(m) + \omega^2 \operatorname{diag}(m) \big) u = q, $$

where the Laplace operator $L_\omega$ depends upon $m$ and $\omega$ through the Sommerfeld boundary condition. Note that $u$ is a complex-valued vector.
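To make the discretization concrete, here is a minimal scipy sketch assembling a five-point Helmholtz operator on a uniform grid. It imposes homogeneous Dirichlet conditions on the whole boundary and omits the Sommerfeld terms of (20c), so it is a simplification of the operator actually used in the experiments; the grid size, frequency and source location are illustrative.

```python
# Five-point Helmholtz operator A = -L + omega^2 diag(m), where L is the
# (positive) discrete negative Laplacian with Dirichlet boundaries.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def helmholtz(m, h, omega):
    """m: slowness on an (ny, nx) grid; h: grid spacing; returns sparse A."""
    ny, nx = m.shape
    d2 = lambda k: sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k)) / h**2
    L = sp.kron(sp.eye(ny), d2(nx)) + sp.kron(d2(ny), sp.eye(nx))
    return (-L + omega**2 * sp.diags(m.ravel())).tocsr()

ny = nx = 42                               # cf. the 42^2 grid of the toy example
h = 2.0 / (nx - 1)                         # domain [-1, 1]^2
m = np.ones((ny, nx))                      # constant unit slowness
A_w = helmholtz(m, h, 2.0 * np.pi * 0.5)   # f = 0.5

q = np.zeros(ny * nx, dtype=complex)
q[(ny // 2) * nx + nx // 2] = 1.0          # point source near the centre
u = spsolve(A_w.astype(complex), q)        # complex-valued pressure field
```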

Figure 3. Reconstructions with their corresponding relative error $\epsilon_r = \|m - m_{true}\|_2 / \|m_{true}\|_2$.

Figure 4. Reconstructions with a single combined source using various weight choices (left). On the right, the fields generated by the sources are displayed.

The inverse problem is to determine $m$ from measurements of $u$ for several given sources $q_i$, where $i = 1, \dots, N$ labels the experiments, which can involve several frequencies. We assume further that all experiments measure $u$ at the same locations, which are described by the linear sampling operator $S$, i.e. we measure $d_i = S u_i$. The predicted data for a given slowness model $m$ is $F_i(m) = S A_\omega^{-1} q_i$ (suppressing the implicit $\omega$ dependence) and the least-squares error is

$$ \phi_e(m) = \frac{1}{2} \sum_{i=1}^{N} (F_i - d_i)^* (F_i - d_i). \tag{21} $$

We add a Tikhonov regularization term to $\phi_e$ and solve the inverse problem as

$$ m^\star = \arg\min_m \; \phi_e + \frac{1}{2} \lambda^2 \|m\|_\Delta^2, \tag{22} $$

where $\|\cdot\|_\Delta$ denotes the Laplacian-weighted $\ell_2$ norm.

We can apply the formalism developed above and consider, instead of (21),

$$ \hat{\phi}_e(m) = \frac{1}{2} \sum_{k=1}^{K} \Big( \sum_{i=1}^{N} w_i^{(k)} (F_i - d_i) \Big)^{\!*} \Big( \sum_{j=1}^{N} w_j^{(k)} (F_j - d_j) \Big) \tag{23} $$

for various choices of weights. The Bayes risk function (13) is defined by linearization around a reference model $m_0$. Because we are dealing with complex variables, the weight vectors are now complex valued and satisfy $w^* w = 1$, and there is an additional complex phase factor $e^{i\rho_k}$ for each component in (17).
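As a small sketch of (23) with complex weights, the snippet below combines synthetic complex residuals $F_i - d_i$ with $K$ unit-norm rows of a complex weight matrix; the sizes and random residuals are illustrative assumptions.

```python
# Weighted misfit (23) with complex-valued weights and residuals.
import numpy as np

rng = np.random.default_rng(9)
N, K, s = 40, 2, 39
res = rng.standard_normal((N, s)) + 1j * rng.standard_normal((N, s))  # F_i - d_i

W = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
W /= np.linalg.norm(W, axis=1, keepdims=True)      # rows satisfy w^* w = 1

r = W @ res                                        # K combined residual vectors
phi_hat = 0.5 * np.sum(np.abs(r)**2)               # misfit (23)
print(phi_hat)
```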

4.1 A simple toy model

For the first problem, we take $\Omega$ to be the square domain $[-1, 1]^2$, which we discretize by a uniform $42^2$ grid. Absorbing boundary conditions are assumed except at the top, where we impose Dirichlet boundary conditions. Sources are placed on the left boundary and 20 uniformly spaced detectors are placed on the right. In Figure 3(a), we depict the synthetic velocity model which we use to compute simulated data, to which we add 2% Gaussian noise. Sources of $f = 0.5$ (i.e. one wavelength in the domain) are placed on the left in two configurations. First, we consider $N = 2$ experiments with a source at $1/3$ depth and at $2/3$ depth. Second, one source is placed in the middle. The inversion is performed as described in Section 3 and the regularization parameter $\lambda$ was chosen by the Morozov discrepancy principle.[23] We solved the optimization problem (22) using limited-memory BFGS. We depict the resulting reconstructions in Figure 3(b) and (c), along with a relative error measure defined as

$$ \epsilon_r = \frac{\|m - m_{true}\|_2}{\|m_{true}\|_2}. \tag{24} $$

It is clear that for $N = 2$ we capture the basic structure of the model, whereas $N = 1$ is insufficient.

Figure 5. Reconstructions of the Marmousi velocity model using only the low-frequency $f = 0.2$ data, with all sources and with 2 simultaneous sources. We also indicate the relative reconstruction error $\epsilon_r = \|m_{rec} - m_{true}\|_2 / \|m_{true}\|_2$.

Figure 6. Fields for the 2 simultaneous sources for the case $f = 0.2$ using the various weight selection methods. Notice how the good weights cover the domain more or less uniformly.

Figure 7. Reconstructions of the Marmousi velocity model using the protocol described. We compare the final result using all the data, and a simultaneous sources protocol with optimal, SVD and random weights.

Next, we consider using a single simultaneous source obtained by superposition of the two sources with a unit-norm weight vector. In Figure 4, we show reconstructions using various methods for selecting the weights (left column). The right column shows the absolute value of the pressure field $u$ generated by the corresponding simultaneous source. We computed the optimal weight vector as described in Section 3 using 32 different random initial guesses for $w$ and selecting the one with the lowest risk. Second, we used the SVD method described in Section 3.2, and finally we generated three random phase weight vectors. It is evident that the optimal weights yield the best reconstruction, both visually and in terms of the relative error defined in (24). These optimal results were followed closely by those of the SVD method of Section 3.2. The random weights offered a reasonable reconstruction in (g), but not in (e) and (i). Inspecting the fields in the rightmost column, it seems that the optimal choice has selected the source combination that provides the best possible domain coverage. The source of the SVD-based approach did so to a lesser extent, and the random choices essentially cover the domain randomly, as expected.

4.2 Marmousi model

As a second numerical experiment, we consider the true Marmousi velocity model, depicted in Figure 5(f). The domain is 3 km by 10 km and the velocity is given in km s$^{-1}$. We use a $208 \times 60$ grid for the discretization. Forty emitters are placed at uniform distances on the surface and 39 receivers are placed in between the emitters. We generate data for five distinct frequencies $f = 0.2, 1.5, 3, 4.5, 6$ (in Hz), with a total of $5 \times 40 = 200$ experiments. We note that the low frequencies are needed to avoid local minima, but are not quite realistic. We added 3% Gaussian noise to the synthetic data. Optimal weights were precomputed before the inversion by linearization around a constant $m_0 = 2$ km s$^{-1}$ background model, which was also used for the SVD method of weight selection.

First, we present the reconstructions for a single frequency of 0.2 Hz using all 40 experiments and with $K = 2$ weight vectors, each a combination of the 40 sources. For the latter, we show the results for the optimal weights, the SVD-based weight selection, and random phase weights in two realizations. It is clear from Figure 5 that the random weight selection performs much worse than the other methods. In this case the SVD reconstruction is even better than the one using the optimal weights.

In Figure 6, we depict the resulting pressure fields from the two simultaneous sources. We observe again that the optimal and SVD weights result in a more or less uniform and balanced coverage of the domain, whereas the random choices do not share this virtue.

To obtain more accurate reconstructions, higher frequencies must be included. Because of the nonconvexity of the problem, we perform a frequency continuation by which we first solve (to full convergence) for the lowest frequency, then use the result as the initial guess for the next stage where we add an additional frequency, etc. Because the accuracy of the reconstruction increases as more and more frequencies are included, we can expect to require more simultaneous sources as well. In this example, we use $K = 2, 6, 12, 20, 30$ simultaneous sources corresponding to the frequencies $f = 0.2, 1.5, 3, 4.5, 6$, again for the aforementioned three methods of weight selection. In Figure 7, we depict the resulting reconstructions. Remarkably, the reconstructions with the SVD or optimal weights are visually no worse than the reconstruction using all the data, whereas the random weights do not offer comparable results. We note that the advantage of using simultaneous sources decreases with higher frequencies.

5 Summary and conclusions

We have introduced a novel methodology for optimal experimental design of acquisitions involving simultaneous sources. Based on an optimal Bayesian risk statistical framework and the assumption of a linear inverse problem, we have formulated an optimality criterion for the selection of the weights with which individual observations should be combined. We derived a heuristic guess for the optimal weights based upon the SVD, and we performed numerical experiments to test the efficacy of the method. In all cases, it was found that an informed choice of the weights generally outperforms a random weight choice. This is because our statistical framework incorporates information about the domain under consideration and the experimental setup.

Even though for most realistic applications the numerical models lead to nonlinear inverse problems, weight selection based on a linearized version of the model was still found to perform quite well. Development of a computationally viable optimal simultaneous source experimental design that accounts for the nonlinear nature of the observation model is a subject for future work.

In this study, Gaussian assumptions were made for the noise model and Tikhonov regularization was considered for the ill-posed case. These choices are by no means exclusive and, in principle, the derivation of optimal experimental designs for simultaneous sources can be performed for other noise models as well as for other forms of regularization.

So far, the incorporation of constraints has not been considered (other than those that can be introduced in the form of exact penalties). Future research will generalize the proposed formulation to account for such constraints.

Additional information

Funding

This work was supported in part by IBM Research and MITACS Open Collaborative Research project for Design in Inversion http://ocrdesign.wix.com/home.

References

1. Arridge SR. Topical review: optical tomography in medical imaging. Inverse Probl. 1999;15:41–93.
2. Casanova R, Silva A, Borges AR. A quantitative algorithm for parameter estimation in magnetic induction tomography. Meas. Sci. Technol. 2004;15:1412–1419.
3. Dorn O. A shape reconstruction method for electromagnetic tomography using adjoint fields and level sets. Inverse Probl. 2000;16:1119–1156.
4. Hansen PC. Rank-deficient and discrete ill-posed problems. Philadelphia (PA): SIAM; 1997.
5. Beasley CJ. A new look at marine simultaneous sources. Leading Edge. 2008;27:914–917.
6. Romero LA, Ghiglia DC, Ober CC, Morton SA. Phase encoding of shot records in prestack migration. Geophysics. 2000;65:426–436.
7. Herrmann FJ, Erlangga YA, Lin TTY. Compressive simultaneous full-waveform simulation. Geophysics. 2009;74:A35–A40.
8. Morton SA, Ober CC. Faster shot-record depth migrations using phase encoding. Vol. 17, SEG technical program expanded abstracts. New Orleans (LA): SEG; 1998. p. 1131–1134.
9. Romberg J, Neelamani R, Krohn C, Krebs J, Deffenbaugh M, Anderson J. Efficient seismic forward modeling and acquisition using simultaneous random sources and sparsity. Geophysics. 2010;75:WB15–WB27.
10. Haber E, Chung M, Herrmann F. Solving PDE constrained optimization with multiple right hand sides. SIAM J. Optim. 2012;22:739–757.
11. Hutchinson MF. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Commun. Statist. Simul. 1990;19:433–450.
12. Tenorio L, Andersson F, de Hoop M, Ma P. Data analysis tools for uncertainty quantification of inverse problems. Inverse Probl. 2011;27:045001.
13. Symes B. Source synthesis for waveform inversion. SEG technical program expanded abstracts. Denver (CO): SEG; 2010. p. 1018–1022.
14. Haber E, Horesh L, Tenorio L. Numerical methods for experimental design of large-scale linear ill-posed inverse problems. Inverse Probl. 2008;24:055012.
15. Coles DA, Morgan FD. A method of fast, sequential experimental design for linearized geophysical inverse problems. Geophys. J. Int. 2009;178:145–158.
16. Horesh L, Haber E, Tenorio L. Optimal experimental design for the large-scale nonlinear ill-posed problem of impedance imaging. In: Computational methods for large-scale inverse problems and quantification of uncertainty. Wiley; 2009.
17. Maurer H, Curtis A, Boerner DE. Recent advances in optimized geophysical survey design. Geophysics. 2010;75:75A177–75A194.
18. Haber E, Horesh L, Tenorio L. Numerical methods for the design of large-scale nonlinear discrete ill-posed inverse problems. Inverse Probl. 2010;26:025002.
19. Lahmer T. Optimal experimental design for nonlinear ill-posed problems applied to gravity dams. Inverse Probl. 2011;27:125005.
20. Nocedal J, Wright S. Numerical optimization. New York (NY): Springer; 1999.
21. Habashy TM, Abubakar A, Belani A, Pan G. Full-waveform seismic inversion using the source-receiver compression approach. In: 2010 SEG Annual Meeting; 2010 October 17–22; Denver (CO): Society of Exploration Geophysicists; 2010.
22. Bourgeois A, Bourget M, Lailly P, Poulet M, Ricarte P, Versteeg R. Marmousi data and model. In: The Marmousi experience. Copenhagen: EAGE; 1991. p. 5–9.
23. Morozov VA. The discrepancy principle for solving operator equations by the regularization method. Zh. Vychisl. Mat. Mat. Fiz. 1968;8:295–309.
