Research Article

Estimation method for inverse problems with linear forward operator and its application to magnetization estimation from magnetic force microscopy images using deep learning

Pages 2131-2164 | Received 26 Mar 2020, Accepted 08 Mar 2021, Published online: 29 Mar 2021

ABSTRACT

This study considers an inverse problem whose corresponding forward problem is given by a finite-dimensional linear operator $T$. The inverse problem has the following form: (data) $= T$(unknown). It is assumed that the number of patterns that the unknown quantity can take is finite. Then, even if $\operatorname{Ker} T \neq \{0\}$, the unknown quantity may be uniquely determined from the data. This case is the subject of this study. We propose a method for solving this inverse problem using numerical calculations. A famous inverse problem requires the estimation of the unknown magnetization distribution or magnetic charge distribution in an anisotropic permanent magnet sample from its magnetic force microscopy images. It is known that the solution of this problem is not unique in general. In this work, we consider the case where a magnetic sample comprises cubic cells, and the unknown magnetic moment is oriented either upward or downward in each cell. This discretized problem is an example of the above-mentioned inverse problem: (data) $= T$(unknown). Numerical calculations were carried out to solve this model problem employing our method and deep learning. The experimental results show that the magnetization can be estimated roughly up to a certain depth.


1. Introduction

Let $m, n \in \mathbb{N}$ and $\tilde{T}: \mathbb{R}^m \to \mathbb{R}^n$ be a linear operator. For a finite set $M \subset \mathbb{R}^m$, we consider the natural restriction of $\tilde{T}$ to $M$, $T = \tilde{T}|_M : M \to \mathbb{R}^n$. In this study, we consider inverse problems of the form: (1) estimate $\mu \in M$ from observation data $f = T_\delta(\mu) := T(\mu) + \delta$, where $\delta \in \mathbb{R}^n$ denotes some noise and $\operatorname{Ker} \tilde{T} = \{0\}$ may not hold. Since $M$ is a finite set, the map $T: M \to \mathbb{R}^n$ may be injective even if $\operatorname{Ker} \tilde{T} \neq \{0\}$. In that case, $T^{-1}$ is often nonlinear.
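As a toy illustration of (1), the following sketch (with an arbitrarily chosen matrix; none of these names come from the paper) shows a linear map $\tilde{T}: \mathbb{R}^3 \to \mathbb{R}^2$ with a nontrivial kernel whose restriction to the finite set $M = \{-1, 1\}^3$ is nevertheless injective:

```python
import itertools
import numpy as np

# A 3 -> 2 linear map: Ker T~ != {0} on R^3.
T = np.array([[1.0, 2.0, 4.0],
              [4.0, 2.0, 1.0]])

# The finite set M = {-1, +1}^3.
M = [np.array(mu) for mu in itertools.product([-1.0, 1.0], repeat=3)]

# If all 2^3 images are distinct, T restricted to M is injective.
images = {tuple(T @ mu) for mu in M}
print(len(images) == len(M))  # True: T|_M is invertible, but only nonlinearly
```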

The inverse problem of obtaining the spatial distributions of magnetic properties inside an object from experimental measurements has been investigated by numerous researchers (see [Citation1]). In this study, we consider such a magnetic inverse problem, namely the magnetization-distribution identification problem represented by an integral equation (see (2)). This problem requires the estimation of the unknown magnetization distribution or magnetic charge distribution in an anisotropic permanent magnet sample from its magnetic force microscopy (MFM) images. This integral equation is a Fredholm equation of the first kind, and its solution is generally not unique for the given data. In fact, in principle, it is impossible to determine the magnetic charge distribution inside a magnetic material from MFM images alone (see [Citation2,Citation3]).

In this work, we assume that an anisotropic magnetic sample is composed of magnetic grains, where each grain has a single-magnetic-domain structure and is magnetically isolated from the neighbouring magnetic grains by a non-magnetic boundary phase. Therefore, no magnetic domain walls exist in the sample. Further, we assume that the magnetic sample comprises cubic cells, and every magnetic grain is composed of one cell or a set of several cells. For example, an ideal hot-rolled magnet sample approximately has such a structure. We consider the following case. First, we apply a strong upward magnetic field to the magnetic sample to produce the magnetized state, that is, to make the magnetic moments of all cells of this sample point upward. Next, we apply a downward magnetic field to the sample. Then, the orientation of the magnetic moment of each cell either remains upward or changes to downward. (When a magnetic grain is composed of several cells, the moments of those cells are oriented in the same direction.) To investigate the magnetic performance of the sample, our aim is to determine from its MFM images, at as many positions inside the sample as possible, whether the magnetic moments are reversed.

Since the magnet sample is assumed to consist of cubic cells, we discretize the integral equation and consider this discretized inverse problem as a model problem of (1). Saito et al. [Citation4] obtained the Green kernel function of the integral equation, and our analysis is based on their formulation. We aim to estimate the magnetization distribution on the surface of the magnet and as deep in the interior as possible. In this study, we propose a method of reconstruction (a sequential estimation-removal method), and we conducted computational simulations using this method. The experimental results show that the discretized magnetization can be estimated roughly up to a certain depth. We did not conduct experiments using actual magnets; however, we consider it appropriate to employ the high-performance MFM developed by Saito et al. (see [Citation5–7]) when conducting experiments using actual magnets. This MFM detects only the component of the magnetic field normal to the sample surface. In this study, we assume that such an MFM is used.

Various methods have been developed for reconstructing the internal magnetization distribution under some constraints, or the surface magnetization distribution, from MFM images; see, e.g. [Citation8–11] and [Citation12]. However, the authors were unable to identify prior research that solves our inverse problem.

The rest of this paper is organized as follows. In Section 2, we describe our magnetization-distribution identification problem and the cell discretization of this problem. This discretized problem is a model problem of the inverse problem (1). In Section 3, we propose a method for solving the general inverse problem of the form (1); we call it a sequential estimation-removal method (although it may be known by other names, the authors are unaware of them). This method is based on the idea of estimating earlier the parts that can be estimated more accurately (Basic idea 3.2). We divide the target $\mu \in M$ virtually into several parts, each of which is called a piece (in the case of our discretized magnetization-distribution identification problem, each piece corresponds to a set of several cells). In our proposed method, we estimate and virtually remove the pieces one by one, starting from those that can be estimated more accurately. Consequently, the problem is localized, and the computational cost can be reduced. This method requires the linearity of $T$. In Section 4, we numerically consider the ill-posedness of the discretized magnetization-distribution identification problem and of the original infinite-degrees-of-freedom problem, and the relationship between these problems, using the results of our numerical experiments. As described in Sections 4.1 and 4.2, these problems are severely ill-posed (cf. [Citation2,Citation3]). In Sections 5 and 6, we describe the results of the numerical experiments in which we attempted to solve the discretized magnetization-distribution identification problem using our sequential estimation-removal method. This method is implemented by employing (artificial) neural networks in Section 5, and by utilizing an iterative algorithm in Section 6. These implementations can be applied even if $T^{-1}$ is nonlinear. In particular, the set of neural networks obtained can act as an approximation of $T^{-1}$. The results of our numerical experiments show that the discretized magnetization $\mu$ can be roughly estimated up to a certain depth. Moreover, based on our experiments, estimating $\mu$ with our method is more accurate than estimating all of $\mu$ at once. Since we aimed to estimate as many of the magnetic moments as possible, the difficulty of this ill-posed inverse problem is highlighted in Sections 5 and 6.

In Sections 4–6, we consider the following simple and basic case: each magnetic grain comprises only one cell, and the orientations of the moments of the cells are independent of each other. Thus we used randomly generated artificial training data and test data, and we did not apply any regularization or any Bayesian prior distribution. To obtain more accurate estimates in a real situation, training data that approximate the actual magnetic moment configurations should be used, and some regularization or some Bayesian prior distribution representing the actual magnetic moment configurations should be adopted. We discuss these aspects briefly in Section 7. Further, in Section 7, we describe an inverse gravimetric problem as a problem to which our method may be applicable.

2. Magnetization-distribution identification problem and its discretization

2.1. Problem

We consider the discretized magnetization-distribution identification problem described in Section 1 as a model problem of the inverse problem (1). This problem is a discretization of the problem of solving integral equation (2). This integral-equation formulation is based on [Citation4]. The anisotropic permanent magnet sample is denoted by $M$. Figure 1 shows the three-dimensional MFM tip Green function $G(r)$ ($r = (x, y, z) \in \mathbb{R}^3$) for a dipole-type MFM tip, and the rectangle is $M \cap \{z = 0\}$. Here, we consider the Green function for a point magnetic charge. The magnetization for the dipole-type MFM tip is regarded as an infinitesimal magnetic dipole moment, which gives the maximum spatial resolution. In Figure 1, the $z$-axis represents the normal direction to the surface of the sample $M$. Let $F_z(r)$ denote the magnetic force gradient along the $z$ direction at the position $r$, that is, $F_z(r)$ is an $\mathbb{R}$-valued function corresponding to the signal amplitude of an MFM image at $r$. Then, $F_z(r)$ is given by the convolution integral of the point magnetic charge distribution $\rho(r)$ with $G(r)$ as
(2) $F_z(r) = (G * \rho)(r) := \int_M G(r - r')\,\rho(r')\,dr'$,
where $r' = (x', y', z')$ runs through the sample $M$, $r \notin M$, and $dr' = dx'\,dy'\,dz'$. Let $m = (m_x, m_y, m_z)$ denote the magnetization of the MFM tip. In a real situation, both the magnetization components $m_x$ and $m_y$ are 0. In the following, we normalize $m_z / 4\pi\mu_0$ to 1, where $\mu_0$ denotes the vacuum permeability. Then, $G(r)$ is represented by
(3) $G(r) = \dfrac{15 z^3 - 9 z \left(x^2 + y^2 + z^2\right)}{\left(x^2 + y^2 + z^2\right)^{7/2}}$.
Note that $G(r - r')$ does not become $\infty$ because the MFM tip is far from the magnetic sample. Our (infinite-degrees-of-freedom) inverse problem requires determining the unknown distribution $\rho$ from the observation data $F_z$.
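For concreteness, a direct transcription of the (reconstructed) Green function (3) might look as follows; the function name is ours, and the sanity check uses the on-axis value $G = 6/z^4$ mentioned in Section 4.3:

```python
import numpy as np

def green(r):
    # Equation (3) with m_z / (4*pi*mu0) normalized to 1.
    x, y, z = r
    s = x**2 + y**2 + z**2
    return (15.0 * z**3 - 9.0 * z * s) / s**3.5

# On the z-axis (x = y = 0), (3) reduces to 6/z^4.
print(np.isclose(green((0.0, 0.0, 2.0)), 6.0 / 2.0**4))  # True
```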

Figure 1. MFM.

2.2. Simple discretization

Integral Equation (2) is a linear Fredholm equation of the first kind. In general, the solution $\rho$ is not unique for given data $F_z$ (see [Citation2,Citation3]). In this study, as described in Section 1, we consider the following basic and ideal discretized problem of Equation (2) as a model problem of (1).

  (i) All magnetic samples are cuboids of the same size (see Remark 2.1).

  (ii) Every magnetic sample is always on the same horizontal plane, and all the observation points are common to all samples.

  (iii) The MFM probe tip can be considered as a point magnetic dipole.

  (iv) Every magnetic sample is divided into small cubic cells as in Figure 2; we call this cell discretization. Each cell or a set of several cells is a magnetic grain with a single-magnetic-domain structure. (This assumption fails if the boundary of a magnetic grain cuts through a cell; we restrict our consideration to the case where it does not (see Experimental claim 4.3).) Every magnetic grain is magnetically isolated from the neighbouring magnetic grains by the non-magnetic boundary phase.

  (v) The length of each side of every cubic cell is normalized to 1.

  (vi) The normalized value of the magnetic moment is $+1$ (upward) or $-1$ (downward) in each cell. (When a magnetic grain comprises several cells, the moments of those cells are oriented in the same direction.)

Remark 2.1

Further, if we discretize the space around the magnetic sample into cells, we do not need assumption (i). Then, assumption (vi) becomes 'the normalized value of the magnetic moment is $+1$ (upward), $-1$ (downward), or $0$ (space) in each cell'. In this case, the estimation is more difficult than under assumption (vi) as stated, because the estimation must select one out of three possibilities. However, it is known which cells are included in the space, that is, which cells have the 0-value moment. The difficulty is mitigated if this information can be used in the estimation.

Figure 2. Cell discretization.

We let $n_\mu$, $n_x$, $n_y$ and $n_z$ denote the total number of cells, and the numbers of cells in the $x$-axis, $y$-axis and $z$-axis directions, respectively. Therefore, $n_\mu = n_x n_y n_z$, and the cells can be numbered as $c_1, \ldots, c_{n_z}$, $c_{n_z+1}, \ldots, c_{2 n_z}$, $c_{2 n_z+1}, \ldots, c_{3 n_z}$, $\ldots$, $c_{(n_x n_y - 1) n_z + 1}, \ldots, c_{n_x n_y n_z}$. Here, suffix $i = a n_z + b$ ($a = 0, \ldots, n_x n_y - 1$, $b = 1, \ldots, n_z$) means that the cell is at the $a$th position in the horizontal ($xy$-plane) direction and the $b$th position from the top in the vertical direction. (The order in the horizontal direction is determined appropriately.)

Example 2.2

We consider the case of $n_\mu = 6$, $n_x = 2$, $n_y = 1$ and $n_z = 3$. Then, the numbering of the cells is as shown in Figure 3.

Figure 3. Numbering cells.

Let $\mu_i$ denote the magnetic moment of cell $c_i$; then, $\mu_i = +1$ (upward) or $\mu_i = -1$ (downward). We consider the magnetic charge of each cell, which is determined from its moment. If $\mu_i = +1$, then the values of the charges are $+1$ and $-1$ at the top and bottom faces of cell $c_i$, respectively. If $\mu_i = -1$, then the values of the charges are $-1$ and $+1$ at the top and bottom faces of cell $c_i$, respectively. The values of the charges are zero inside each cell and at all vertical faces of each cell. In addition to assumption (vi), we assume

  (vii) The charge of each top/bottom face is concentrated at the centre point of that face.

Next, we consider two vertically adjacent cells. The centre points of the adjacent faces of these two cells coincide, and the value of the magnetic charge at this point is the sum of the values at the two coinciding centre points. Thus the value is $+2$, $0$ or $-2$. Therefore, when the cell-discretized magnetic sample $M$ comprises only two vertically adjacent cells, the possible values of the charges are $+2$, $+1$, $0$, $-1$ and $-2$. Figure 4 illustrates the magnetic moments (up or down arrows) and charges in the four possible cases. Next, we consider the general case. Identifying overlapping points, the total number of centre points of the top and bottom faces of all cells is given by $n_\rho := n_x n_y (n_z + 1)$. These points can be numbered as $p_1, \ldots, p_{n_z+1}$, $p_{(n_z+1)+1}, \ldots, p_{2(n_z+1)}$, $p_{2(n_z+1)+1}, \ldots, p_{3(n_z+1)}$, $\ldots$, $p_{(n_x n_y - 1)(n_z+1)+1}, \ldots, p_{n_x n_y (n_z+1)}$. Here, suffix $j = a(n_z+1) + b$ of $p_j$ ($a = 0, \ldots, n_x n_y - 1$, $b = 1, \ldots, n_z+1$) means that the charge is at the $a$th position in the horizontal ($xy$-plane) direction and the $b$th position from the top in the vertical direction. The order in the horizontal direction is the same as that of the cells. Let $\rho_j$ denote the magnetic charge at $p_j$. Then, $\rho_j = -2, -1, 0, +1$ or $+2$.

Figure 4. Magnetic moments and charges.

Let $n_o$ denote the number of observation points, where we observe the values of $F_z$ at these points. We number these points appropriately as $o_1, o_2, \ldots, o_{n_o}$, and we let $f_k$ denote the value of $F_z(r)$ at $o_k$. Next, we define
(4) $g_{kj} := G(o_k - p_j)$.
This is a discretization of (3). Put $f := [f_1, f_2, \ldots, f_{n_o}]^t$, $G := (g_{kj})$ (an $n_o \times n_\rho$ matrix), and $\rho := [\rho_1, \rho_2, \ldots, \rho_{n_\rho}]^t$. Thus, we have a discretized equation of (2),
(5) $f = G\rho$.
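A minimal sketch of assembling the matrix $G$ of (5) from (4), under the stated assumptions (the layout of the point lists and all names here are ours):

```python
import numpy as np

def assemble_G(obs_points, charge_points):
    # g_kj = G(o_k - p_j), equation (4), vectorized over all pairs.
    O = np.asarray(obs_points, dtype=float)     # shape (n_o, 3)
    P = np.asarray(charge_points, dtype=float)  # shape (n_rho, 3)
    d = O[:, None, :] - P[None, :, :]           # pairwise differences o_k - p_j
    s = np.sum(d**2, axis=-1)                   # x^2 + y^2 + z^2
    z = d[..., 2]
    return (15.0 * z**3 - 9.0 * z * s) / s**3.5  # shape (n_o, n_rho)
```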

The relationship between $\mu_i$ and $\rho_j$ is as follows. Put $R := \operatorname{diag}(R_1, R_2, \ldots, R_{n_x n_y})$ and $\mu := [\mu_1, \mu_2, \ldots, \mu_{n_\mu}]^t$, where $\mu_i$ denotes the moment of $c_i$ ($i = 1, \ldots, n_\mu$) and $R_1, \ldots, R_{n_x n_y}$ are $(n_z+1) \times n_z$ matrices such that
$$R_1 = \cdots = R_{n_x n_y} = \begin{bmatrix} 1 & 0 & \cdots & & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -1 & 1 \\ 0 & \cdots & & 0 & -1 \end{bmatrix}.$$
Then, we have $\rho = R\mu$. Thus our discretized (noiseless) inverse problem is to solve the following system of linear equations:
(6) $GR\mu = f$.
This equation is the same as Equation (1) with $m = n_\mu$, $n = n_o$, $T = GR$, and $\delta = 0$. Since each $\mu_i$ is either $+1$ or $-1$, the total number of possible patterns that $\mu$ can take is $2^{n_\mu}$. Let $M$ be the set of all patterns, that is, $M = \{\mu : \text{each } \mu_i \text{ is } +1 \text{ or } -1\} = \{-1, 1\}^{n_\mu}$. Then, $GR$ is a map from $M \subset \mathbb{R}^{n_\mu}$ to $\mathbb{R}^{n_o}$. We call $\mu \in M$ a moment vector. Since $M$ is a finite set, $GR$ may have an inverse map even if $n_o < n_\mu$.
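The matrix $R$ has a simple block structure, one bidiagonal block per vertical column of cells; a sketch (names ours):

```python
import numpy as np

def make_R_block(nz):
    # (nz+1) x nz block: +1 at the top face, -1 at the bottom face of each cell.
    Rb = np.zeros((nz + 1, nz))
    for b in range(nz):
        Rb[b, b] = 1.0
        Rb[b + 1, b] = -1.0
    return Rb

def make_R(nx, ny, nz):
    # Block-diagonal over the nx*ny vertical columns of cells: rho = R mu.
    return np.kron(np.eye(nx * ny), make_R_block(nz))
```

With this and the earlier `assemble_G` sketch, the forward operator of (6) is simply `assemble_G(obs, charges) @ make_R(nx, ny, nz)`.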

We consider the discretized inverse problem (6) under assumptions (i)–(vii); however, the original inverse problem is (2). In the rest of this subsection, we consider the relationship between these two problems. Let $O \subset \mathbb{R}^3$ denote an observation area, that is, an area containing all the observation points. In Figure 1, $O$ is a two-dimensional square. Let $\|\cdot\|_O$ and $\|\cdot\|_M$ denote appropriate norms of functions defined on $O$ and $M$, respectively; e.g. they are the $L^2$ norms. Then, for Equation (2), there exists $C > 0$ such that $\|F_z\|_O = \|G * \rho\|_O \le C \|\rho\|_M$ for every $\rho$, because $G(r - r')$ ($r \in O$, $r' \in M$) is bounded on $O \times M$. Only in the rest of this subsection, we denote the length of each side of every cell by $\Delta$ and write $\rho_\Delta$ for $\rho$, $G_\Delta$ for $G$, and $f_\Delta$ for $f$ in the discretized Equation (5). We can consider the vectors $f_\Delta$ and $\rho_\Delta$ to be piecewise constant functions defined on $O$ and $M$, respectively. (Note that the components of these vectors are labelled with $n_o$ points in $O$ and $n_\rho$ points in $M$, respectively.) Then, we have $\|F_z - G * \rho_\Delta\|_O = \|G * \rho - G * \rho_\Delta\|_O \le C \|\rho - \rho_\Delta\|_M$ and $\|G * \rho_\Delta - f_\Delta\|_O = \|G * \rho_\Delta - G_\Delta \rho_\Delta\|_O \le \|G - G_\Delta\| \, \|\rho_\Delta\|_M$, where $\|G - G_\Delta\|$ is the operator norm of $G - G_\Delta$. Note that $\|G - G_\Delta\| \to 0$ as $\Delta \to 0$ because the integral kernel of $G$ is smooth, and there exists $C' > 0$ such that $\|\rho_\Delta\|_M < C'$ for every $\Delta > 0$ because $|\rho_j| \le 2$. Therefore, by using $\|F_z - f_\Delta\|_O \le \|F_z - G * \rho_\Delta\|_O + \|G * \rho_\Delta - f_\Delta\|_O$, we have a stability property: $f_\Delta \to F_z$ as $\Delta \to 0$ and $\rho_\Delta \to \rho$. This stability property ensures the validity of the discretized approximation of the 'forward' problem, that is, the problem of determining $F_z$ from $\rho$. However, our inverse problem does not have such a stability property. We consider this in Section 4.2.

2.3. Selection of pieces of M

In Section 1, we introduced the concept of a piece of $M$. Each piece of $M$ is a subset of $\mathbb{R}^{n_\mu}$ derived from $M$, and every $\mu \in M$ is the sum of its pieces. It is important to select pieces appropriately for problem (1). The properties that pieces are expected to satisfy are described in Sections 3.2 and 3.3. For the inverse problem (6), the following pieces of $M$ can be naturally obtained.

For a moment vector $\mu = [\mu_1, \ldots, \mu_{n_\mu}]^t \in M$ ($[\ \cdot\ ]^t$ denotes the transpose) defined in Section 2.2 and $\ell = 1, 2, \ldots, n_z$, define
$$\mu_{[\ell]} = \left[\mu_{[\ell],1},\, \mu_{[\ell],2},\, \ldots,\, \mu_{[\ell],n_\mu}\right]^t \in \mathbb{R}^{n_\mu}$$
by $\mu_{[\ell],i} := \mu_i$ if $i = \ell, n_z + \ell, 2 n_z + \ell, \ldots, (n_x n_y - 1) n_z + \ell$, and $\mu_{[\ell],i} := 0$ otherwise. Consequently, the $i$th component of $\mu_{[\ell]}$ is equal to the $i$th component of $\mu$ if that component is the magnetic moment of one of the $\ell$th cells from the top of $M$, and otherwise the $i$th component of $\mu_{[\ell]}$ is 0. For example, the non-zero components of $\mu_{[1]}$ correspond to the cells of the top surface of $M$. (See also Example 2.3.) We define
$$M_{[\ell]} := \left\{\mu_{[\ell]} :\ \mu \in M\right\} \subset \mathbb{R}^{n_\mu} \quad (\ell = 1, 2, \ldots, n_z)$$
and we call $M_{[\ell]}$ the $\ell$th piece of $M$. In particular, $M_{[1]}$ and $M_{[n_z]}$ are called the top piece and the bottom piece, respectively. Furthermore, $M_{[\ell]}$ with a small value of $\ell$ is called an upper piece while $M_{[\ell]}$ with a large value of $\ell$ is called a lower piece. Then, we have
(7) $\mu = \sum_{\ell=1}^{n_z} \mu_{[\ell]}$.
There exists a natural projection $M \ni \mu = \sum_{\ell=1}^{n_z} \mu_{[\ell]} \mapsto (\mu_{[\ell]})_{\ell} \in \bigoplus_{\ell=1}^{n_z} M_{[\ell]} \subset \bigoplus_{\ell=1}^{n_z} \mathbb{R}^{n_\mu}$, where the symbol $\oplus$ denotes the direct sum of vector spaces or their subsets. However, $M_{[\ell]} \subset M$ does not hold, because $\mu$ has no zero components while $\mu_{[\ell]}$ has zero components.

Example 2.3

We consider the case of Example 2.2. The cells $\{c_1, c_2, \ldots, c_6\}$ corresponding to the elements $[\mu_1, \mu_2, \ldots, \mu_6]^t$ are as shown in Figure 3. Then, for $\mu = [\mu_1, \mu_2, \ldots, \mu_6]^t$, we have $\mu_{[1]} = [\mu_1, 0, 0, \mu_4, 0, 0]^t$, $\mu_{[2]} = [0, \mu_2, 0, 0, \mu_5, 0]^t$, and $\mu_{[3]} = [0, 0, \mu_3, 0, 0, \mu_6]^t$. In this case, the sets of cells corresponding to the nonzero components of $M_{[1]}$, $M_{[2]}$ and $M_{[3]}$ are $\{c_1, c_4\}$, $\{c_2, c_5\}$ and $\{c_3, c_6\}$, respectively.

In the following, we often identify a piece with the set of cells corresponding to it.
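Under the cell numbering of Section 2.2 (index $i = a n_z + b$, with $b$ the depth from the top), extracting the $\ell$th piece is a strided selection; a sketch (function name ours):

```python
import numpy as np

def piece(mu, ell, nz):
    # mu_[ell]: keep the components at depth ell, zero out the rest.
    mu = np.asarray(mu, dtype=float)
    out = np.zeros_like(mu)
    out[ell - 1::nz] = mu[ell - 1::nz]
    return out

# Example 2.3 (n_mu = 6, nz = 3):
# piece([m1, m2, m3, m4, m5, m6], 1, 3) -> [m1, 0, 0, m4, 0, 0]
```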

3. Inverse problem with a linear forward operator and a sequential estimation-removal method

In this section, we consider the general inverse problems of the form (1).

3.1. Inverse problem with a linear forward operator

Let $m$, $n$, $T$, $\tilde{T}$ and $M \subset \mathbb{R}^m$ be the integers, operators and set described in Section 1. In the following, we aim to develop an estimation method for inverse problems of the form (1) with $\operatorname{Ker} \tilde{T} \neq \{0\}$ and $\delta \neq 0$. Since $M$ is a finite set, $T: M \to \mathbb{R}^n$ may be injective even if $\operatorname{Ker} \tilde{T} \neq \{0\}$. Then, the inverse map $T^{-1}: T(M) \to M$ exists. However, it is often nonlinear, as the following lemma shows.

Lemma 3.1

Let $L$ and $L'$ be the linear subspaces spanned by $M$ and $T(M)$, respectively. Assume that $L \cap \operatorname{Ker} \tilde{T} \neq \{0\}$ and $T: M \to T(M)$ is injective. Then, $T^{-1}: T(M) \to M$ cannot be extended to a linear map $L' \to L$.

Proof.

Since $T: L \to L'$ is linear and surjective onto $L'$, we have $\dim L \ge \dim L'$. If $T^{-1}: T(M) \to M$ has a linear extension, we have $\dim L' \ge \dim L$. Therefore, $\dim L = \dim L'$. Then, $\tilde{T}: L \to L'$ is also injective. Hence, $L \cap \operatorname{Ker} \tilde{T} = \{0\}$. This contradicts the assumption that $L \cap \operatorname{Ker} \tilde{T} \neq \{0\}$.

Even if $T$ is not injective, there may exist subsets $M_I$, $M_C$ of $\mathbb{R}^m$ such that every $\mu \in M$ has a unique decomposition of the form $\mu = \mu_I + \mu_C$, $\mu_I \in M_I$, $\mu_C \in M_C$, and $\mu_I$ can be uniquely determined from $T(\mu)$. For simplicity, we denote the map
(8) $T(M) \to M_I$, $T(\mu) = T(\mu_I) + T(\mu_C) \mapsto \mu_I$
by $T^{-1}$. In general, the map $T(\mu_I) \mapsto \mu_I$ is also nonlinear. The set of all vectors of the form (15) below is an example of $M_I$ (if $\delta = 0$).

From Section 4 onward, we consider the (discretized) magnetization-distribution identification problem described in Section 2. It is an example of the above-mentioned inverse problem. In this problem, $M = \{-1, 1\}^m$ and (practically) $\operatorname{Ker} \tilde{T} \neq \{0\}$. However, in some cases, the inverse map $T^{-1}$ exists and it is (practically) nonlinear. (The linear space $L$ spanned by $M$ is $\mathbb{R}^m$. Therefore, $L \cap \operatorname{Ker} \tilde{T} \neq \{0\}$, and the nonlinearity of $T^{-1}: T(M) \to M$ follows from Lemma 3.1.) Therefore, it is difficult to solve the inverse problem by employing a linear algebra method. We aim to construct a map $N$ that approximates $T^{-1}$, $N \approx T^{-1}$, for example by machine learning (deep learning). Further, this map $N$ is expected to be as robust as possible to noise.

3.2. A sequential estimation-removal method

Machine learning is often computationally expensive. In the numerical experiments conducted on our magnetic inverse problem, the deep learning for constructing a neural network $N \approx T^{-1}$ stopped before the value of the loss function became small enough. (One of the reasons, we think, is described in Section 3.4.) Therefore, we divide the problem as follows. Let $\bar{m} \in \mathbb{N}$, $\bar{m} \le m$, and let $M_{[1]}, M_{[2]}, \ldots, M_{[\bar{m}]}$ be subsets of $\mathbb{R}^m$ such that every $\mu \in M$ has a unique decomposition of the form
(9) $\mu = \sum_{\ell=1}^{\bar{m}} \mu_{[\ell]}$, $\mu_{[\ell]} \in M_{[\ell]}$,
where $\sum$ denotes the sum of vectors in $\mathbb{R}^m$ (see (7)). As described in Section 2.3, there exists a natural projection $M \ni \mu = \sum_{\ell=1}^{\bar{m}} \mu_{[\ell]} \mapsto (\mu_{[\ell]})_{\ell} \in \bigoplus_{\ell=1}^{\bar{m}} M_{[\ell]}$. However, $M_{[\ell]} \subset M$ may not hold, and we call $M_{[\ell]}$ the $\ell$th piece of $M$.

If $T$ is injective, there exists the inverse map $N$, i.e. $N(T(\mu)) = \mu$. Therefore, we can consider maps $N_{[1]}, N_{[2]}, \ldots, N_{[\bar{m}]}$ such that $N_{[\ell]}(T(\mu)) = \mu_{[\ell]}$. Moreover, for $f = T_\delta(\mu) = T(\mu) + \delta$ with $\operatorname{Ker} \tilde{T} \neq \{0\}$, it may be expected that the following holds: $N_{[\ell]}(f) = N_{[\ell]}(T_\delta(\mu)) \approx \mu_{[\ell]}$. If we have these maps, we can estimate $\mu$ as $\mu \approx \sum_{\ell=1}^{\bar{m}} N_{[\ell]}(f)$. This method estimates $\mu_{[1]}, \mu_{[2]}, \ldots, \mu_{[\bar{m}]}$ independently and does not require the linearity of $T$. In the numerical experiments on our ill-posed magnetic inverse problem, this method was superior to the above-mentioned method that estimates $\mu$ at once. However, its performance was considerably inadequate even when $\delta = 0$. Therefore, we propose a reconstruction method (sequential estimation-removal method) based on the following idea:

Basic idea 3.2: A piece that can be accurately estimated should be estimated earlier.

In this method, we employ the linearity of $T$ to solve the inverse problem (1).

Let $\|\cdot\|_O$ denote a norm on $\mathbb{R}^n$. To realize Basic idea 3.2, we expect the pieces $M_{[1]}, M_{[2]}, \ldots, M_{[\bar{m}]}$ to be selected such that the value of $\|T(\mu_{[\ell]})\|_O$ decreases as the value of $\ell$ increases. First, we expect the selection to satisfy
(10) $\|T(\mu_{[\ell+1]})\|_O < \|T(\mu_{[\ell]})\|_O$
for all $\mu \in M$ and $\ell = 1, \ldots, \bar{m} - 1$. We call this property piece monotonicity. If this property cannot be obtained, we expect the following property to hold:

  • For every $\mu \in M$ and every $\ell = 1, \ldots, \bar{m}$, the number of non-zero components of $\mu_{[\ell]}$ is $m/\bar{m}$. (In our magnetic inverse problem, this means that every piece corresponds to the same number of cells.)

  • (10) holds whenever $\mu_{[\ell]} = \mu_{[\ell+1]}$ as $m/\bar{m}$-dimensional vectors, where the order of the elements is properly determined (see Example 3.3).

We call this property weak piece monotonicity. As described in Section 2.3, we call $M_{[\ell]}$ with a small value of $\ell$ an upper piece and $M_{[\ell]}$ with a large value of $\ell$ a lower piece. In particular, $M_{[1]}$ and $M_{[\bar{m}]}$ are called the top piece and the bottom piece, respectively. To realize Basic idea 3.2, we expect that a piece closer to the top piece can be estimated more accurately. In Section 4.3, it is experimentally shown that the pieces obtained in Section 2.3 for our magnetic inverse problem satisfy piece monotonicity for $\ell = 1, \ldots, m'$ with some $m' \le \bar{m} = 10$ (see Tables 7 and 8) and weak piece monotonicity for $\ell = 1, \ldots, \bar{m} = 10$ (see Tables 9 and 10). In Assumption 3.5, we will describe other properties that pieces are expected to satisfy, and in Section 3.3, we will describe some conditions for Assumption 3.5 to hold.

Example 3.3

We consider the case of the discretized magnetization-distribution identification problem described in Section 2. For example, in the same case of $m = n_\mu = 6$ and $\bar{m} = n_z = 3$ as in Examples 2.2 and 2.3, we consider $\mu_{[1]} = [1, 0, 0, 1, 0, 0]^t$, $\mu_{[2]} = [0, 1, 0, 0, 1, 0]^t$ and $\mu_{[3]} = [0, 0, 1, 0, 0, 1]^t$ to be equal to each other as $2\,(= m/\bar{m})$-dimensional vectors.

We consider a reconstruction method based on Basic idea 3.2. First, we consider the case where $\operatorname{Ker} \tilde{T} = \{0\}$ and $\delta = 0$, that is, the well-posed case. For $\mu \in M$ and $\mu_{[\ell]} \in M_{[\ell]}$ ($\ell = 1, 2, \ldots, \bar{m}$) given by (9), we define
$$\mu^{\flat}_{[\ell]} := \sum_{k=\ell}^{\bar{m}} \mu_{[k]}, \quad f_{[\ell]} := T(\mu_{[\ell]}), \quad f^{\flat}_{[\ell]} := \sum_{k=\ell}^{\bar{m}} f_{[k]}$$
(the superscript $\flat$ denotes quantities aggregating the $\ell$th and all lower pieces). Then, $\mu^{\flat}_{[1]} = \mu$. Using the linearity of $T$, we have, for $\ell = 1, \ldots, \bar{m}$,
$$T\big(\mu^{\flat}_{[\ell]}\big) = \sum_{k=\ell}^{\bar{m}} T(\mu_{[k]}) = \sum_{k=\ell}^{\bar{m}} f_{[k]} = f^{\flat}_{[\ell]},$$
and then $f^{\flat}_{[1]} = f$ and $T^{-1}(f^{\flat}_{[\ell]}) = \mu^{\flat}_{[\ell]}$. Therefore, for $\ell = 1, 2, \ldots, \bar{m}$, there exists $T^{-1}_{[\ell]}$ that outputs $\mu_{[\ell]}$ for the input $f^{\flat}_{[\ell]}$, i.e.
(11) $T^{-1}_{[\ell]}\big(f^{\flat}_{[\ell]}\big) = \mu_{[\ell]} \in M_{[\ell]}$, $\ell = 1, 2, \ldots, \bar{m}$.
Note that $T^{-1}_{[\ell]}$ gives not $\mu^{\flat}_{[\ell]}$ but $\mu_{[\ell]}$ from $f^{\flat}_{[\ell]}$. Thus we can use $T^{-1}_{[\ell]}$ ($\ell = 1, 2, \ldots, \bar{m}$) to obtain $\mu$ from a given $f$ by the following procedure.

Procedure 3.4

  1. Put $\ell = 1$.

  2. Calculate $\mu_{[1]} = T^{-1}_{[1]}(f)$ and $f_{[1]} = T(\mu_{[1]})$.

  3. $\ell \leftarrow \ell + 1$.

  4. Calculate (12) $\mu_{[\ell]} = T^{-1}_{[\ell]}\big(f^{\flat}_{[\ell]}\big) = T^{-1}_{[\ell]}\big(f - \sum_{k=1}^{\ell-1} f_{[k]}\big)$, and if $\ell < \bar{m}$, calculate $f_{[\ell]} = T(\mu_{[\ell]})$.

  5. If $\ell = \bar{m}$, stop the procedure. Otherwise, go back to step 3.

This procedure outputs $\mu = \sum_{\ell=1}^{\bar{m}} \mu_{[\ell]}$. In this procedure, the pieces are estimated and virtually removed one by one from the top. We call this method a sequential estimation-removal method.
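In matrix form, Procedure 3.4 is a short loop; a sketch, assuming $T$ is given as a matrix and `inv_maps[ell]` is any caller-supplied approximation of $T^{-1}_{[\ell]}$ (a trained network, or an argmin search as in the proof of Lemma 3.7 below; names ours):

```python
import numpy as np

def sequential_estimation_removal(f, T, inv_maps):
    residual = np.asarray(f, dtype=float)  # f_[1]^flat = f
    mu = np.zeros(T.shape[1])
    for inv in inv_maps:                   # ell = 1, ..., m_bar
        piece = inv(residual)              # mu_[ell] = T_[ell]^{-1}(f_[ell]^flat)
        mu += piece
        residual = residual - T @ piece    # remove f_[ell] = T(mu_[ell])
    return mu                              # sum of the estimated pieces
```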

In general, $\operatorname{Ker} \tilde{T}$ may not be equal to $\{0\}$, and there exists some noise $\delta \neq 0$ in (1). Therefore, we may not be able to execute Procedure 3.4. In this case, we aim to obtain as much information regarding $\mu$ as possible. In the following, we assume that, for every $f = T_\delta(\mu) = T(\mu) + \delta$, there exists $\delta_{[\ell]} \in \mathbb{R}^n$ such that $f^{\flat}_{[\ell]} = T(\mu^{\flat}_{[\ell]}) + \delta_{[\ell]}$ for $\ell = 1, 2, \ldots, \bar{m}$. Set
$$M^{\flat}_{[\ell]} := \Big\{\sum_{k=\ell}^{\bar{m}} m_{[k]} : m_{[k]} \in M_{[k]}\Big\} \cong \bigoplus_{k=\ell}^{\bar{m}} M_{[k]},$$
where the identification $\cong$ is given by the natural projection,
$$F_{[\ell]} := \{T(\nu) :\ \nu \in M^{\flat}_{[\ell]}\}, \quad \Delta_{[\ell]} := \{\eta \in \mathbb{R}^n :\ \text{possible noise for the elements of } F_{[\ell]}\}, \quad D_{[\ell]} := \{T(\nu) + \eta :\ \nu \in M^{\flat}_{[\ell]},\ \eta \in \Delta_{[\ell]}\}.$$
Then, $F_{[\ell]}$ and $D_{[\ell]}$ are the sets of the $\ell$th-and-lower-piece components of all possible observation data values for the cases $\delta = 0$ and $\delta \neq 0$, respectively. We have $M^{\flat}_{[1]} = M$, and we set $\Delta := \Delta_{[1]}$. Here, considering that $M$ is a finite set, we assume the following.

Assumption 3.5

  1. (For the case where δ=0)

    There exist $\bar{\bar{m}} \in \mathbb{N}$, $\bar{\bar{m}} \le \bar{m}$, and $T^{-1}_{[\ell]}: F_{[\ell]} \to M_{[\ell]}$ ($\ell = 1, 2, \ldots, \bar{\bar{m}}$) such that (13) $T^{-1}_{[\ell]}\big(f^{\flat}_{[\ell]}\big) = \mu_{[\ell]}$ holds for every $f = T(\mu)$ ($\mu \in M$).

  2. (For the case where $\delta \neq 0$)

    There exist $\bar{\bar{m}} \in \mathbb{N}$, $\bar{\bar{m}} \le \bar{m}$, and $T^{-1}_{[\ell]}: D_{[\ell]} \to M_{[\ell]}$ ($\ell = 1, 2, \ldots, \bar{\bar{m}}$) such that (14) $T^{-1}_{[\ell]}\big(f^{\flat}_{[\ell]}\big) = \mu_{[\ell]}$ holds for every $f = T_\delta(\mu)$ ($\mu \in M$, $\delta \in \Delta$).

In this assumption, $\operatorname{Ker} \tilde{T}$ may or may not be $\{0\}$. If $\operatorname{Ker} \tilde{T} = \{0\}$, Assumption 3.5 (i) always holds with $\bar{\bar{m}} = \bar{m}$ (see (11)). The map $T^{-1}_{[\ell]}$ in (14) is an extension of $T^{-1}_{[\ell]}$ in (13) to the case where $\delta \neq 0$. Based on Assumption 3.5, Procedure 3.4 can be rewritten as follows.

Procedure 3.6

Suppose that Assumption 3.5 is satisfied.

  • Steps 1, 2 and 3 are the same as the corresponding steps in Procedure 3.4.

  • Steps 4 and 5 are steps 4 and 5 of Procedure 3.4 with $\bar{m}$ replaced by $\bar{\bar{m}}$.

This procedure does not output $\mu$. Instead, it outputs
(15) $\sum_{\ell=1}^{\bar{\bar{m}}} \mu_{[\ell]}$;
that is, $\mu_{[\bar{\bar{m}}+1]}, \ldots, \mu_{[\bar{m}]}$ cannot be estimated. Even if we define maps that estimate them in some sense, the estimated results may contain errors.

In the experiments in Sections 5 and 6, we use Procedure 3.4 even if $\bar{\bar{m}} < \bar{m}$ to test how deep (how large $\ell$) a piece our method can estimate. The deep learning in Section 5 and the iterative method in Section 6 can be formally applied even if $\bar{\bar{m}} < \bar{m}$.

3.3. Selection of pieces of M

We consider piece monotonicity and weak piece monotonicity to be naturally expected properties of pieces for realizing Basic idea 3.2. To execute Procedure 3.6, we must first obtain the pieces $M_{[1]}, \ldots, M_{[\bar{\bar{m}}]}$ and construct maps $T^{-1}_{[1]}, \ldots, T^{-1}_{[\bar{\bar{m}}]}$ that satisfy Assumption 3.5. This assumption requires the pieces to satisfy some properties, which are somewhat different from (weak) piece monotonicity. We consider this here. Before that, we describe the stability/instability of the inverse problem (1). We distinguish the elements of $M$ with a superscript index as $\mu^{(\alpha)}$. Let $\|\cdot\|_M$ denote a norm on $\mathbb{R}^m$, and
(16) $r^{(\alpha\beta)} := \dfrac{\|T(\mu^{(\alpha)}) - T(\mu^{(\beta)})\|_O}{\|\mu^{(\alpha)} - \mu^{(\beta)}\|_M}$ ($\alpha \neq \beta$), $\quad r_{\min} := \min\{r^{(\alpha\beta)} :\ \mu^{(\alpha)}, \mu^{(\beta)} \in M,\ \alpha \neq \beta\}$.
If $r_{\min} > 0$, then the inverse problem (1) with $\delta = 0$ has a unique solution, and
(17) $\|\mu^{(\alpha)} - \mu^{(\beta)}\|_M \le \dfrac{1}{r_{\min}} \|T(\mu^{(\alpha)}) - T(\mu^{(\beta)})\|_O$
holds for every $\mu^{(\alpha)}$ and $\mu^{(\beta)}$. As the value of $r_{\min}$ increases, the stability of the inverse problem (1) increases. In fact, if $\mu^{(\alpha)} \neq \mu^{(\beta)}$ holds and $r_{\min}$ is sufficiently large, then the values of $T(\mu^{(\alpha)})$ and $T(\mu^{(\beta)})$ are very different from each other.

For selecting pieces of $M$, we consider a sufficient condition for Assumption 3.5 to hold (Lemma 3.8) and apply the same consideration as above to each piece. First, we consider the following lemma, which provides a sufficient condition for Assumption 3.5 (ii) to hold.

Lemma 3.7

Let $\ell \in \{1, 2, \ldots, \bar{m}\}$. Assume that
(18) $2 \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O < \big\|T\big(\mu^{\flat(\alpha)}_{[\ell]}\big) - T\big(\mu^{\flat(\beta)}_{[\ell]}\big)\big\|_O$
is satisfied for $\ell$ and every $\mu^{(\alpha)}, \mu^{(\beta)} \in M$ with $\mu^{(\alpha)}_{[\ell]} \neq \mu^{(\beta)}_{[\ell]}$. Then, Assumption 3.5 (ii) holds for $\ell$.

Proof.

Let $g = T(\mu^{\flat}_{[\ell]}) + \delta_{[\ell]}$ ($\mu^{\flat}_{[\ell]} \in M^{\flat}_{[\ell]}$, $\delta_{[\ell]} \in \Delta_{[\ell]}$). From (18), we have that, for every $\mu' \in M$ with $\mu'_{[\ell]} \neq \mu_{[\ell]}$,
$$\big\|T\big(\mu'^{\flat}_{[\ell]}\big) - g\big\|_O = \big\|T\big(\mu'^{\flat}_{[\ell]}\big) - T\big(\mu^{\flat}_{[\ell]}\big) - \delta_{[\ell]}\big\|_O \ge \big\|T\big(\mu'^{\flat}_{[\ell]}\big) - T\big(\mu^{\flat}_{[\ell]}\big)\big\|_O - \|\delta_{[\ell]}\|_O > 2 \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O - \|\delta_{[\ell]}\|_O \ge \|\delta_{[\ell]}\|_O.$$
Further, $\|T(\mu^{\flat}_{[\ell]}) - g\|_O = \|\delta_{[\ell]}\|_O$. Thus any minimizer of $\|T(\nu) - g\|_O$ over $\nu \in M^{\flat}_{[\ell]}$ has $\ell$th piece equal to $\mu_{[\ell]}$. Therefore, we can define $T^{-1}_{[\ell]}: D_{[\ell]} \to M_{[\ell]}$ by
$$T^{-1}_{[\ell]}(g) := \Big(\operatorname{argmin}_{\nu \in M^{\flat}_{[\ell]}} \|T(\nu) - g\|_O\Big)_{[\ell]} \in M_{[\ell]}.$$

The value of the left-hand side of (18) is determined by the noise. We separate the noise-independent part from (18); that is, in the following lemma we consider an inequality for $T$, similar to (17), for each piece. This is inequality (19). This inequality represents the difficulty of the estimation, independent of the noise.

Lemma 3.8

Let $\ell \in \{1, 2, \ldots, \bar{m}\}$. For $\ell$, consider the conditions that there exists $C_{[\ell]} > 0$ such that

  1. it holds that (19) $\big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M \le C_{[\ell]} \big\|T\big(\mu^{\flat(\alpha)}_{[\ell]}\big) - T\big(\mu^{\flat(\beta)}_{[\ell]}\big)\big\|_O$ for every $\mu^{(\alpha)}, \mu^{(\beta)} \in M$,

  2. it holds that (20) $2 C_{[\ell]} \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O < \big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M$ for every $\mu^{(\alpha)}, \mu^{(\beta)} \in M$ with $\mu^{(\alpha)}_{[\ell]} \neq \mu^{(\beta)}_{[\ell]}$.

Put $L := \{1, 2, \ldots, \bar{\bar{m}}\}$. Assumption 3.5 (i) holds if and only if every $\ell \in L$ satisfies Lemma 3.8 (i). If every $\ell \in L$ satisfies Lemma 3.8 (i) and (ii), then Assumption 3.5 (ii) holds.

Proof.

The first case can be shown via a straightforward calculation. We show the second case. By employing (19) and (20), we have, for every $\mu^{(\alpha)}, \mu^{(\beta)} \in M$ with $\mu^{(\alpha)}_{[\ell]} \neq \mu^{(\beta)}_{[\ell]}$,
$$2 C_{[\ell]} \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O < \big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M \le C_{[\ell]} \big\|T\big(\mu^{\flat(\alpha)}_{[\ell]}\big) - T\big(\mu^{\flat(\beta)}_{[\ell]}\big)\big\|_O.$$
Therefore, (18) holds.

In Lemma 3.8, as the value of $C_{[\ell]}$ decreases, it becomes easier to distinguish between $\mu^{(\alpha)}_{[\ell]}$ and $\mu^{(\beta)}_{[\ell]}$. To apply Lemma 3.8 to the inverse problem (1), it is desirable to know the value of $\min C_{[\ell]}$. Define
(21) $r^{(\alpha\beta)}_{[\ell]} := \dfrac{\big\|T\big(\mu^{\flat(\alpha)}_{[\ell]}\big) - T\big(\mu^{\flat(\beta)}_{[\ell]}\big)\big\|_O}{\big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M}$ ($\mu^{(\alpha)}_{[\ell]} \neq \mu^{(\beta)}_{[\ell]}$)
and
(22) $r_{[\ell]} := \min\big\{r^{(\alpha\beta)}_{[\ell]} :\ \mu^{(\alpha)}_{[\ell]}, \mu^{(\beta)}_{[\ell]} \in M_{[\ell]},\ \alpha \neq \beta\big\}$.
Then, if $r_{[\ell]} \neq 0$, $\min C_{[\ell]} = 1/r_{[\ell]}$ holds and inequality (20) can be rewritten as
(23) $\dfrac{2}{r_{[\ell]}} \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O < \big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M$.
As the value of $\|T(\mu^{\flat(\alpha)}_{[\ell]}) - T(\mu^{\flat(\beta)}_{[\ell]})\|_O$ increases, it becomes easier to distinguish between $\mu^{(\alpha)}_{[\ell]}$ and $\mu^{(\beta)}_{[\ell]}$. In general, if the values of $\|T(\mu_{[\ell]})\|_O$ ($\mu_{[\ell]} \in M_{[\ell]}$) are large, the values of $\|T(\mu^{(\alpha)}_{[\ell]}) - T(\mu^{(\beta)}_{[\ell]})\|_O$ may also be large. Therefore, we expect piece monotonicity (or otherwise weak piece monotonicity) to hold.

Thus, the preparation procedure prior to Procedures 3.4 and 3.6 is given below. In the case where $\delta \neq 0$, it is particularly useful if the range of values of $\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\|_M$ ($\alpha \neq \beta$) is approximately the same for $\ell = 1, 2, \ldots, \bar{\bar{m}}$. (Our discretized magnetization-distribution identification problem satisfies this property.)

Procedure 3.9

  1. Select $M_{[1]}, M_{[2]}, \ldots, M_{[\bar{m}]}$ satisfying
(24) $r_{[1]} \ge r_{[2]} \ge \cdots \ge r_{[\bar{m}]}$ (if possible, $r_{[1]} > r_{[2]} > \cdots > r_{[\bar{m}]}$)
and Assumption 3.5 (i) (when $\delta = 0$) or (ii) (when $\delta \neq 0$) for $\ell = 1, 2, \ldots, \bar{\bar{m}}$. In addition, it is desirable that piece monotonicity (10), or otherwise weak piece monotonicity, be satisfied for as many $\ell = 1, \ldots, \bar{m} - 1$ as possible.

  2. Construct a map $N_{[\ell]} \approx T^{-1}_{[\ell]}$ that approximately satisfies Assumption 3.5, from the top piece ($\ell = 1$) down to the lowest possible piece. For example, $N_{[\ell]}$ is a neural network whose activation function is the rectified linear function, and the loss function for training is given by the mean squared error:
(25) $\big\|N_{[\ell]}\big(T\big(\mu^{\flat}_{[\ell]}\big)\big) - \mu_{[\ell]}\big\|_M^2$.

In step 2 of Procedure 3.9, such a neural network $N_{[\ell]}$ is a piecewise linear function, and $N_{[\ell]}(g)$ is well defined for every $g \in \mathbb{R}^n$. It is expected that $N_{[\ell]}$ maps every $g$ close to $T(\mu^{\flat}_{[\ell]})$ to $\mu_{[\ell]}$, that is, that Assumption 3.5 holds approximately.

Since Assumption 3.5 provides the existence of $T^{-1}_{[\ell]}$ only for $\ell = 1, \ldots, \bar{\bar{m}}$, Procedure 3.6 may not accurately estimate $\mu_{[\bar{\bar{m}}+1]}, \ldots, \mu_{[\bar{m}]}$ in general. If we attempt to estimate them, errors may occur. Moreover, the errors of the estimates made at the upper pieces are transmitted to the lower pieces.

3.4. Complexity/number of linear regions

Assume that the pieces $M_{[1]}, M_{[2]}, \ldots, M_{[\bar{m}]}$ satisfy piece monotonicity or weak piece monotonicity. Then, $\mu_{[\ell+1]}$ may be more difficult to estimate than $\mu_{[\ell]}$; that is, for $\mu, \mu' \in M$, the distinction between $\mu_{[\ell+1]}$ and $\mu'_{[\ell+1]}$ may be more difficult to identify than the distinction between $\mu_{[\ell]}$ and $\mu'_{[\ell]}$. Now, we sketch one of the reasons.

As an example, we consider the following case: $\mu^{(a)}, \mu^{(b)} \in M_{[1]}$, $\mu^{(c)}, \mu^{(d)} \in M_{[2]}$,
$$\|T(\mu^{(c)})\|_O \ll \|T(\mu^{(a)})\|_O, \quad \|T(\mu^{(c)})\|_O \ll \|T(\mu^{(b)})\|_O, \quad \|T(\mu^{(d)})\|_O \ll \|T(\mu^{(a)})\|_O, \quad \|T(\mu^{(d)})\|_O \ll \|T(\mu^{(b)})\|_O,$$
and $\|T(\mu^{(c)}) - T(\mu^{(d)})\|_O \ll \|T(\mu^{(a)}) - T(\mu^{(b)})\|_O$.

Figure 5 illustrates this case. Set $\mu^{(1)} := \mu^{(a)} + \mu^{(c)}$, $\mu^{(2)} := \mu^{(a)} + \mu^{(d)}$, $\mu^{(3)} := \mu^{(b)} + \mu^{(c)}$ and $\mu^{(4)} := \mu^{(b)} + \mu^{(d)}$. We consider identifying $\mu^{(a)}, \ldots, \mu^{(d)}$ from the (noiseless) observation data $T(\mu^{(1)}), \ldots, T(\mu^{(4)})$. If $N_{[\ell]}$ is a neural network whose activation function is the rectified linear function, the identification performance of $N_{[\ell]}$ depends on the number and shape of the linear regions in the input space (see [Citation13]). Therefore, we consider using straight lines to identify $\mu^{(1)}, \ldots, \mu^{(4)}$, or $\mu^{(a)}, \ldots, \mu^{(d)}$.

  1. In the input space of $N_{[1]}$, we can distinguish between $\{T(\mu^{(1)}), T(\mu^{(2)})\}$ and $\{T(\mu^{(3)}), T(\mu^{(4)})\}$ by one straight line (e.g. $L_1$). Therefore, we can distinguish between $\mu^{(a)}$ and $\mu^{(b)}$ by one straight line.

  2. Next, we can distinguish between $\mu^{(c)}$ and $\mu^{(d)}$ by one straight line using our sequential estimation-removal method, because we can obtain $T(\mu^{(c)})$ and $T(\mu^{(d)})$ via $T(\mu^{(c)}) = T(\mu^{(1)}) - T(\mu^{(a)}) = T(\mu^{(3)}) - T(\mu^{(b)})$ and $T(\mu^{(d)}) = T(\mu^{(2)}) - T(\mu^{(a)}) = T(\mu^{(4)}) - T(\mu^{(b)})$, respectively.

Figure 5. Complexity.

Thus, using our sequential estimation-removal method, we can identify $\mu^{(a)}, \ldots, \mu^{(d)}$ (or $\mu^{(1)}, \ldots, \mu^{(4)}$) by using one straight line twice. However, three straight lines are required to distinguish between $\{T(\mu^{(1)}), T(\mu^{(3)})\}$ and $\{T(\mu^{(2)}), T(\mu^{(4)})\}$ (e.g. $L_1$, $L_2$ and $L_3$). Therefore, only one straight line is required to distinguish between $\mu^{(a)}$ and $\mu^{(b)}$, but three straight lines are required to distinguish between $\mu^{(c)}$ and $\mu^{(d)}$ directly from the observation values. This means that $T(\mu^{(c)})$ and $T(\mu^{(d)})$ are 'buried' in $T(\mu^{(a)})$ and $T(\mu^{(b)})$.

4. Some numerical analysis of our discretized magnetic inverse problem

4.1. ‘Practical’ rank of matrix G

We consider the rank of the matrix $G$. As described below, the 'practical' rank of $G$ is significantly low. Since the values of (4) at distinct $(o_k, p_j)$ are generic and all components of $G$ will differ from one another, the 'exact' rank of $G$ will probably be full. However, in practice, two almost parallel row (or column) vectors should be considered exactly parallel. Thus we examine how parallel the row vectors of $G$ are by applying the following procedure. (The modified Gram–Schmidt algorithm was used in implementing this procedure.)

Procedure 4.1

Let $g_k$ ($k = 1, 2, \ldots, n_o$) denote the $k$th normalized row vector of $G$. (If $\bar{g}_k$ is the $k$th row vector of $G$, then $g_k = \bar{g}_k / \|\bar{g}_k\|$, where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^{n_\rho}$.)

1. Let $\epsilon > 0$ be a sufficiently small real number. Set $n = 1$. Let $L_1$ be the linear space spanned by $g_1$, and $u_1 := g_1$.

2. Using the basis $\{u_1, \ldots, u_n\}$ of $L_n$, calculate $u_r := g_r - \sum_{k=1}^{n} \langle g_r, v_k \rangle v_k$ for $r = n+1, \ldots, n_o$, where $v_k := u_k / \|u_k\|$ and $\langle \cdot, \cdot \rangle$ denotes the inner product.

3. Calculate $h := \operatorname{argmax}_{r = n+1, \ldots, n_o} \|u_r\|$. Then, swap $u_{n+1}$ and $u_h$, and swap $g_{n+1}$ and $g_h$.

4. If $\|u_{n+1}\| > \epsilon$, let $L_{n+1}$ be the linear space spanned by $\{u_1, \ldots, u_n, u_{n+1}\}$. If $\|u_{n+1}\| \le \epsilon$, stop the procedure and output $\{u_1, \ldots, u_n\}$.

5. If $n+1 = n_o$, stop the procedure and output $\{u_1, \ldots, u_{n_o}\}$. If $n+1 < n_o$, set $n \leftarrow n+1$ and go back to step 2.
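An equivalent implementation of Procedure 4.1 as a pivoted modified Gram–Schmidt pass (a sketch; the incremental update below produces the same residuals $u_r$ as recomputing them in step 2):

```python
import numpy as np

def practical_rank_norms(G, eps=1e-4):
    # Normalized rows g_k of G.
    g = G / np.linalg.norm(G, axis=1, keepdims=True)
    n_o = g.shape[0]
    norms = []
    for n in range(n_o):
        # Step 3: pivot the row with the largest residual norm to position n.
        h = n + int(np.argmax(np.linalg.norm(g[n:], axis=1)))
        g[[n, h]] = g[[h, n]]
        u_norm = np.linalg.norm(g[n])
        if u_norm <= eps:                  # step 4: stop at the threshold
            break
        norms.append(u_norm)
        v = g[n] / u_norm
        # Step 2 (modified Gram-Schmidt): orthogonalize the remaining rows.
        g[n + 1:] -= np.outer(g[n + 1:] @ v, v)
    return norms  # the recorded ||u_n||; their count is the 'practical' rank
```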

Remark 4.2

The distribution of $\{\|u_n\|\}$ given by Procedure 4.1 depends on the order of the $g_k$ specified by $k$. Steps 2 and 3 determine this order, except for the selection of $g_1$.

In our numerical experiments, we set $\epsilon = 0.0001$ and recorded the values of $\{\|u_n\|\}$ ($\{u_n\}$ is the output). These values represent a 'practical' rank of $G$. In fact, if the value of $\|u_n\|$ is below a certain threshold (for example 0.01), we consider in our numerical calculations that $g_n$ belongs to the linear space spanned by $g_1, \ldots, g_{n-1}$. Some results of experiments using this procedure are presented in Tables 1–4.

Table 1. ‘Practical’ rank of G (observation points: 20×20×1).

Table 2. 'Practical' rank of G (observation points: 20×20×4).

Table 3. ‘Practical’ rank of G (observation points: 40×40×1).

Table 4. 'Practical' rank of G (observation points: 40×40×2).

In these tables, 'charge: $20 \times 20 \times h$' means that $n_\rho$ ($n_\rho = 20 \times 20 \times h$) magnetic charge points $p_1, p_2, \ldots, p_{n_\rho}$ are arranged as follows: 20 points in the $x$ direction at intervals of 1, 20 points in the $y$ direction at intervals of 1, and $h$ points in the $z$ direction at intervals of 1. Furthermore, 'observation points: $w_1 \times w_2 \times h$' has almost the same meaning; however, when $w_1 = w_2 = 40$, the intervals in the $x$ and $y$ directions are 0.5. In Tables 1–4, the distances between the top plane of charge points and the bottom plane of observation points are 1, 5 and 10 when the number of charges is $20 \times 20 \times 10$, $20 \times 20 \times 6$ and $20 \times 20 \times 1$, respectively. For example, the first row of Table 1 means that the numbers of vectors $u_n$ with $\|u_n\| \ge 0.9$ are 400, 21 and 9 in the cases of $20 \times 20 \times 10$, $20 \times 20 \times 6$ and $20 \times 20 \times 1$, respectively.

These experimental results illustrate that the 'practical' rank of $G$ is significantly low. In the cases of Tables 1–4, the numbers of observation points (the numbers of row vectors) are 400 ($= n_x n_y$), 1600, 1600 and 3200, respectively. When the height $h$ of $M$ is 6 or less, that is, when the number of charges of $M$ is $20 \times 20 \times 6$ or less, the 'practical' rank of $G$ is almost the same irrespective of whether the number of observation points is 400, 1600 or 3200 (cf. [Citation3]). Moreover, based on the results of our experiments described in Section 5.3, even when the number of observation points was $n_x n_y$, the estimations were almost accurate up to the 5th piece. Therefore, it is hypothesized that having over 400 observation points is not considerably advantageous when using our sequential estimation-removal method. Our inverse problem is analogous to looking into the sea and identifying objects in it: increasing the number of observation points does not considerably increase the amount of information, and it is more difficult to identify objects at deeper positions (see Section 4.2).

4.2. Stability/instability

We discuss the stability/instability of our discretized inverse problem (6). Some results of our numerical experiments on the distribution of $r^{(\alpha\beta)}$ defined by (16) for $T = GR$ are shown in Figure 6. In each histogram of Figure 6, the horizontal and vertical axes represent the ratio $r^{(\alpha\beta)}$ and its frequency, respectively. As the value of $r^{(\alpha\beta)}$ decreases, the difficulty of distinguishing between $\mu^{(\alpha)}$ and $\mu^{(\beta)}$ increases. If and only if $r^{(\alpha\beta)} > 0$ can we distinguish between $\mu^{(\alpha)}$ and $\mu^{(\beta)}$.

Figure 6. Distributions of $r^{(\alpha\beta)}$.

As shown in Figure 1, all the observation points are placed on the same horizontal plane in these experiments, and the number of observation points is equal to the number of cells in the horizontal direction, that is, $n_o = n_x n_y$. The values of the parameters utilized are $n_x = n_y = 20$, $n_z = 5$ and $n_o = n_x n_y = 400$. Let $d(O, M)$ denote the distance between the horizontal plane $O$, where the observation points are placed, and the top surface of the magnetic sample $M$. In Figure 6, the values of $d(O, M)$ of panels (a), (b), …, (f) are 0.1, 1, 10, 20, 30 and 100, respectively. Note that the length of each side of every cubic cell is normalized to 1. Since the number of elements of $M$, $2^{n_\mu} = 2^{2000}$, is very large, we did not conduct experiments on all pairs $\{(\alpha, \beta)\}$ but on randomly selected pairs. To provide details, we randomly selected 1000 magnetic moment vectors, and we calculated $r^{(\alpha\beta)}$ for all (499,500 possible) pairs of these moment vectors, where the randomness means the following: the probability that the moment of each cell is $+1$ and the probability that it is $-1$ are both 0.5.
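The sampling scheme just described amounts to the following sketch (names ours; the full forward matrix $T = GR$ is assumed to be available):

```python
import numpy as np

def sampled_stability_ratios(T, n_mu, n_samples=1000, seed=0):
    # Random moment vectors with reversal rate 0.5.
    rng = np.random.default_rng(seed)
    mus = rng.choice([-1.0, 1.0], size=(n_samples, n_mu))
    ratios = []
    for a in range(n_samples):
        for b in range(a + 1, n_samples):
            d = mus[a] - mus[b]
            denom = np.linalg.norm(d)  # ||mu^(a) - mu^(b)||
            if denom > 0.0:
                ratios.append(np.linalg.norm(T @ d) / denom)  # r^(ab), eq. (16)
    return np.array(ratios)  # histogram these values, cf. Figure 6
```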

Let us decompose $\mu^{(\alpha)}$ and $\mu^{(\beta)}$ as $\mu^{(\alpha)} = \mu^{(\alpha)}_U + \mu^{(\alpha)}_L$, $\mu^{(\beta)} = \mu^{(\beta)}_U + \mu^{(\beta)}_L$, where $(\cdot)_U$ is an upper part and $(\cdot)_L$ is a lower part. Assume $\mu^{(\alpha)}_U = \mu^{(\beta)}_U$; then
(26) $r^{(\alpha\beta)} = \dfrac{\|T(\mu^{(\alpha)}) - T(\mu^{(\beta)})\|_O}{\|\mu^{(\alpha)} - \mu^{(\beta)}\|_M} = \dfrac{\|T(\mu^{(\alpha)}_L) - T(\mu^{(\beta)}_L)\|_O}{\|\mu^{(\alpha)}_L - \mu^{(\beta)}_L\|_M}$.
In such a case, we can also consider $d(O, M)$ in Figure 6 to be the sum of the depth of the upper pieces and the distance between the horizontal plane $O$ and the top surface of the magnetic sample $M$. Therefore, for example, when we can estimate nine pieces from the top correctly, the histogram of Figure 6(c) represents the difficulty of estimating the pieces below them.

Thus the experimental results numerically illustrate that the magnetic moment distribution down to a certain depth can be estimated from the MFM images, and that the estimation becomes difficult as the depth increases. This property is independent of the choice of reconstruction method. One of the reasons for this difficulty is explained below. We consider two cells $c_j$ and $c_{j+1}$ that are horizontally adjacent, and the ratio of the distances from $p_j$ and $p_{j+1}$ (the centre points of the top faces of $c_j$ and $c_{j+1}$) to an observation point $o_k$, i.e. $\|p_{j+1} - o_k\| / \|p_j - o_k\|$. As the points $p_j$ and $p_{j+1}$ become farther away from the point $o_k$, this ratio becomes closer to 1; consequently, the ratio $g_{k,j+1} / g_{kj}$ also becomes closer to 1, where $g_{kj}$ is the $kj$-component of $G$. Thus, using the observed value at the point $o_k$, it becomes difficult to distinguish between the case of $(\mu_j, \mu_{j+1}) = (+1, -1)$ and that of $(\mu_j, \mu_{j+1}) = (-1, +1)$.

The above-mentioned experimental results on the distribution of $r^{(\alpha\beta)}$ are for a given discretization. Making the cell discretization finer is the same as reducing the size (the length of each side) of each cell. Let the size of the cells, the interval between every pair of observation points, and the value of $d(O, M)$ be reduced by the same ratio $s$. Then, the shape of the histogram of the distribution of $\{r^{(\alpha\beta)}\}$ does not change. Therefore, the shape of the histogram when the size of the cells and the interval between the observation points are both multiplied by $s$ without changing $d(O, M)$ is the same as that of the histogram when only $d(O, M)$ is multiplied by $1/s$. For example, we can consider histogram (f) of Figure 6 to be the histogram for the case where $d(O, M) = 1$, the size of the cells is 1/100, and the interval between the observation points is 1/100. This case is the 1/100 subdivision of case (b) of Figure 6 (for a magnet sample of $1/100^3$ size). Similarly, we can consider cases (c), (d) and (e) in Figure 6 to be the 1/10, 1/20 and 1/30 subdivisions of case (b), respectively. Thus we have the following experimental claim.

Experimental claim 4.3

The inverse problem (6) is unstable under cell subdivision. Therefore, assumption (vi) in Section 2.2 should hold with sufficient accuracy for some cell discretization. In other words, only such magnets can be considered.

This claim is compatible with the property that the solution $\rho$ of integral Equation (2) is generally not unique for given data $F_z$.

Procedure 3.4 needs Assumption 3.5, and, from Lemma 3.8, Assumption 3.5 is related to $r^{(\alpha\beta)}_{[\ell]}$ defined by (21) and $r_{[\ell]}$ defined by (22). If $(\cdot)_L = (\cdot)^{\flat}_{[\ell]}$ in (26), then $r^{(\alpha\beta)}$ essentially coincides with $r^{(\alpha\beta)}_{[\ell]}$. Some further experimental consideration of $r^{(\alpha\beta)}_{[\ell]}$ is given in Section 4.3.

4.3. Properties of the pieces of M defined in Section 2.3

We consider some properties of the pieces $\{M_{[\ell]}\}$ of $M$ defined in Section 2.3 for the case where $n_x = n_y = 20$ (Tables 5, 7 and 9) or $n_x = n_y = 32$ (Tables 6, 8 and 10), $\bar{m} = n_z = 10$, $n_o = n_x n_y$ and $d(O, M) = 1$. We numerically calculated the minimum and average of $\{r^{(\alpha\beta)}_{[\ell]}\}$ (defined by (21)), the minimum and maximum of $\{\|GR\mu_{[\ell]}\|_O\}$, and the minimum and average of $\{\|GR\mu_{[\ell]}\|_O / \|GR\mu_{[\ell+1]}\|_O\}$, where $\mu_{[\ell]} = \mu_{[\ell+1]}$ as $n_x n_y$-dimensional vectors.

Table 5. Minimum and average of $\{1000\, r^{(\alpha\beta)}_{[\ell]}\}$ ($n_x = n_y = 20$).

Table 6. Minimum and average of $\{1000\, r^{(\alpha\beta)}_{[\ell]}\}$ ($n_x = n_y = 32$).

Table 7. Minimum and maximum of $\{1000\, \|GR\mu_{[\ell]}\|_O\}$ ($n_x = n_y = 20$).

Table 8. Minimum and maximum of $\{1000\, \|GR\mu_{[\ell]}\|_O\}$ ($n_x = n_y = 32$).

Table 9. Minimum and average of $\{\|GR\mu_{[\ell]}\|_O / \|GR\mu_{[\ell+1]}\|_O : \mu_{[\ell]} = \mu_{[\ell+1]}\}$ ($n_x = n_y = 20$).

Table 10. Minimum and average of $\{\|GR\mu_{[\ell]}\|_O / \|GR\mu_{[\ell+1]}\|_O : \mu_{[\ell]} = \mu_{[\ell+1]}\}$ ($n_x = n_y = 32$).

First, for each $\ell$ ($\ell = 1, 2, \ldots, 10$), we randomly selected 1000 magnetic moment vectors, and we calculated $r^{(\alpha\beta)}_{[\ell]}$ for all (499,500 possible) pairs of those moment vectors, as in the case of Figure 6. The experimental results for the minimum and average of $\{r^{(\alpha\beta)}_{[\ell]}\}$ are shown in Tables 5 and 6, where the values are multiplied by 1000 and the fourth digit from the top of each value is rounded off. In the cases of this section, the results of Tables 5 and 6 numerically imply $r_{[1]} > r_{[2]} > \cdots > r_{[10]} > 0$, where $r_{[\ell]} = \min\{r^{(\alpha\beta)}_{[\ell]}\}$. Therefore, from Lemma 3.8 (i) and (22), there exist $T^{-1}_{[\ell]}$ ($\ell = 1, \ldots, 10$) of Assumption 3.5 (i), and the inequalities (24) are satisfied for $\ell = 1, \ldots, 10$. However, if the true value of $r_{[\ell]}$ is smaller than the calculation error, then $T^{-1}_{[\ell]}$ may not be obtained by any method.

Next, for each $\ell$ ($\ell = 1, 2, \ldots, 10$), we randomly selected 10,000 magnetic moment vectors, and we calculated the minimum and maximum of $\{\|GR\mu_{[\ell]}\|_O\}$. The experimental results are shown in Tables 7 and 8, where the values are multiplied by 1000 and the fourth digit from the top of each value is rounded off. Tables 7 and 8 numerically illustrate that piece monotonicity holds for $(\ell, \ell+1) = (1,2), (2,3), (3,4)$ when $n_x = n_y = 20$ and for $(\ell, \ell+1) = (1,2), \ldots, (4,5)$ when $n_x = n_y = 32$.

When $x = y = 0$, the Green function $G(r)$ of (3) becomes $G(r) = 6/z^4$. Therefore, roughly speaking, the values in Tables 7 and 8 decrease in proportion to the inverse fourth power of the depth of the piece.

Next, for each $\ell$ ($\ell = 1, 2, \ldots, 9$), we randomly selected 10,000 magnetic moment vectors $\{\mu\}$ with $\mu_{[\ell]} = \mu_{[\ell+1]}$ as $n_x n_y$-dimensional vectors, and we calculated the minimum and average of $\{\|GR\mu_{[\ell]}\|_O / \|GR\mu_{[\ell+1]}\|_O\}$. The experimental results are shown in Tables 9 and 10, where the fourth digit from the top of each value is rounded off. Tables 9 and 10 numerically illustrate that weak piece monotonicity holds for $(\ell, \ell+1) = (1,2), \ldots, (9,10)$ in the cases considered in this subsection.

Thus, we have the following.

Experimental claim 4.4

As the value of $\ell$ increases, all the values in Tables 5–10 decrease. These tables show numerically that the above-mentioned selection of $M_{[\ell]}$ approximately satisfies the properties in step 1 of Procedure 3.9. Therefore, we consider the selection to be appropriate.

This will likely remain correct if the values of $n_x$, $n_y$ and $n_z$ are changed somewhat.

4.4. Conclusion of this section

Based on the experimental results of this section, it is possible that $\operatorname{Ker} \widetilde{GR} \neq \{0\}$ while $(GR)^{-1}$ nevertheless exists. Then, $(GR)^{-1}$ is not linear, and it is difficult to solve our inverse problem using a linear algebraic approach. In the following sections, we aim to estimate as many of the magnetic moments as possible using our sequential estimation-removal method. In Sections 5 and 6, we utilize assumptions (i)–(vii) of Section 2.2. However, the results of this section indicate that the performances of $N_{[\ell]} \approx T^{-1}_{[\ell]}$ (Section 5) and of the iterative method (Section 6) decrease as $\ell$ increases, and the results of our numerical experiments described in Sections 5 and 6 are compatible with this. One of the reasons is the performance limit of our personal computer (PC); however, accurately estimating magnetic moments at deep positions is impossible using any method. In general situations where assumptions (vi) and (vii) are not satisfied, the problem becomes even more difficult.

5. Deep learning and experimental results

To solve the discretized (nonlinear) binary inverse problem (6), we consider two numerical methods using Procedure 3.4 together with Procedure 3.9. (As described at the end of Section 3.2, we used Procedure 3.4 in the following experiments.) One method is to construct neural networks $N_{[\ell]} \approx T^{-1}_{[\ell]}$, while the other is to employ an iterative algorithm as an approximate $T^{-1}_{[\ell]}$. We study the former method in this section and the latter in the next section. Since $f$ can easily be calculated for any given $\mu$ using (6), we can obtain numerous training data. Deep learning of a neural network requires a high computational cost; however, once such a neural network is obtained, it can be applied with a lower computational cost than that of an iterative method. In this section and the next, we use assumptions (i)–(vii) of Section 2.2 and the following experimental setting.

Setting 5.1 As shown in Figure 1, all the observation points are placed on the same horizontal plane, and the number of observation points is equal to the total number of cells in the horizontal direction, that is, $n_o = n_x n_y$.

This setting is based on the results of Section 4.1. In the following experiments, the sizes of $n_x$, $n_y$, $n_z$ and of the data sets were limited by the performance of our PC, especially its memory size (see Table 11).

Table 11. Execution environment.

5.1. Structure of neural networks

Based on Setting 5.1, the number of nonzero components of $\mu_{[\ell]}$ is $n_x n_y = n_o = \dim \mathbb{R}^{n_o}$ for every $\ell$. Therefore, in a neural network $N_{[\ell]} \approx T^{-1}_{[\ell]}: \mathbb{R}^{n_o} \to M_{[\ell]}$, the number of input nodes and the number of output nodes are both equal to $n_o$. We consider our discretized inverse problem to be a type of image classification problem and construct each $N_{[\ell]}$ ($\ell = 1, 2, \ldots, n_z$) using a convolutional autoencoder provided by Keras/TensorFlow (see [Citation14]). (In our numerical experiments, we did not assume any structure for the magnetic moment configurations of the training data; however, if they had some geometrical structure, a convolutional autoencoder would be more effective. Furthermore, if we could construct a neural network suited to such a geometrical structure, it would be even more effective.)

In our numerical experiments, we set $n_x = n_y = 32$, $n_z = 10$ and $n_o = n_x n_y = 1024$. The structure of each $N_{[\ell]}$ is then as shown in Figure 7. This figure was drawn by the plot_model function of the keras.utils module. In our experiments, the values of the parameters are as follows:

  • Size of the kernel: 3×3 in all layers;

  • Strides: 1×1 in conv2d_4 and 2×2 in the other layers;

  • Padding: zero padding in all layers;

  • Number of filters: as depicted in Figure 7, that is, 256.

Figure 7. Convolutional autoencoder.

The activation function is the linear function (the identity function) in the output layer and the rectified linear function in the other layers. Each input node corresponds to the observation value at one observation point. Each output node corresponds to the magnetic moment of one cell. In the training, the loss function is the mean squared error function (25) (see [Citation15]). In the tests, the value of the magnetic moment is taken to be $+1$ if the output value is greater than 0, and $-1$ if it is less than 0.
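A hedged Keras sketch of one $N_{[\ell]}$ in the spirit of Figure 7: the paper fixes 3×3 kernels, 256 filters, ReLU hidden activations, a linear output layer and the MSE loss (25), while the exact number of layers and the down/upsampling schedule below are our assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_N(n_side=32, n_filters=256):
    # Input: one MFM observation value per point of the 32x32 grid.
    inp = keras.Input(shape=(n_side, n_side, 1))
    x = layers.Conv2D(n_filters, 3, strides=2, padding="same",
                      activation="relu")(inp)          # encoder
    x = layers.Conv2D(n_filters, 3, strides=1, padding="same",
                      activation="relu")(x)            # bottleneck
    x = layers.Conv2DTranspose(n_filters, 3, strides=2, padding="same",
                               activation="relu")(x)   # decoder
    # Output: one real-valued moment estimate per cell; its sign gives +-1.
    out = layers.Conv2D(1, 3, padding="same", activation="linear")(x)
    return keras.Model(inp, out)

model = build_N()
model.compile(optimizer="adam", loss="mse")  # mean squared error (25), Adam
```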

Remark 5.2

We consider $\{f^{\flat}_{[\ell]}\}$ for $\ell = 1, 2, \ldots, n_z$, where $f^{\flat}_{[\ell]} = T(\mu^{\flat}_{[\ell]})$ and $\{f^{\flat}_{[\ell]}\}$ denotes the set of all possible $f^{\flat}_{[\ell]}$. Note that the maximum possible number of values that $N_{[\ell]}$ must distinguish equals the number of all possible $\mu_{[\ell]}$, i.e. $2^{n_o} = 2^{n_x n_y}$, for every $\ell = 1, 2, \ldots, n_z$. For the matrix $G$ defined in Section 2.2, we put $G_{[\ell]} := (h_{kj})$, where the size of $G_{[\ell]}$ is the same as that of $G$ and
$$h_{kj} := \begin{cases} g_{kj} & \text{if } j = a(n_z+1) + b,\ a = 0, 1, \ldots, n_x n_y - 1,\ b = \ell, \ell+1, \ldots, n_z+1, \\ 0 & \text{otherwise.} \end{cases}$$
As described in Section 4.1, the 'practical' rank of $G$ is smaller than $n_\rho = n_x n_y (n_z + 1)$, and the 'practical' rank of $G_{[\ell]}$ decreases rapidly as $\ell$ increases. Therefore, the 'practical' dimension of the space spanned by $\{f^{\flat}_{[\ell]}\}$ decreases as $\ell$ increases. In general, if $A$ is a $d$-dimensional hyperplane arrangement consisting of $m$ hyperplanes in general position, the number of regions of $A$ is given by $\sum_{i=0}^{d} \binom{m}{i}$ (see Proposition 2.4 of [Citation16]). Therefore, when using hyperplanes to distinguish the elements of $\{f^{\flat}_{[\ell]}\}$, it is almost certain that more hyperplanes will be required as $\ell$ increases. In other words, the size of the neural network $N_{[\ell]}$ almost certainly has to increase as $\ell$ increases (cf. [Citation13]). However, we used the $N_{[\ell]}$ depicted in Figure 7 for every $\ell$ because of the performance limit of our PC.

5.2. Deep learning

In our numerical experiments, unless otherwise noted, we set $n_x = n_y = 32$, $\bar{m} = n_z = 10$, $n_o = n_x n_y = 1024$, the length of each side of every cubic cell of the (discretized) magnetic sample $M$ to 1, and $d(O, M)$ to 1. The number of elements of $M$ was $2^{n_\mu} = 2^{10240}$, and we randomly chose elements $\{\mu\}$ from $M$ to make the training data $\{(\mu, GR\mu)\}$. Such a moment vector $\mu$ can be obtained as follows.

  1. Set $\mu = [\mu_1, \mu_2, \ldots, \mu_{n_\mu}]^t = [1, 1, \ldots, 1]^t$.

  2. $i \leftarrow 1$.

  3. Change $\mu_i$ to $-1$ with probability $r$. (We call $r$ the reversal rate.)

  4. $i \leftarrow i + 1$; if $i \le n_\mu$, go back to step 3, otherwise stop.

Constructing a neural network becomes more difficult as the problem becomes more complex. Therefore, in our numerical experiments, we set $r$ to be a fixed constant instead of a variable. Neural networks trained with several reversal rates may then be used to estimate the magnetization, and the estimation result closest to the observed data may be adopted. (In actual experiments, we can apply a vibrating-sample magnetometer (VSM) to measure the magnitude of the overall magnetic moment of the sample, that is, to measure the reversal rate.) We consider the simple and fundamental case where it is randomly determined which cell moments are reversed. (In an ideal high-performance magnet, the orientations of the moments of the magnetic grains are approximately independent of each other.) Therefore, in our numerical experiments, we made the reversal rate common to all $\mu$. Its value was 0.1, 0.3 or 0.5 (we considered these three cases); we made training data comprising 8000 pairs for each reversal rate. In the deep learning, we employed Adam to update the weights, where the values of the parameters of Adam were the default values given in [Citation17] (cf. [Citation18]).
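Generating a batch of training pairs $(\mu, GR\mu)$ with a fixed reversal rate $r$ is straightforward; a sketch (names ours):

```python
import numpy as np

def make_dataset(GR, n_mu, n_pairs, r, seed=0):
    # Start from the all-upward state and flip each moment with probability r.
    rng = np.random.default_rng(seed)
    mus = np.ones((n_pairs, n_mu))
    mus[rng.random((n_pairs, n_mu)) < r] = -1.0
    return mus, mus @ GR.T  # each row of the second array is f = GR mu
```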

In the noiseless cases, we expected $\bar{\bar{m}} = \bar{m} = n_z$ based on the experimental analysis in Section 4 (see Remark 5.3). Let $N^s := (N^s_{[1]}, N^s_{[2]}, \ldots, N^s_{[n_z]})$ be the neural network obtained by using the training data of reversal rate $s$, where $s = 0.1, 0.3$ and 0.5. In our experiments, to examine the performances of these neural networks, we randomly generated sets of test data $\{(\mu, GR\mu)\}$, each consisting of 2000 pairs, with fixed reversal rates $r = 0.1, 0.2, 0.3, 0.4$ and 0.5.

Let $R$ be a set of some reversal rates (in our experiments, $R = \{0.1, 0.3, 0.5\}$) and assume that we have made $N^s$ ($s \in R$). When an observation value vector $f$ is given, we calculate
$$E_f(s) := \sum_{\ell=1}^{n_z} \Big\| f - \sum_{k=1}^{\ell} GR\, N^s_{[k]}(f) \Big\|^2 \quad (s \in R),$$
where the $N^s_{[k]}(f)$ are the pieces estimated sequentially by Procedure 3.4, and put $r_f := \operatorname{argmin}_{s \in R} E_f(s)$. Then, we consider $N^{r_f}$ to be the most appropriate neural network in $\{N^s : s \in R\}$ for estimating $\mu$ such that $GR\mu\,(+\,\delta) = f$ for the given $f$.

The execution environment of our experiments is presented in Table 11.

5.3. Experimental results of the noiseless cases

The relationship between the theoretical consideration in Section 3, the experimental analysis in Section 4, and the following experimental results is described in Remark 5.3.

For the generated noiseless test data $\{(\mu, f)\} = \{(\mu, GR\mu)\}$ with reversal rates $r = 0.1, 0.2, 0.3, 0.4$ and 0.5, the selected rates defined by $r' := \operatorname{argmin}_{s \in \{0.1, 0.3, 0.5\}} E_f(s)$ (computed for each $f$) are summarized in Table 12.

Table 12. Reversal rates.

The estimation performances of $N^s$ ($s = 0.1, 0.3$ and 0.5) are presented in Table 13. There were 32×32 cells in each piece, and each set of test data $\{(\mu, GR\mu)\}$ consisted of 2000 pairs. Therefore, $32 \times 32 \times 2000 = 2{,}048{,}000$ magnetic moments should be estimated per piece. In Table 13, rate $r$ is the reversal rate of the test data and rate $s$ is the reversal rate of $N^s$; the row labelled '0.1, 0.3, and 0.5' means that $s$ is the rate $r'$ of Table 12. The values in Table 13 illustrate what percentage of these 2,048,000 moments were correctly estimated. These values are rounded off to three decimal places; therefore, 1.000 means a value of 0.9995 or more. Table 13 shows that, up to the 5th piece, the estimations were almost accurate if $s = r$.

Table 13. Estimation performances of each rate.

Further, we conducted experiments for the cases d(O,M) = 0.1, 0.2, …, 0.9. The experimental results for r = r′ = 0.5 are presented in Table 14. In the cases d(O,M) = 0.1, 0.2, 0.3, and 0.4, the training of $N_{[1]}=N^{0.5}_{[1]}$ did not perform well. We consider the reason to be as follows. Let $o_k=(x_k,y_k,z_k)$ (k = 1, 2, …, $n_o$ = 1024) be the observation points and $p_j=(x_j,y_j,z_j)$ (j = 1, 2, …, $n_o$ = 1024) be the magnetic charge points on the top surface of the magnet sample described in Section 2.2. If $x_k=x_j$ and $y_k=y_j$, then $d(O,M)=z_k-z_j$. In this case, we have $G(o_k-p_j)=6/(z_k-z_j)^4$. Therefore, if d(O,M) is too small, the numerical calculation for estimating the top piece becomes unstable. Consequently, in the cases d(O,M) = 0.1, 0.2, 0.3, and 0.4, the performance of $N_{[1]}$ is considerably poor. If the estimation of the top piece fails, then the estimations of the second and lower pieces also fail. In such a case, it may be appropriate to change the unit of length.

Table 14. Estimation performances of each d(O,M).

In the case of d(O,M) = 0.5, the estimation performance was the highest, and the performance decreased as the value of d(O,M) increased beyond 0.5. However, even for d(O,M) = 0.5, the estimation performance for the 6th and lower pieces was low.

As described in Section 4.2, it is more difficult to estimate magnetic moments at deeper positions. To estimate such difficult-to-identify moments, a sufficiently large neural network and sufficient training data are required. Owing to the limitations of our execution environment, we could not enlarge the neural networks or increase the training data beyond those in the above-mentioned experiments. However, in connection with this, we conducted the following experiment: the case of $n_x=n_y=20$, $n_z=10$, $n_o=n_x n_y$, d(O,M) = 1, with the number of filters $N_f$ equal to 256 or 512. In the case $N_f=512$, the number of degrees of freedom of the neural network is higher than in the above-mentioned experiments. Both the reversal rate of the training data (8000 pairs) and that of the test data (2000 pairs) were 0.5. Since these neural networks are smaller than those in the above-mentioned experiments, the amount of data is relatively larger. The experimental results are presented in Table 15, where the results of the 32×32 case are listed again for reference; the three sets of results are similar. As described in Remarks 5.2 and 5.3, the deeper the magnetic charges, the larger the neural network that may be required.

Table 15. Estimation performance for the size of the problem and the number of filters.

In our sequential estimation-removal method, the errors of the estimates made at the upper pieces are transmitted to the lower pieces. Thus, the neural networks $N_{[\ell]}$ for the lower pieces cannot perform well. This is unavoidable, but if no error occurs in the upper pieces, the estimation results obtained using $N_{[\ell]}$ are better. For the case $n_x=n_y=32$, $n_z=10$, $n_o=n_x n_y$, and d(O,M) = 1, Table 16 shows the estimation performance of $N^r_{[\ell]}$ under the assumption that no error occurred in the upper pieces. In these experiments, the reversal rates of the training data and the test data were common (s = r = 0.1, 0.3, 0.5).

Table 16. Estimation performance when no error occurs in the upper pieces.
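
For illustration, the estimation-removal loop can be sketched as follows (illustrative names; `G[k]` denotes the forward matrix of piece k and `nets[k]` a trained network for piece k):

```python
import numpy as np

def sequential_estimate(f, G, nets):
    """Estimate the pieces from the top (ell = 1) downwards: estimate the
    moments of a piece from the current residual data, then remove that
    piece's contribution before moving to the next piece.  An error made
    at an upper piece is passed down through the residual."""
    residual, pieces = f.copy(), []
    for Gk, net in zip(G, nets):
        mu_k = np.sign(net(residual))         # binarize the network output
        pieces.append(mu_k)
        residual = residual - Gk @ mu_k       # remove this piece's contribution
    return pieces, residual
```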

Remark 5.3

According to the experimental results in the tables of Section 4.3, it is expected that $r^{[\ell]}(\alpha,\beta)>0$ holds for all possible pairs (α, β) and ℓ = 1, …, 10. If we assume this is true, then it is theoretically possible to estimate up to the piece ℓ = 10. However, the performance of the neural networks in our experiments is insufficient to estimate the pieces ℓ = 6, …, 10. It is known that an arbitrary function can be approximated with arbitrary accuracy using a sufficiently large neural network. Therefore, if sufficiently large neural networks were used, the pieces ℓ = 6, …, 10 could be expected to become estimable. However, the authors do not know how large such neural networks would have to be. Furthermore, even if the minimum value $r^{[\ell]}$ is positive, $T_{[\ell]}^{-1}$ may not be obtained if $r^{[\ell]}$ is smaller than the calculation error.

5.4. Experimental results on cases with noise

In this section, $n_x=n_y=32$, $n_z=10$, $n_o=n_x n_y=1024$, d(O,M) = 1, r = 0.5, and $N_f$ = 256. We used the neural networks of s = 0.5. Table 17 presents the experimental results for the cases where the data contain noise; the results for the 6th–10th pieces are omitted. Let $f=(f_1,\ldots,f_{n_o})$ be a noiseless observation values vector, $f=G_R\mu$, and set $\bar{f}:=\sum_{i=1}^{n_o}|f_i|/n_o$. Let $u_i$ denote a uniform random number in the interval $[-0.5, 0.5]$. Then, in Table 17, the noised value $\tilde{f}_i$ was made by
$$\tilde{f}_i = f_i + \nu \bar{f} u_i,$$
where ν = 0.001 or 0.01. (The actual MFM measurement error is typically estimated to be within 1%.) As can be seen from this table, the estimation is extremely difficult when noise is present. We discuss how to mitigate this difficulty in the remainder of this section and in Section 7.

Table 17. Estimation performance for cases with noise.
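
For illustration, this noise model can be written as follows (illustrative names):

```python
import numpy as np

def add_noise(f, nu, rng):
    """tilde f_i = f_i + nu * fbar * u_i, with u_i uniform on [-0.5, 0.5]
    and fbar the mean absolute value of the components of f."""
    fbar = np.mean(np.abs(f))
    u = rng.uniform(-0.5, 0.5, size=f.shape)
    return f + nu * fbar * u

rng = np.random.default_rng(1)
f = rng.normal(size=1024)                  # stand-in for a noiseless f = G_R mu
f_noisy = add_noise(f, nu=0.01, rng=rng)   # nu = 0.001 or 0.01 in Table 17
```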

We roughly estimate the size of $\eta \in \Delta^{[\ell]}$ that satisfies inequality (23). When $\mu^{[\ell]}(\alpha) \neq \mu^{[\ell]}(\beta)$, the right-hand side of (23) is minimized when there is only one difference between the components of $\mu^{[\ell]}(\alpha)$ and those of $\mu^{[\ell]}(\beta)$. Then, since the right-hand side of (23) equals $\sqrt{(1-(-1))^2}=2$, (23) becomes
(27) $\sup_{\eta \in \Delta^{[\ell]}} \|\eta\|_O < r^{[\ell]}$.
From Lemma 3.8 (ii), if (27) holds, then Assumption 3.5 (ii) is satisfied.

In the above numerical experiment, noise $\eta=\nu\bar{f}u$ whose components are at most 0.0005 times or 0.005 times the average of the components of f was added to $f=G_R\mu$. Therefore,
(28) $1000\|\eta\|_O \le 0.5\|G_R\mu\|_O$ or $5\|G_R\mu\|_O$.
Assume $\mu^{[1]},\ldots,\mu^{[\ell-1]}$ were estimated correctly. Then, for (27) to hold, the value of the right-hand side of (28) must be approximately smaller than $1000\,r^{[\ell]}$. The minimum of $1000\,r^{[\ell]}(\alpha,\beta)$ in the tables of Section 4.3 roughly approximates $1000\,r^{[\ell]}$, and from those tables, $\|G_R\mu\|_O \approx 200$. Therefore, $1000\,r^{[1]} > 5\|G_R\mu\|_O > 1000\,r^{[2]} > 0.5\|G_R\mu\|_O > 1000\,r^{[3]}$. In the case where the noise level is 0.01, (27) holds only for ℓ = 1, and in the case where the noise level is 0.001, (27) holds only for ℓ = 1, 2. For ℓ other than these, unique identification is not theoretically guaranteed. However, Lemma 3.8 (ii) gives only a sufficient condition, and from Table 17, it seems that the moments of a few more pieces can be identified uniquely.

6. Iterative method and experimental results

In this section, we apply an iterative method to approximate $T_{[\ell]}^{-1}$ of Procedure 3.4. This iterative method is a simple version of Murray–Ng's global smoothing algorithm proposed in [Citation19].

6.1. A simple version of Murray–Ng's global smoothing algorithm

For given data f, define $E(\mu):=\|G_R\mu - f\|^2$. Then, our inverse problem is as follows:

Problem 6.1

Minimize E(μ) subject to $\mu \in \{-1,1\}^{n_\mu}$.

A minimizer of E restricted to $\{-1,1\}^{n_\mu}$ is called a binary minimizer. The linear map $G_R$ can be naturally extended to a map on $[-1,1]^{n_\mu}$. We use this extension in our calculations to obtain a binary minimizer. This approach is called a continuation or continuous approach (see [Citation20,Citation21]). As such an algorithm, we utilize a simple version of the global smoothing algorithm (GSA) proposed in [Citation19] and [Citation21]. The GSA is an excellent Newton-type iteration algorithm that can be employed for binary optimization problems based on the continuation approach.

For the extended objective function E on $[-1,1]^{n_\mu}$, we define
(29) $E_{\eta,\gamma}(\mu) := E(\mu) - \eta \sum_{j=1}^{n_\mu} \bigl( \log(1+\mu_j) + \log(1-\mu_j) \bigr) + \gamma \sum_{j=1}^{n_\mu} (1+\mu_j)(1-\mu_j),$
where η and γ are positive real values. In this iterative algorithm, the starting point $\mu=r_0$ of the iteration must be selected from $(-1,1)^{n_\mu}$; first, we take a large η and a small γ. We then calculate an approximate minimizer of $E_{\eta,\gamma}$ in $[-1,1]^{n_\mu}$ by an iteration. Next, we make η slightly smaller ($\eta \leftarrow \theta_\eta \eta$) and γ slightly larger ($\gamma \leftarrow \theta_\gamma \gamma$), and, using the obtained approximate minimizer as the new starting point, we calculate an approximate minimizer of $E_{\eta,\gamma}$ again. The same calculation is repeated while decreasing η and increasing γ. In this iteration, the second term of (29) locks μ inside $[-1,1]^{n_\mu}$, and the third term of (29) pushes μ towards $\{-1,1\}^{n_\mu}$.

In the kth step of the GSA iteration to obtain an approximate minimizer of $E_{\eta,\gamma}$, we must solve the equation $\nabla^2 E_{\eta,\gamma}(\mu_k)\, p_k = -\nabla E_{\eta,\gamma}(\mu_k)$ to obtain a vector $p_k$ as a search direction. To solve this equation, we adopt the minor iteration of [Citation22], p. 194, as the conjugate gradient algorithm. It is simpler than Algorithm 5.1 of [Citation21].
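
The following is a minimal NumPy/SciPy sketch of this simplified GSA under the reconstructed objective (29) and the parameter values reported in Section 6.2; it calls a plain conjugate gradient routine in place of the minor iteration of [Citation22] and omits the stopping tests governed by ϵη and ϵF, so it is an illustration rather than the implementation used in our experiments:

```python
import numpy as np
from scipy.sparse.linalg import cg

def gsa_minimize(G, f, eta0=10.0, theta_eta=0.95, n_outer=50, n_inner=20):
    """Simplified GSA continuation loop for E(mu) = ||G mu - f||^2 over
    {-1, 1}^n, using the smoothed objective
    E_{eta,gamma}(mu) = E(mu)
                        - eta * sum_j (log(1 + mu_j) + log(1 - mu_j))
                        + gamma * sum_j (1 + mu_j) * (1 - mu_j)."""
    n = G.shape[1]
    mu = np.zeros(n)                              # starting point r_0 = 0
    eta = eta0
    gamma = 1e-2 * float(f @ f) / n               # gamma_0 = 10^-2 E(0) / n_mu
    theta_gamma = 1.0 / theta_eta
    GtG, Gtf = G.T @ G, G.T @ f
    for _ in range(n_outer):                      # continuation on (eta, gamma)
        for _ in range(n_inner):                  # inexact Newton iterations
            grad = (2.0 * (GtG @ mu - Gtf)
                    + 2.0 * eta * mu / (1.0 - mu**2)     # barrier gradient
                    - 2.0 * gamma * mu)                  # penalty gradient
            hdiag = 2.0 * eta * (1.0 + mu**2) / (1.0 - mu**2)**2 - 2.0 * gamma
            H = 2.0 * GtG + np.diag(hdiag)
            p, _ = cg(H, -grad, maxiter=200)      # search direction (cf. [22])
            t = 1.0                               # damp to stay inside (-1, 1)
            while np.max(np.abs(mu + t * p)) >= 1.0:
                t *= 0.5
            mu = mu + t * p
        eta *= theta_eta
        gamma *= theta_gamma
    return np.sign(mu)                            # round to {-1, 1}^n
```

Note that the Hessian may be indefinite because of the penalty term; the full GSA handles this with a modified Newton safeguard, which this sketch omits.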

6.2. Experimental results

As in Section 5, we utilize assumptions (i)–(vii) in Section 2.2 and Setting 5.1. In our numerical experiments, we set d(O,M) = 1 and either (a) $n_x=n_y=20$, $n_z=10$, $n_o=n_x n_y=400$, or (b) $n_x=n_y=32$, $n_z=10$, $n_o=n_x n_y=1024$; the data were noiseless. We conducted 30 experiments for case (a) and 10 experiments for case (b), for each of the reversal rates 0.1, 0.3 and 0.5; the experimental results are shown in Tables 18 and 19. The values in these tables show what percentage of the (a) 20×20×30 and (b) 32×32×10 magnetic moments were correctly estimated.

Table 18. Estimation performance of the iteration method (nx=ny=20).

Table 19. Estimation performance of the iteration method (nx=ny=32).

In our numerical experiments, the GSA parameters were as follows: $\epsilon_\eta=0.01$, N = 50, $\theta_\eta=0.95$, $\theta_\gamma=1/\theta_\eta$, $\eta_0=10$, $\gamma_0=10^{-2}E(0)/n_\mu$, and $\epsilon_F=0.001$. (The symbols other than η are the same as those in [Citation19], and η corresponds to μ of [Citation19].) The starting point was $r_0=0=(0,0,\ldots,0)^t$. These values of $\theta_\eta$, $\theta_\gamma$, and $r_0$ are the same as those in [Citation19] or Section 6.2.4 of [Citation21] (cf. Table 5.1 of [Citation21]); the values of the other parameters were determined through experiments on several small magnets. We compare the experimental results listed in Table 19 with those listed in Table 13. According to the results in Tables 12 and 13, the value of s that minimizes $E_f(s)$ for the observed value f is probably r. Therefore, for the comparison, we use $N^r_{[k]}$ in Table 13. The results in Table 19 are not better than those in Table 13. However, the performance of GSA depends considerably on the values of its parameters.

7. Results and discussion

To solve the discretized magnetization-distribution identification problem, we used neural networks and an iterative method, combined with our sequential estimation-removal method. The results of the numerical experiments showed that the discretized magnetization can be roughly estimated up to a certain depth by applying our method, as described in Sections 5 and 6. We aimed to estimate as many of the magnetic moments as possible, and therefore the difficulty of this ill-posed estimation problem was highlighted, as described in Sections 4–6. In our experiments, we used randomly generated artificial training data, and we did not apply any regularization or any Bayesian prior distribution. To obtain higher-accuracy estimates from actual (noisy) data, training data that approximate the actual magnetic moment configurations should be used; further, some regularization or Bayesian prior distribution that represents the actual magnetic moment configurations should be applied. To achieve this, we must consider the following.

  1. It is important to have a deep knowledge of real magnets.

  2. Based on Basic idea 3.2, we divided the target magnet virtually and consequently localized the problem. On the other hand, regularization/Bayesian prior distribution and realistic magnet data are related to the global structure of the target magnets. We must combine these two perspectives.

These are future issues for us. However, we expect that our methods and ideas described in Sections 2 and 3 can be applied to solve our magnetization-distribution identification problem and some other inverse problems of the form (Equation1), e.g. the inverse gravimetric problem (gravity prospecting problem). This problem is to determine the location and shape of subterranean bodies from gravity measurements at the earth's surface, and in general, it is an ill-posed inverse problem (cf. [Citation23,Citation24]).

Here, we describe a two-dimensional model of an inverse gravimetric problem. Let the x-axis be the horizontal axis, the z-axis be the vertical axis, and the line z = 0 be the ground surface. Let a, b, and c be real numbers with a < b and c < 0, and put D = [a,b]×[c,0]. Let ρ(x,z) ((x,z) ∈ D) denote the distribution of mass. Then, the gravitational force F(x,z) (z ≥ 0) produced by the mass in D is given by
$$F(x,z)=\int_D \frac{\gamma\,(z-\zeta)}{\bigl[(x-\xi)^2+(z-\zeta)^2\bigr]^{3/2}}\,\rho(\xi,\zeta)\,d\xi\,d\zeta,$$
where γ represents the gravitational constant. An inverse gravimetric problem is to estimate ρ(x,z) from observation data {F(x,z)} in an observation area $E=[a',b']\times[c',d']$, where $a' \le a$, $b \le b'$ and $0 \le c'$, under some appropriate conditions. Assume that D comprises cells, as in Figure , and that ρ(x,z) is constant in each cell. Then, it is easily shown that this inverse problem satisfies weak piece monotonicity. We expect that our methods can be applied to solve this problem.
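
For illustration, a midpoint-rule discretization of this forward map can be sketched as follows (illustrative names; each cell carries a constant density):

```python
import numpy as np

def gravity_forward(rho, a, b, c, obs_pts, gamma_const=6.674e-11):
    """Midpoint-rule discretization of
    F(x, z) = int_D gamma (z - zeta) / ((x - xi)^2 + (z - zeta)^2)^{3/2}
              * rho(xi, zeta) dxi dzeta
    for D = [a, b] x [c, 0], with rho given as an (n_x, n_z) array of
    cell-wise constant densities."""
    n_x, n_z = rho.shape
    xi = a + (b - a) * (np.arange(n_x) + 0.5) / n_x      # cell centres, x
    zeta = c - c * (np.arange(n_z) + 0.5) / n_z          # cell centres, z (c < 0)
    dA = ((b - a) / n_x) * (-c / n_z)                    # cell area
    XI, ZETA = np.meshgrid(xi, zeta, indexing="ij")
    F = np.empty(len(obs_pts))
    for i, (x, z) in enumerate(obs_pts):
        kern = gamma_const * (z - ZETA) / ((x - XI)**2 + (z - ZETA)**2)**1.5
        F[i] = np.sum(kern * rho) * dA
    return F

# Example: a buried dense block observed on the ground surface z = 0.
rho = np.zeros((20, 10))
rho[8:12, 3:6] = 1000.0
obs = [(x, 0.0) for x in np.linspace(-1.0, 3.0, 41)]
F = gravity_forward(rho, 0.0, 2.0, -1.0, obs)
```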

Acknowledgments

The authors are most grateful to Professor Hitoshi Saito (Akita University) for his valuable discussion and advice. The authors also appreciate the helpful comments from the anonymous reviewers.

Data availability

All the data sets used in this study were randomly generated.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Leyva-Cruz JA, Ferreira ES, Miltão MSR, et al. Reconstruction of magnetic source images using the Wiener filter and a multichannel magnetic imaging system. Rev Sci Instrum. 2014;85:074701.
  • Mallinson J. On the properties of two-dimensional dipoles and magnetized bodies. IEEE Trans Magn. 1981;17:2453–2460.
  • Vellekoop B, Abelmann L, Porthun S, et al. On the determination of the internal magnetic structure by magnetic force microscopy. J Magn Magn Mater. 1998;190:148–151.
  • Saito H, Chen J, Ishio S. Description of magnetic force microscopy by three-dimensional tip Green's function for sample magnetic charges. J Magn Magn Mater. 1999;191:153–161.
  • Cao Y, Kumar P, Zhao Y, et al. Active magnetic force microscopy of Sr-ferrite magnet by stimulating magnetization under an AC magnetic field: direct observation of reversible and irreversible magnetization processes. Appl Phys Lett. 2018;112:223103.
  • Cao Y, Zhao Y, Kumar P, et al. Magnetic domain imaging of a very rough fractured surface of Sr ferrite magnet without topographic crosstalk by alternating magnetic force microscopy with a sensitive FeCo–GdOx superparamagnetic tip. J Appl Phys. 2018;123:224503.
  • Saito H, Ikeya H, Egawa G, et al. Magnetic force microscopy of alternating magnetic field gradient by frequency modulation of tip oscillation. J Appl Phys. 2009;105:07D524.
  • Chang T, Lagerquist M, Zhu JG, et al. Deconvolution of magnetic force images by Fourier analysis. IEEE Trans Magn. 1992;28:3138–3140.
  • Hsu CC, Miller CT, Indeck RS, et al. Magnetization estimation from MFM images. IEEE Trans Magn. 2002;38:2444–2446.
  • Madabhushi R, Gomez RD, Burke ER, et al. Magnetic biasing and MFM image reconstruction. IEEE Trans Magn. 1996;32:4147–4149.
  • Rawlings C, Durkan C. The inverse problem in magnetic force microscopy (MFM) – inferring sample magnetization from MFM images. Nanotechnology. 2013;24:305705.
  • Zhao T, Fujiwara H, Mankey J, et al. Reconstruction of in-plane magnetization distributions from magnetic force microscope images. J Appl Phys. 2001;89:7230–7232.
  • Montúfar G, Pascanu R, Cho K, et al. On the number of linear regions of deep neural networks. In: Ghahramani Z, Welling M, Cortes C, et al., editors. NIPS'14 Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol 2. Cambridge (MA): MIT Press; 2014. p. 2924–2932.
  • Keras Google group. Keras Documentation. https://keras.io/ (accessed on 11 December 2019).
  • Keras Google group. Keras losses source. https://github.com/keras-team/keras/blob/master/keras/losses.py (accessed on 9 October 2018).
  • Stanley RP. An Introduction to Hyperplane Arrangements. 2006. https://www.cis.upenn.edu/∼cis610/sp06stanley.pdf (accessed on 28 October 2019).
  • Keras Google group. Keras optimizer source. https://github.com/keras-team/keras/blob/master/keras/optimizers.py#L392 (accessed on 9 October 2018).
  • Kingma DP, Ba J. Adam: a method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015); San Diego, USA, 2015. p. 1–15.
  • Murray W, Ng KM. An algorithm for nonlinear optimization problems with binary variables. Comput Optim Appl. 2010;47:257–288.
  • Mitchell JE, Pardalos PM, Resende MGC. Interior point methods for combinatorial optimization. In: Du DZ, Pardalos PM, editors. Handbook of Combinatorial Optimization, Vol 1. Dordrecht: Kluwer Academic Publishers; 1999. p. 189–298.
  • Ng KM. A continuation approach for solving nonlinear optimization problems with discrete variables [PhD thesis]. Stanford: Dept. Management Science and Engineering of Stanford Univ., 2002.
  • Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math Program. 1983;26:190–212.
  • Groetsch CW. Inverse problems in the mathematical sciences. Braunschweig-Wiesbaden: Friedr. Vieweg & Sohn; 1993.
  • Sampietro D, Sansò F. Uniqueness theorems for inverse gravimetric problems. In: Sneeuw N, et al., editors. VII Hotine-Marussi Symposium on Mathematical Geodesy, International Association of Geodesy Symposia 137. Berlin, Heidelberg: Springer-Verlag; 2012. p. 111–115.
