Research Article

Estimation method for inverse problems with linear forward operator and its application to magnetization estimation from magnetic force microscopy images using deep learning

Pages 2131-2164 | Received 26 Mar 2020, Accepted 08 Mar 2021, Published online: 29 Mar 2021

ABSTRACT

This study considers an inverse problem whose corresponding forward problem is given by a finite-dimensional linear operator $T$. The inverse problem has the following form: (data) $= T$(unknown). It is assumed that the number of patterns that the unknown quantity can take is finite. Then, even if $\operatorname{Ker} T \neq \{0\}$, the unknown quantity may be uniquely determined from the data. This case is the subject of this study. We propose a method for solving this inverse problem using numerical calculations. A famous inverse problem requires the estimation of the unknown magnetization distribution or magnetic charge distribution in an anisotropic permanent magnet sample from its magnetic force microscopy images. It is known that the solution of this problem is not unique in general. In this work, we consider the case where a magnetic sample comprises cubic cells, and the unknown magnetic moment is oriented either upward or downward in each cell. This discretized problem is an example of the above-mentioned inverse problem: (data) $= T$(unknown). Numerical calculations were carried out to solve this model problem employing our method and deep learning. The experimental results show that the magnetization can be estimated roughly up to a certain depth.


1. Introduction

Let $m, n \in \mathbb{N}$ and $\tilde{T}: \mathbb{R}^m \to \mathbb{R}^n$ be a linear operator. For a finite set $M \subset \mathbb{R}^m$, we consider the natural restriction of $\tilde{T}$ to $M$, $T = \tilde{T}|_M : M \to \mathbb{R}^n$. In this study, we consider inverse problems of the form: (1) estimate $\mu \in M$ from observation data $f = T_\delta(\mu) := T(\mu) + \delta$, where $\delta \in \mathbb{R}^n$ denotes some noise and $\operatorname{Ker} \tilde{T} = \{0\}$ may not hold. Since $M$ is a finite set, the map $T: M \to \mathbb{R}^n$ may be injective even if $\operatorname{Ker} \tilde{T} \neq \{0\}$. In that case, $T^{-1}$ is often nonlinear.
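As a toy illustration of (1), the following sketch (with an arbitrarily chosen matrix; none of these names come from the paper) shows a linear map $\tilde{T}: \mathbb{R}^3 \to \mathbb{R}^2$ with a nontrivial kernel whose restriction to the finite set $M = \{-1, 1\}^3$ is nevertheless injective:

```python
import itertools
import numpy as np

# A 3 -> 2 linear map: Ker T~ != {0} on R^3.
T = np.array([[1.0, 2.0, 4.0],
              [4.0, 2.0, 1.0]])

# The finite set M = {-1, +1}^3.
M = [np.array(mu) for mu in itertools.product([-1.0, 1.0], repeat=3)]

# If all 2^3 images are distinct, T restricted to M is injective.
images = {tuple(T @ mu) for mu in M}
print(len(images) == len(M))  # True: T|_M is invertible, but only nonlinearly
```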

The inverse problem of obtaining the spatial distributions of magnetic properties inside an object from experimental measurements has been investigated by numerous researchers (see [Citation1]). In this study, we consider such a magnetic inverse problem, namely the magnetization-distribution identification problem represented by an integral equation (see (2)). This problem requires the estimation of the unknown magnetization distribution or magnetic charge distribution in an anisotropic permanent magnet sample from its magnetic force microscopy (MFM) images. This integral equation is a Fredholm equation of the first kind, and its solution is generally not unique for the given data. In fact, in principle, it is impossible to determine the magnetic charge distribution inside a magnetic material from MFM images alone (see [Citation2,Citation3]).

In this work, we assume that an anisotropic magnetic sample is composed of magnetic grains, where each grain has a single-magnetic-domain structure and is magnetically isolated from the neighbouring magnetic grains by a non-magnetic boundary phase. Therefore, no magnetic domain walls exist in the sample. Further, we assume that the magnetic sample comprises cubic cells, and every magnetic grain is composed of one cell or a set of several cells. For example, an ideal hot-rolled magnet sample approximately has such a structure. We consider the following case. First, we apply a strong upward magnetic field to the magnetic sample to produce the magnetized state, that is, to make the magnetic moments of all cells of this sample point upward. Next, we apply a downward magnetic field to the sample. Then, the orientation of the magnetic moment of each cell either remains upward or changes to downward. (When a magnetic grain is composed of several cells, the moments of those cells are oriented in the same direction.) To investigate the magnetic performance of the sample, our aim is to determine from its MFM images, at as many positions inside the sample as possible, whether the magnetic moments are reversed.

Since the magnet sample is assumed to consist of cubic cells, we discretize the integral equation and consider this discretized inverse problem as a model problem of (1). Saito et al. [Citation4] obtained the Green kernel function of the integral equation, and our analysis is based on their formulation. We aim to estimate the magnetization distribution on the surface of the magnet and as deep in the interior as possible. In this study, we propose a method of reconstruction (a sequential estimation-removal method), and we conducted computational simulations using this method. The experimental results show that the discretized magnetization can be estimated roughly up to a certain depth. We did not conduct experiments using actual magnets; however, we consider it appropriate to employ the high-performance MFM developed by Saito et al. (see [Citation5–7]) when conducting experiments using actual magnets. This MFM detects only the component of the magnetic field normal to the sample surface. In this study, we assume that such an MFM is used.

Various methods have been developed for reconstructing the internal magnetization distribution under some constraints, or the surface magnetization distribution, from MFM images; see, e.g. [Citation8–11] and [Citation12]. However, the authors were unable to identify prior research that solves our inverse problem.

The rest of this paper is organized as follows. In Section 2, we describe our magnetization-distribution identification problem and the cell discretization of this problem. This discretized problem is a model problem of the inverse problem (1). In Section 3, we propose a method for solving the general inverse problem of the form (1); we call it a sequential estimation-removal method (although it may be known by other names, the authors are unaware of them). This method is based on the idea of estimating earlier the parts that can be estimated more accurately (Basic idea 3.2). We divide the target $\mu \in M$ virtually into several parts, each of which is called a piece (in the case of our discretized magnetization-distribution identification problem, each piece corresponds to a set of several cells). In our proposed method, we estimate and virtually remove the pieces one by one, starting from those that can be estimated more accurately. Consequently, the problem is localized, and the computational cost can be reduced. This method requires the linearity of $T$. In Section 4, we numerically consider the ill-posedness of the discretized magnetization-distribution identification problem and of the original infinite-degrees-of-freedom problem, and the relationship between these problems, using the results of our numerical experiments. As described in Sections 4.1 and 4.2, these problems are severely ill-posed (cf. [Citation2,Citation3]). In Sections 5 and 6, we describe the results of the numerical experiments in which we attempted to solve the discretized magnetization-distribution identification problem using our sequential estimation-removal method. This method is implemented by employing (artificial) neural networks in Section 5, and by utilizing an iterative algorithm in Section 6. These implementations can be applied even if $T^{-1}$ is nonlinear. In particular, the set of neural networks obtained can act as an approximation of $T^{-1}$. The results of our numerical experiments show that the discretized magnetization $\mu$ can be roughly estimated up to a certain depth. Moreover, based on our experiments, estimating $\mu$ with our method is more accurate than estimating all of $\mu$ at once. Since we aimed to estimate as many of the magnetic moments as possible, the difficulty of this ill-posed inverse problem is highlighted in Sections 5 and 6.

In Sections 4–6, we consider the following simple and basic case: each magnetic grain comprises only one cell, and the orientations of the moments of the cells are independent of each other. Thus we used randomly generated artificial training data and test data, and we did not apply any regularization or any Bayesian prior distribution. To obtain more accurate estimates in a real situation, training data that approximate the actual magnetic moment configurations should be used, and some regularization or some Bayesian prior distribution representing the actual magnetic moment configurations should be adopted. We discuss these aspects briefly in Section 7. Further, in Section 7, we describe an inverse gravimetric problem as a problem to which our method may be applicable.

2. Magnetization-distribution identification problem and its discretization

2.1. Problem

We consider the discretized magnetization-distribution identification problem described in Section 1 as a model problem of the inverse problem (1). This problem is a discretization of the problem of solving integral equation (2). This integral-equation formulation is based on [Citation4]. The anisotropic permanent magnet sample is denoted by $M$. Figure 1 shows the three-dimensional MFM tip Green function $G(r)$ ($r = (x, y, z) \in \mathbb{R}^3$) for a dipole-type MFM tip, and the rectangle is $M \cap \{z = 0\}$. Here, we consider the Green function for a point magnetic charge. The magnetization for the dipole-type MFM tip is regarded as an infinitesimal magnetic dipole moment, which gives the maximum spatial resolution. In Figure 1, the $z$-axis represents the normal direction to the surface of the sample $M$. Let $F_z(r)$ denote the magnetic force gradient along the $z$ direction at the position $r$, that is, $F_z(r)$ is an $\mathbb{R}$-valued function corresponding to the signal amplitude of an MFM image at $r$. Then, $F_z(r)$ is given by the convolution integral of the point magnetic charge distribution $\rho(r)$ with $G(r)$ as
(2) $F_z(r) = (G * \rho)(r) := \int_M G(r - r')\,\rho(r')\,dr'$,
where $r' = (x', y', z')$ runs through the sample $M$, $r \notin M$, and $dr' = dx'\,dy'\,dz'$. Let $m = (m_x, m_y, m_z)$ denote the magnetization of the MFM tip. In a real situation, both the magnetization components $m_x$ and $m_y$ are 0. In the following, we normalize $m_z / 4\pi\mu_0$ to 1, where $\mu_0$ denotes the vacuum permeability. Then, $G(r)$ is represented by
(3) $G(r) = \dfrac{15 z^3 - 9 z \left(x^2 + y^2 + z^2\right)}{\left(x^2 + y^2 + z^2\right)^{7/2}}$.
Note that $G(r - r')$ does not become $\infty$ because the MFM tip is far from the magnetic sample. Our (infinite-degrees-of-freedom) inverse problem requires determining the unknown distribution $\rho$ from the observation data $F_z$.
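For concreteness, a direct transcription of the (reconstructed) Green function (3) might look as follows; the function name is ours, and the sanity check uses the on-axis value $G = 6/z^4$ mentioned in Section 4.3:

```python
import numpy as np

def green(r):
    # Equation (3) with m_z / (4*pi*mu0) normalized to 1.
    x, y, z = r
    s = x**2 + y**2 + z**2
    return (15.0 * z**3 - 9.0 * z * s) / s**3.5

# On the z-axis (x = y = 0), (3) reduces to 6/z^4.
print(np.isclose(green((0.0, 0.0, 2.0)), 6.0 / 2.0**4))  # True
```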

Figure 1. MFM.

2.2. Simple discretization

Integral Equation (2) is a linear Fredholm equation of the first kind. In general, the solution $\rho$ is not unique for given data $F_z$ (see [Citation2,Citation3]). In this study, as described in Section 1, we consider the following basic and ideal discretized problem of Equation (2) as a model problem of (1).

  (i) All magnetic samples are cuboids of the same size (see Remark 2.1).

  (ii) Every magnetic sample is always on the same horizontal plane, and all the observation points are common to all samples.

  (iii) The MFM probe tip can be considered as a point magnetic dipole.

  (iv) Every magnetic sample is divided into small cubic cells as in Figure 2; we call this cell discretization. Each cell or a set of several cells is a magnetic grain with a single-magnetic-domain structure. (This assumption fails if the boundary of a magnetic grain cuts through a cell; we restrict our consideration to the case where it does not (see Experimental claim 4.3).) Every magnetic grain is magnetically isolated from the neighbouring magnetic grains by the non-magnetic boundary phase.

  (v) The length of each side of every cubic cell is normalized to 1.

  (vi) The normalized value of the magnetic moment is $+1$ (upward) or $-1$ (downward) in each cell. (When a magnetic grain comprises several cells, the moments of those cells are oriented in the same direction.)

Remark 2.1

Further, if we discretize the space around the magnetic sample into cells, we do not need assumption (i). Then, assumption (vi) becomes 'the normalized value of the magnetic moment is $+1$ (upward), $-1$ (downward), or $0$ (space) in each cell'. In this case, the estimation is more difficult than under assumption (vi) as stated, because the estimation must select one out of three possibilities. However, it is known which cells are included in the space, that is, which cells have the 0-value moment. The difficulty is mitigated if this information can be used in the estimation.

Figure 2. Cell discretization.

We let $n_\mu$, $n_x$, $n_y$ and $n_z$ denote the total number of cells, and the numbers of cells in the $x$-axis, $y$-axis and $z$-axis directions, respectively. Therefore, $n_\mu = n_x n_y n_z$, and the cells can be numbered as $c_1, \ldots, c_{n_z}$, $c_{n_z+1}, \ldots, c_{2 n_z}$, $c_{2 n_z+1}, \ldots, c_{3 n_z}$, $\ldots$, $c_{(n_x n_y - 1) n_z + 1}, \ldots, c_{n_x n_y n_z}$. Here, suffix $i = a n_z + b$ ($a = 0, \ldots, n_x n_y - 1$, $b = 1, \ldots, n_z$) means that the cell is at the $a$th position in the horizontal ($xy$-plane) direction and the $b$th position from the top in the vertical direction. (The order in the horizontal direction is determined appropriately.)

Example 2.2

We consider the case of $n_\mu = 6$, $n_x = 2$, $n_y = 1$ and $n_z = 3$. Then, the numbering of the cells is as shown in Figure 3.

Figure 3. Numbering cells.

Let $\mu_i$ denote the magnetic moment of cell $c_i$; then, $\mu_i = +1$ (upward) or $\mu_i = -1$ (downward). We consider the magnetic charge of each cell, which is determined from its moment. If $\mu_i = +1$, then the values of the charges are $+1$ and $-1$ at the top and bottom faces of cell $c_i$, respectively. If $\mu_i = -1$, then the values of the charges are $-1$ and $+1$ at the top and bottom faces of cell $c_i$, respectively. The values of the charges are zero inside each cell and at all vertical faces of each cell. In addition to assumption (vi), we assume

  (vii) The charge of each top/bottom face is concentrated at the centre point of that face.

Next, we consider two vertically adjacent cells. The centre points of the adjacent faces of these two cells coincide, and the value of the magnetic charge at this point is the sum of the values at the two coinciding centre points. Thus the value is $+2$, $0$ or $-2$. Therefore, when the cell-discretized magnetic sample $M$ comprises only two vertically adjacent cells, the possible values of the charges are $+2$, $+1$, $0$, $-1$ and $-2$. Figure 4 illustrates the magnetic moments (up or down arrows) and charges in the four possible cases. Next, we consider the general case. Identifying overlapping points, the total number of centre points of the top and bottom faces of all cells is given by $n_\rho := n_x n_y (n_z + 1)$. These points can be numbered as $p_1, \ldots, p_{n_z+1}$, $p_{(n_z+1)+1}, \ldots, p_{2(n_z+1)}$, $p_{2(n_z+1)+1}, \ldots, p_{3(n_z+1)}$, $\ldots$, $p_{(n_x n_y - 1)(n_z+1)+1}, \ldots, p_{n_x n_y (n_z+1)}$. Here, suffix $j = a(n_z+1) + b$ of $p_j$ ($a = 0, \ldots, n_x n_y - 1$, $b = 1, \ldots, n_z+1$) means that the charge is at the $a$th position in the horizontal ($xy$-plane) direction and the $b$th position from the top in the vertical direction. The order in the horizontal direction is the same as that of the cells. Let $\rho_j$ denote the magnetic charge at $p_j$. Then, $\rho_j = -2, -1, 0, +1$ or $+2$.

Figure 4. Magnetic moments and charges.

Let $n_o$ denote the number of observation points, where we observe the values of $F_z$ at these points. We number these points appropriately as $o_1, o_2, \ldots, o_{n_o}$, and we let $f_k$ denote the value of $F_z(r)$ at $o_k$. Next, we define
(4) $g_{kj} := G(o_k - p_j)$.
This is a discretization of (3). Put $f := [f_1, f_2, \ldots, f_{n_o}]^t$, $G := (g_{kj})$ (an $n_o \times n_\rho$ matrix), and $\rho := [\rho_1, \rho_2, \ldots, \rho_{n_\rho}]^t$. Thus, we have a discretized equation of (2),
(5) $f = G\rho$.
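A minimal sketch of assembling the matrix $G$ of (5) from (4), under the stated assumptions (the layout of the point lists and all names here are ours):

```python
import numpy as np

def assemble_G(obs_points, charge_points):
    # g_kj = G(o_k - p_j), equation (4), vectorized over all pairs.
    O = np.asarray(obs_points, dtype=float)     # shape (n_o, 3)
    P = np.asarray(charge_points, dtype=float)  # shape (n_rho, 3)
    d = O[:, None, :] - P[None, :, :]           # pairwise differences o_k - p_j
    s = np.sum(d**2, axis=-1)                   # x^2 + y^2 + z^2
    z = d[..., 2]
    return (15.0 * z**3 - 9.0 * z * s) / s**3.5  # shape (n_o, n_rho)
```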

The relationship between $\mu_i$ and $\rho_j$ is as follows. Put $R := \operatorname{diag}(R_1, R_2, \ldots, R_{n_x n_y})$ and $\mu := [\mu_1, \mu_2, \ldots, \mu_{n_\mu}]^t$, where $\mu_i$ denotes the moment of $c_i$ ($i = 1, \ldots, n_\mu$) and $R_1, \ldots, R_{n_x n_y}$ are $(n_z+1) \times n_z$ matrices such that
$$R_1 = \cdots = R_{n_x n_y} = \begin{bmatrix} 1 & 0 & \cdots & & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -1 & 1 \\ 0 & \cdots & & 0 & -1 \end{bmatrix}.$$
Then, we have $\rho = R\mu$. Thus our discretized (noiseless) inverse problem is to solve the following system of linear equations:
(6) $GR\mu = f$.
This equation is the same as Equation (1) with $m = n_\mu$, $n = n_o$, $T = GR$, and $\delta = 0$. Since each $\mu_i$ is either $+1$ or $-1$, the total number of possible patterns that $\mu$ can take is $2^{n_\mu}$. Let $M$ be the set of all patterns, that is, $M = \{\mu : \text{each } \mu_i \text{ is } +1 \text{ or } -1\} = \{-1, 1\}^{n_\mu}$. Then, $GR$ is a map from $M \subset \mathbb{R}^{n_\mu}$ to $\mathbb{R}^{n_o}$. We call $\mu \in M$ a moment vector. Since $M$ is a finite set, $GR$ may have an inverse map even if $n_o < n_\mu$.
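The matrix $R$ has a simple block structure, one bidiagonal block per vertical column of cells; a sketch (names ours):

```python
import numpy as np

def make_R_block(nz):
    # (nz+1) x nz block: +1 at the top face, -1 at the bottom face of each cell.
    Rb = np.zeros((nz + 1, nz))
    for b in range(nz):
        Rb[b, b] = 1.0
        Rb[b + 1, b] = -1.0
    return Rb

def make_R(nx, ny, nz):
    # Block-diagonal over the nx*ny vertical columns of cells: rho = R mu.
    return np.kron(np.eye(nx * ny), make_R_block(nz))
```

With this and the earlier `assemble_G` sketch, the forward operator of (6) is simply `assemble_G(obs, charges) @ make_R(nx, ny, nz)`.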

We consider the discretized inverse problem (6) under assumptions (i)–(vii); however, the original inverse problem is (2). In the rest of this subsection, we consider the relationship between these two problems. Let $O \subset \mathbb{R}^3$ denote an observation area, that is, an area containing all the observation points. In Figure 1, $O$ is a two-dimensional square. Let $\|\cdot\|_O$ and $\|\cdot\|_M$ denote appropriate norms of functions defined on $O$ and $M$, respectively; e.g. they are the $L^2$ norms. Then, for Equation (2), there exists $C > 0$ such that $\|F_z\|_O = \|G * \rho\|_O \le C \|\rho\|_M$ for every $\rho$, because $G(r - r')$ ($r \in O$, $r' \in M$) is bounded on $O \times M$. Only in the rest of this subsection, we denote the length of each side of every cell by $\Delta$ and write $\rho_\Delta$ for $\rho$, $G_\Delta$ for $G$, and $f_\Delta$ for $f$ in the discretized Equation (5). We can consider the vectors $f_\Delta$ and $\rho_\Delta$ to be piecewise constant functions defined on $O$ and $M$, respectively. (Note that the components of these vectors are labelled with $n_o$ points in $O$ and $n_\rho$ points in $M$, respectively.) Then, we have $\|F_z - G * \rho_\Delta\|_O = \|G * \rho - G * \rho_\Delta\|_O \le C \|\rho - \rho_\Delta\|_M$ and $\|G * \rho_\Delta - f_\Delta\|_O = \|G * \rho_\Delta - G_\Delta \rho_\Delta\|_O \le \|G - G_\Delta\| \, \|\rho_\Delta\|_M$, where $\|G - G_\Delta\|$ is the operator norm of $G - G_\Delta$. Note that $\|G - G_\Delta\| \to 0$ as $\Delta \to 0$ because the integral kernel of $G$ is smooth, and there exists $C' > 0$ such that $\|\rho_\Delta\|_M < C'$ for every $\Delta > 0$ because $|\rho_j| \le 2$. Therefore, by using $\|F_z - f_\Delta\|_O \le \|F_z - G * \rho_\Delta\|_O + \|G * \rho_\Delta - f_\Delta\|_O$, we have a stability property: $f_\Delta \to F_z$ as $\Delta \to 0$ and $\rho_\Delta \to \rho$. This stability property ensures the validity of the discretized approximation of the 'forward' problem, that is, the problem of determining $F_z$ from $\rho$. However, our inverse problem does not have such a stability property. We consider this in Section 4.2.

2.3. Selection of pieces of M

In Section 1, we introduced the concept of a piece of $M$. Each piece of $M$ is a subset of $\mathbb{R}^{n_\mu}$ derived from $M$, and every $\mu \in M$ is the sum of its pieces. It is important to select pieces appropriately for problem (1). The properties that pieces are expected to satisfy are described in Sections 3.2 and 3.3. For the inverse problem (6), the following pieces of $M$ can be naturally obtained.

For a moment vector $\mu = [\mu_1, \ldots, \mu_{n_\mu}]^t \in M$ ($[\ \cdot\ ]^t$ denotes the transpose) defined in Section 2.2 and $\ell = 1, 2, \ldots, n_z$, define
$$\mu_{[\ell]} = \left[\mu_{[\ell],1},\, \mu_{[\ell],2},\, \ldots,\, \mu_{[\ell],n_\mu}\right]^t \in \mathbb{R}^{n_\mu}$$
by $\mu_{[\ell],i} := \mu_i$ if $i = \ell, n_z + \ell, 2 n_z + \ell, \ldots, (n_x n_y - 1) n_z + \ell$, and $\mu_{[\ell],i} := 0$ otherwise. Consequently, the $i$th component of $\mu_{[\ell]}$ is equal to the $i$th component of $\mu$ if that component is the magnetic moment of one of the $\ell$th cells from the top of $M$, and otherwise the $i$th component of $\mu_{[\ell]}$ is 0. For example, the non-zero components of $\mu_{[1]}$ correspond to the cells of the top surface of $M$. (See also Example 2.3.) We define
$$M_{[\ell]} := \left\{\mu_{[\ell]} :\ \mu \in M\right\} \subset \mathbb{R}^{n_\mu} \quad (\ell = 1, 2, \ldots, n_z)$$
and we call $M_{[\ell]}$ the $\ell$th piece of $M$. In particular, $M_{[1]}$ and $M_{[n_z]}$ are called the top piece and the bottom piece, respectively. Furthermore, $M_{[\ell]}$ with a small value of $\ell$ is called an upper piece while $M_{[\ell]}$ with a large value of $\ell$ is called a lower piece. Then, we have
(7) $\mu = \sum_{\ell=1}^{n_z} \mu_{[\ell]}$.
There exists a natural projection $M \ni \mu = \sum_{\ell=1}^{n_z} \mu_{[\ell]} \mapsto (\mu_{[\ell]})_{\ell} \in \bigoplus_{\ell=1}^{n_z} M_{[\ell]} \subset \bigoplus_{\ell=1}^{n_z} \mathbb{R}^{n_\mu}$, where the symbol $\oplus$ denotes the direct sum of vector spaces or their subsets. However, $M_{[\ell]} \subset M$ does not hold, because $\mu$ has no zero components while $\mu_{[\ell]}$ has zero components.

Example 2.3

We consider the case of Example 2.2. The cells $\{c_1, c_2, \ldots, c_6\}$ corresponding to the elements $[\mu_1, \mu_2, \ldots, \mu_6]^t$ are as shown in Figure 3. Then, for $\mu = [\mu_1, \mu_2, \ldots, \mu_6]^t$, we have $\mu_{[1]} = [\mu_1, 0, 0, \mu_4, 0, 0]^t$, $\mu_{[2]} = [0, \mu_2, 0, 0, \mu_5, 0]^t$, and $\mu_{[3]} = [0, 0, \mu_3, 0, 0, \mu_6]^t$. In this case, the sets of cells corresponding to the nonzero components of $M_{[1]}$, $M_{[2]}$ and $M_{[3]}$ are $\{c_1, c_4\}$, $\{c_2, c_5\}$ and $\{c_3, c_6\}$, respectively.

In the following, we often identify a piece with the set of cells corresponding to it.
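Under the cell numbering of Section 2.2 (index $i = a n_z + b$, with $b$ the depth from the top), extracting the $\ell$th piece is a strided selection; a sketch (function name ours):

```python
import numpy as np

def piece(mu, ell, nz):
    # mu_[ell]: keep the components at depth ell, zero out the rest.
    mu = np.asarray(mu, dtype=float)
    out = np.zeros_like(mu)
    out[ell - 1::nz] = mu[ell - 1::nz]
    return out

# Example 2.3 (n_mu = 6, nz = 3):
# piece([m1, m2, m3, m4, m5, m6], 1, 3) -> [m1, 0, 0, m4, 0, 0]
```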

3. Inverse problem with a linear forward operator and a sequential estimation-removal method

In this section, we consider the general inverse problems of the form (1).

3.1. Inverse problem with a linear forward operator

Let $m$, $n$, $T$, $\tilde{T}$ and $M \subset \mathbb{R}^m$ be the integers, operators and set described in Section 1. In the following, we aim to develop an estimation method for inverse problems of the form (1) with $\operatorname{Ker} \tilde{T} \neq \{0\}$ and $\delta \neq 0$. Since $M$ is a finite set, $T: M \to \mathbb{R}^n$ may be injective even if $\operatorname{Ker} \tilde{T} \neq \{0\}$. Then, the inverse map $T^{-1}: T(M) \to M$ exists. However, it is often nonlinear, as the following lemma shows.

Lemma 3.1

Let $L$ and $L'$ be the linear subspaces spanned by $M$ and $T(M)$, respectively. Assume that $L \cap \operatorname{Ker} \tilde{T} \neq \{0\}$ and $T: M \to T(M)$ is injective. Then, $T^{-1}: T(M) \to M$ cannot be extended to a linear map $L' \to L$.

Proof.

Since $T: L \to L'$ is linear and surjective onto $L'$, we have $\dim L \ge \dim L'$. If $T^{-1}: T(M) \to M$ has a linear extension, we have $\dim L' \ge \dim L$. Therefore, $\dim L = \dim L'$. Then, $\tilde{T}: L \to L'$ is also injective. Hence, $L \cap \operatorname{Ker} \tilde{T} = \{0\}$. This contradicts the assumption that $L \cap \operatorname{Ker} \tilde{T} \neq \{0\}$.

Even if $T$ is not injective, there may exist subsets $M_I$, $M_C$ of $\mathbb{R}^m$ such that every $\mu \in M$ has a unique decomposition of the form $\mu = \mu_I + \mu_C$, $\mu_I \in M_I$, $\mu_C \in M_C$, and $\mu_I$ can be uniquely determined from $T(\mu)$. For simplicity, we denote the map
(8) $T(M) \to M_I$, $T(\mu) = T(\mu_I) + T(\mu_C) \mapsto \mu_I$
by $T^{-1}$. In general, the map $T(\mu_I) \mapsto \mu_I$ is also nonlinear. The set of all vectors of the form (15) below is an example of $M_I$ (if $\delta = 0$).

From Section 4 onward, we consider the (discretized) magnetization-distribution identification problem described in Section 2. It is an example of the above-mentioned inverse problem. In this problem, $M = \{-1, 1\}^m$ and (practically) $\operatorname{Ker} \tilde{T} \neq \{0\}$. However, in some cases, the inverse map $T^{-1}$ exists and it is (practically) nonlinear. (The linear space $L$ spanned by $M$ is $\mathbb{R}^m$. Therefore, $L \cap \operatorname{Ker} \tilde{T} \neq \{0\}$, and the nonlinearity of $T^{-1}: T(M) \to M$ follows from Lemma 3.1.) Therefore, it is difficult to solve the inverse problem by employing a linear algebra method. We aim to construct a map $N$ that approximates $T^{-1}$, $N \approx T^{-1}$, for example by machine learning (deep learning). Further, this map $N$ is expected to be as robust as possible to noise.

3.2. A sequential estimation-removal method

Machine learning is often computationally expensive. In the numerical experiments conducted on our magnetic inverse problem, the deep learning for constructing a neural network $N \approx T^{-1}$ stopped before the value of the loss function became small enough. (One of the reasons, we think, is described in Section 3.4.) Therefore, we divide the problem as follows. Let $\bar{m} \in \mathbb{N}$, $\bar{m} \le m$, and let $M_{[1]}, M_{[2]}, \ldots, M_{[\bar{m}]}$ be subsets of $\mathbb{R}^m$ such that every $\mu \in M$ has a unique decomposition of the form
(9) $\mu = \sum_{\ell=1}^{\bar{m}} \mu_{[\ell]}$, $\mu_{[\ell]} \in M_{[\ell]}$,
where $\sum$ denotes the sum of vectors in $\mathbb{R}^m$ (see (7)). As described in Section 2.3, there exists a natural projection $M \ni \mu = \sum_{\ell=1}^{\bar{m}} \mu_{[\ell]} \mapsto (\mu_{[\ell]})_{\ell} \in \bigoplus_{\ell=1}^{\bar{m}} M_{[\ell]}$. However, $M_{[\ell]} \subset M$ may not hold, and we call $M_{[\ell]}$ the $\ell$th piece of $M$.

If $T$ is injective, there exists the inverse map $N$, i.e. $N(T(\mu)) = \mu$. Therefore, we can consider maps $N_{[1]}, N_{[2]}, \ldots, N_{[\bar{m}]}$ such that $N_{[\ell]}(T(\mu)) = \mu_{[\ell]}$. Moreover, for $f = T_\delta(\mu) = T(\mu) + \delta$ with $\operatorname{Ker} \tilde{T} \neq \{0\}$, it may be expected that the following holds: $N_{[\ell]}(f) = N_{[\ell]}(T_\delta(\mu)) \approx \mu_{[\ell]}$. If we have these maps, we can estimate $\mu$ as $\mu \approx \sum_{\ell=1}^{\bar{m}} N_{[\ell]}(f)$. This method estimates $\mu_{[1]}, \mu_{[2]}, \ldots, \mu_{[\bar{m}]}$ independently and does not require the linearity of $T$. In the numerical experiments on our ill-posed magnetic inverse problem, this method was superior to the above-mentioned method that estimates $\mu$ at once. However, its performance was considerably inadequate even when $\delta = 0$. Therefore, we propose a reconstruction method (sequential estimation-removal method) based on the following idea:

Basic idea 3.2: A piece that can be accurately estimated should be estimated earlier.

In this method, we employ the linearity of $T$ to solve the inverse problem (1).

Let $\|\cdot\|_O$ denote a norm on $\mathbb{R}^n$. To realize Basic idea 3.2, we expect the pieces $M_{[1]}, M_{[2]}, \ldots, M_{[\bar{m}]}$ to be selected such that the value of $\|T(\mu_{[\ell]})\|_O$ decreases as the value of $\ell$ increases. First, we expect the selection to satisfy
(10) $\|T(\mu_{[\ell+1]})\|_O < \|T(\mu_{[\ell]})\|_O$
for all $\mu \in M$ and $\ell = 1, \ldots, \bar{m} - 1$. We call this property piece monotonicity. If this property cannot be obtained, we expect the following property to hold:

  • For every $\mu \in M$ and every $\ell = 1, \ldots, \bar{m}$, the number of non-zero components of $\mu_{[\ell]}$ is $m/\bar{m}$. (In our magnetic inverse problem, this means that every piece corresponds to the same number of cells.)

  • (10) holds whenever $\mu_{[\ell]} = \mu_{[\ell+1]}$ as $m/\bar{m}$-dimensional vectors, where the order of the elements is properly determined (see Example 3.3).

We call this property weak piece monotonicity. As described in Section 2.3, we call $M_{[\ell]}$ with a small value of $\ell$ an upper piece and $M_{[\ell]}$ with a large value of $\ell$ a lower piece. In particular, $M_{[1]}$ and $M_{[\bar{m}]}$ are called the top piece and the bottom piece, respectively. To realize Basic idea 3.2, we expect that a piece closer to the top piece can be estimated more accurately. In Section 4.3, it is experimentally shown that the pieces obtained in Section 2.3 for our magnetic inverse problem satisfy piece monotonicity for $\ell = 1, \ldots, m'$ with some $m' \le \bar{m} = 10$ (see Tables 7 and 8) and weak piece monotonicity for $\ell = 1, \ldots, \bar{m} = 10$ (see Tables 9 and 10). In Assumption 3.5, we will describe other properties that pieces are expected to satisfy, and in Section 3.3, we will describe some conditions for Assumption 3.5 to hold.

Example 3.3

We consider the case of the discretized magnetization-distribution identification problem described in Section 2. For example, in the same case of $m = n_\mu = 6$ and $\bar{m} = n_z = 3$ as in Examples 2.2 and 2.3, we consider $\mu_{[1]} = [1, 0, 0, 1, 0, 0]^t$, $\mu_{[2]} = [0, 1, 0, 0, 1, 0]^t$ and $\mu_{[3]} = [0, 0, 1, 0, 0, 1]^t$ to be equal to each other as $2\,(= m/\bar{m})$-dimensional vectors.

We consider a reconstruction method based on Basic idea 3.2. First, we consider the case where $\operatorname{Ker} \tilde{T} = \{0\}$ and $\delta = 0$, that is, the well-posed case. For $\mu \in M$ and $\mu_{[\ell]} \in M_{[\ell]}$ ($\ell = 1, 2, \ldots, \bar{m}$) given by (9), we define
$$\mu^{\flat}_{[\ell]} := \sum_{k=\ell}^{\bar{m}} \mu_{[k]}, \quad f_{[\ell]} := T(\mu_{[\ell]}), \quad f^{\flat}_{[\ell]} := \sum_{k=\ell}^{\bar{m}} f_{[k]}$$
(the superscript $\flat$ denotes quantities aggregating the $\ell$th and all lower pieces). Then, $\mu^{\flat}_{[1]} = \mu$. Using the linearity of $T$, we have, for $\ell = 1, \ldots, \bar{m}$,
$$T\big(\mu^{\flat}_{[\ell]}\big) = \sum_{k=\ell}^{\bar{m}} T(\mu_{[k]}) = \sum_{k=\ell}^{\bar{m}} f_{[k]} = f^{\flat}_{[\ell]},$$
and then $f^{\flat}_{[1]} = f$ and $T^{-1}(f^{\flat}_{[\ell]}) = \mu^{\flat}_{[\ell]}$. Therefore, for $\ell = 1, 2, \ldots, \bar{m}$, there exists $T^{-1}_{[\ell]}$ that outputs $\mu_{[\ell]}$ for the input $f^{\flat}_{[\ell]}$, i.e.
(11) $T^{-1}_{[\ell]}\big(f^{\flat}_{[\ell]}\big) = \mu_{[\ell]} \in M_{[\ell]}$, $\ell = 1, 2, \ldots, \bar{m}$.
Note that $T^{-1}_{[\ell]}$ gives not $\mu^{\flat}_{[\ell]}$ but $\mu_{[\ell]}$ from $f^{\flat}_{[\ell]}$. Thus we can use $T^{-1}_{[\ell]}$ ($\ell = 1, 2, \ldots, \bar{m}$) to obtain $\mu$ from a given $f$ by the following procedure.

Procedure 3.4

  1. Put $\ell = 1$.

  2. Calculate $\mu_{[1]} = T^{-1}_{[1]}(f)$ and $f_{[1]} = T(\mu_{[1]})$.

  3. $\ell \leftarrow \ell + 1$.

  4. Calculate (12) $\mu_{[\ell]} = T^{-1}_{[\ell]}\big(f^{\flat}_{[\ell]}\big) = T^{-1}_{[\ell]}\big(f - \sum_{k=1}^{\ell-1} f_{[k]}\big)$, and if $\ell < \bar{m}$, calculate $f_{[\ell]} = T(\mu_{[\ell]})$.

  5. If $\ell = \bar{m}$, stop the procedure. Otherwise, go back to step 3.

This procedure outputs $\mu = \sum_{\ell=1}^{\bar{m}} \mu_{[\ell]}$. In this procedure, the pieces are estimated and virtually removed one by one from the top. We call this method a sequential estimation-removal method.
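In matrix form, Procedure 3.4 is a short loop; a sketch, assuming $T$ is given as a matrix and `inv_maps[ell]` is any caller-supplied approximation of $T^{-1}_{[\ell]}$ (a trained network, or an argmin search as in the proof of Lemma 3.7 below; names ours):

```python
import numpy as np

def sequential_estimation_removal(f, T, inv_maps):
    residual = np.asarray(f, dtype=float)  # f_[1]^flat = f
    mu = np.zeros(T.shape[1])
    for inv in inv_maps:                   # ell = 1, ..., m_bar
        piece = inv(residual)              # mu_[ell] = T_[ell]^{-1}(f_[ell]^flat)
        mu += piece
        residual = residual - T @ piece    # remove f_[ell] = T(mu_[ell])
    return mu                              # sum of the estimated pieces
```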

In general, $\operatorname{Ker} \tilde{T}$ may not be equal to $\{0\}$, and there exists some noise $\delta \neq 0$ in (1). Therefore, we may not be able to execute Procedure 3.4. In this case, we aim to obtain as much information regarding $\mu$ as possible. In the following, we assume that, for every $f = T_\delta(\mu) = T(\mu) + \delta$, there exists $\delta_{[\ell]} \in \mathbb{R}^n$ such that $f^{\flat}_{[\ell]} = T(\mu^{\flat}_{[\ell]}) + \delta_{[\ell]}$ for $\ell = 1, 2, \ldots, \bar{m}$. Set
$$M^{\flat}_{[\ell]} := \Big\{\sum_{k=\ell}^{\bar{m}} m_{[k]} : m_{[k]} \in M_{[k]}\Big\} \cong \bigoplus_{k=\ell}^{\bar{m}} M_{[k]},$$
where the identification $\cong$ is given by the natural projection,
$$F_{[\ell]} := \{T(\nu) :\ \nu \in M^{\flat}_{[\ell]}\}, \quad \Delta_{[\ell]} := \{\eta \in \mathbb{R}^n :\ \text{possible noise for the elements of } F_{[\ell]}\}, \quad D_{[\ell]} := \{T(\nu) + \eta :\ \nu \in M^{\flat}_{[\ell]},\ \eta \in \Delta_{[\ell]}\}.$$
Then, $F_{[\ell]}$ and $D_{[\ell]}$ are the sets of the $\ell$th-and-lower-piece components of all possible observation data values for the cases $\delta = 0$ and $\delta \neq 0$, respectively. We have $M^{\flat}_{[1]} = M$, and we set $\Delta := \Delta_{[1]}$. Here, considering that $M$ is a finite set, we assume the following.

Assumption 3.5

  1. (For the case where δ=0)

    There exist $\bar{\bar{m}} \in \mathbb{N}$, $\bar{\bar{m}} \le \bar{m}$, and $T^{-1}_{[\ell]}: F_{[\ell]} \to M_{[\ell]}$ ($\ell = 1, 2, \ldots, \bar{\bar{m}}$) such that (13) $T^{-1}_{[\ell]}\big(f^{\flat}_{[\ell]}\big) = \mu_{[\ell]}$ holds for every $f = T(\mu)$ ($\mu \in M$).

  2. (For the case where $\delta \neq 0$)

    There exist $\bar{\bar{m}} \in \mathbb{N}$, $\bar{\bar{m}} \le \bar{m}$, and $T^{-1}_{[\ell]}: D_{[\ell]} \to M_{[\ell]}$ ($\ell = 1, 2, \ldots, \bar{\bar{m}}$) such that (14) $T^{-1}_{[\ell]}\big(f^{\flat}_{[\ell]}\big) = \mu_{[\ell]}$ holds for every $f = T_\delta(\mu)$ ($\mu \in M$, $\delta \in \Delta$).

In this assumption, $\operatorname{Ker} \tilde{T}$ may or may not be $\{0\}$. If $\operatorname{Ker} \tilde{T} = \{0\}$, Assumption 3.5 (i) always holds with $\bar{\bar{m}} = \bar{m}$ (see (11)). The map $T^{-1}_{[\ell]}$ in (14) is an extension of $T^{-1}_{[\ell]}$ in (13) to the case where $\delta \neq 0$. Based on Assumption 3.5, Procedure 3.4 can be rewritten as follows.

Procedure 3.6

Suppose that Assumption 3.5 is satisfied.

  • Steps 1, 2 and 3 are the same as the corresponding steps in Procedure 3.4.

  • Steps 4 and 5 are steps 4 and 5 of Procedure 3.4 with $\bar{m}$ replaced by $\bar{\bar{m}}$.

This procedure does not output $\mu$. Instead, it outputs
(15) $\sum_{\ell=1}^{\bar{\bar{m}}} \mu_{[\ell]}$;
that is, $\mu_{[\bar{\bar{m}}+1]}, \ldots, \mu_{[\bar{m}]}$ cannot be estimated. Even if we define maps that estimate them in some sense, the estimated results may contain errors.

In the experiments in Sections 5 and 6, we use Procedure 3.4 even if $\bar{\bar{m}} < \bar{m}$ to test how deep (how large $\ell$) a piece our method can estimate. The deep learning in Section 5 and the iterative method in Section 6 can be formally applied even if $\bar{\bar{m}} < \bar{m}$.

3.3. Selection of pieces of M

We consider piece monotonicity and weak piece monotonicity to be naturally expected properties of pieces for realizing Basic idea 3.2. To execute Procedure 3.6, we must first obtain the pieces $M_{[1]}, \ldots, M_{[\bar{\bar{m}}]}$ and construct maps $T^{-1}_{[1]}, \ldots, T^{-1}_{[\bar{\bar{m}}]}$ that satisfy Assumption 3.5. This assumption requires the pieces to satisfy some properties, which are somewhat different from (weak) piece monotonicity. We consider this here. Before that, we describe the stability/instability of the inverse problem (1). We distinguish the elements of $M$ with a superscript index as $\mu^{(\alpha)}$. Let $\|\cdot\|_M$ denote a norm on $\mathbb{R}^m$, and
(16) $r^{(\alpha\beta)} := \dfrac{\|T(\mu^{(\alpha)}) - T(\mu^{(\beta)})\|_O}{\|\mu^{(\alpha)} - \mu^{(\beta)}\|_M}$ ($\alpha \neq \beta$), $\quad r_{\min} := \min\{r^{(\alpha\beta)} :\ \mu^{(\alpha)}, \mu^{(\beta)} \in M,\ \alpha \neq \beta\}$.
If $r_{\min} > 0$, then the inverse problem (1) with $\delta = 0$ has a unique solution, and
(17) $\|\mu^{(\alpha)} - \mu^{(\beta)}\|_M \le \dfrac{1}{r_{\min}} \|T(\mu^{(\alpha)}) - T(\mu^{(\beta)})\|_O$
holds for every $\mu^{(\alpha)}$ and $\mu^{(\beta)}$. As the value of $r_{\min}$ increases, the stability of the inverse problem (1) increases. In fact, if $\mu^{(\alpha)} \neq \mu^{(\beta)}$ holds and $r_{\min}$ is sufficiently large, then the values of $T(\mu^{(\alpha)})$ and $T(\mu^{(\beta)})$ are very different from each other.

For selecting pieces of $M$, we consider a sufficient condition for Assumption 3.5 to hold (Lemma 3.8) and apply the same consideration as above to each piece. First, we consider the following lemma, which provides a sufficient condition for Assumption 3.5 (ii) to hold.

Lemma 3.7

Let $\ell \in \{1, 2, \ldots, \bar{m}\}$. Assume that
(18) $2 \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O < \big\|T\big(\mu^{\flat(\alpha)}_{[\ell]}\big) - T\big(\mu^{\flat(\beta)}_{[\ell]}\big)\big\|_O$
is satisfied for $\ell$ and every $\mu^{(\alpha)}, \mu^{(\beta)} \in M$ with $\mu^{(\alpha)}_{[\ell]} \neq \mu^{(\beta)}_{[\ell]}$. Then, Assumption 3.5 (ii) holds for $\ell$.

Proof.

Let $g = T(\mu^{\flat}_{[\ell]}) + \delta_{[\ell]}$ ($\mu^{\flat}_{[\ell]} \in M^{\flat}_{[\ell]}$, $\delta_{[\ell]} \in \Delta_{[\ell]}$). From (18), we have that, for every $\mu' \in M$ with $\mu'_{[\ell]} \neq \mu_{[\ell]}$,
$$\big\|T\big(\mu'^{\flat}_{[\ell]}\big) - g\big\|_O = \big\|T\big(\mu'^{\flat}_{[\ell]}\big) - T\big(\mu^{\flat}_{[\ell]}\big) - \delta_{[\ell]}\big\|_O \ge \big\|T\big(\mu'^{\flat}_{[\ell]}\big) - T\big(\mu^{\flat}_{[\ell]}\big)\big\|_O - \|\delta_{[\ell]}\|_O > 2 \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O - \|\delta_{[\ell]}\|_O \ge \|\delta_{[\ell]}\|_O.$$
Further, $\|T(\mu^{\flat}_{[\ell]}) - g\|_O = \|\delta_{[\ell]}\|_O$. Thus any minimizer of $\|T(\nu) - g\|_O$ over $\nu \in M^{\flat}_{[\ell]}$ has $\ell$th piece equal to $\mu_{[\ell]}$. Therefore, we can define $T^{-1}_{[\ell]}: D_{[\ell]} \to M_{[\ell]}$ by
$$T^{-1}_{[\ell]}(g) := \Big(\operatorname{argmin}_{\nu \in M^{\flat}_{[\ell]}} \|T(\nu) - g\|_O\Big)_{[\ell]} \in M_{[\ell]}.$$

The value of the left-hand side of (18) is determined by the noise. We separate the noise-independent part from (18); that is, in the following lemma we consider an inequality for $T$, similar to (17), for each piece. This is inequality (19). This inequality represents the difficulty of the estimation, independent of the noise.

Lemma 3.8

Let $\ell \in \{1, 2, \ldots, \bar{m}\}$. For $\ell$, consider the conditions that there exists $C_{[\ell]} > 0$ such that

  1. it holds that (19) $\big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M \le C_{[\ell]} \big\|T\big(\mu^{\flat(\alpha)}_{[\ell]}\big) - T\big(\mu^{\flat(\beta)}_{[\ell]}\big)\big\|_O$ for every $\mu^{(\alpha)}, \mu^{(\beta)} \in M$,

  2. it holds that (20) $2 C_{[\ell]} \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O < \big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M$ for every $\mu^{(\alpha)}, \mu^{(\beta)} \in M$ with $\mu^{(\alpha)}_{[\ell]} \neq \mu^{(\beta)}_{[\ell]}$.

Put $L := \{1, 2, \ldots, \bar{\bar{m}}\}$. Assumption 3.5 (i) holds if and only if every $\ell \in L$ satisfies Lemma 3.8 (i). If every $\ell \in L$ satisfies Lemma 3.8 (i) and (ii), then Assumption 3.5 (ii) holds.

Proof.

The first case can be shown via a straightforward calculation. We show the second case. By employing (19) and (20), we have, for every $\mu^{(\alpha)}, \mu^{(\beta)} \in M$ with $\mu^{(\alpha)}_{[\ell]} \neq \mu^{(\beta)}_{[\ell]}$,
$$2 C_{[\ell]} \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O < \big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M \le C_{[\ell]} \big\|T\big(\mu^{\flat(\alpha)}_{[\ell]}\big) - T\big(\mu^{\flat(\beta)}_{[\ell]}\big)\big\|_O.$$
Therefore, (18) holds.

In Lemma 3.8, as the value of $C_{[\ell]}$ decreases, it becomes easier to distinguish between $\mu^{(\alpha)}_{[\ell]}$ and $\mu^{(\beta)}_{[\ell]}$. To apply Lemma 3.8 to the inverse problem (1), it is desirable to know the value of $\min C_{[\ell]}$. Define
(21) $r^{(\alpha\beta)}_{[\ell]} := \dfrac{\big\|T\big(\mu^{\flat(\alpha)}_{[\ell]}\big) - T\big(\mu^{\flat(\beta)}_{[\ell]}\big)\big\|_O}{\big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M}$ ($\mu^{(\alpha)}_{[\ell]} \neq \mu^{(\beta)}_{[\ell]}$)
and
(22) $r_{[\ell]} := \min\big\{r^{(\alpha\beta)}_{[\ell]} :\ \mu^{(\alpha)}_{[\ell]}, \mu^{(\beta)}_{[\ell]} \in M_{[\ell]},\ \alpha \neq \beta\big\}$.
Then, if $r_{[\ell]} \neq 0$, $\min C_{[\ell]} = 1/r_{[\ell]}$ holds and inequality (20) can be rewritten as
(23) $\dfrac{2}{r_{[\ell]}} \sup_{\eta \in \Delta_{[\ell]}} \|\eta\|_O < \big\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\big\|_M$.
As the value of $\|T(\mu^{\flat(\alpha)}_{[\ell]}) - T(\mu^{\flat(\beta)}_{[\ell]})\|_O$ increases, it becomes easier to distinguish between $\mu^{(\alpha)}_{[\ell]}$ and $\mu^{(\beta)}_{[\ell]}$. In general, if the values of $\|T(\mu_{[\ell]})\|_O$ ($\mu_{[\ell]} \in M_{[\ell]}$) are large, the values of $\|T(\mu^{(\alpha)}_{[\ell]}) - T(\mu^{(\beta)}_{[\ell]})\|_O$ may also be large. Therefore, we expect piece monotonicity (or otherwise weak piece monotonicity) to hold.

Thus, the preparation procedure prior to Procedures 3.4 and 3.6 is given below. In the case where $\delta \neq 0$, it is particularly useful if the range of values of $\|\mu^{(\alpha)}_{[\ell]} - \mu^{(\beta)}_{[\ell]}\|_M$ ($\alpha \neq \beta$) is approximately the same for $\ell = 1, 2, \ldots, \bar{\bar{m}}$. (Our discretized magnetization-distribution identification problem satisfies this property.)

Procedure 3.9

  1. Select $M_{[1]}, M_{[2]}, \ldots, M_{[\bar{m}]}$ satisfying
(24) $r_{[1]} \ge r_{[2]} \ge \cdots \ge r_{[\bar{m}]}$ (if possible, $r_{[1]} > r_{[2]} > \cdots > r_{[\bar{m}]}$)
and Assumption 3.5 (i) (when $\delta = 0$) or (ii) (when $\delta \neq 0$) for $\ell = 1, 2, \ldots, \bar{\bar{m}}$. In addition, it is desirable that piece monotonicity (10), or otherwise weak piece monotonicity, be satisfied for as many $\ell = 1, \ldots, \bar{m} - 1$ as possible.

  2. Construct a map $N_{[\ell]} \approx T^{-1}_{[\ell]}$ that approximately satisfies Assumption 3.5, from the top piece ($\ell = 1$) down to the lowest possible piece. For example, $N_{[\ell]}$ is a neural network whose activation function is the rectified linear function, and the loss function for training is given by the mean squared error:
(25) $\big\|N_{[\ell]}\big(T\big(\mu^{\flat}_{[\ell]}\big)\big) - \mu_{[\ell]}\big\|_M^2$.

In step 2 of Procedure 3.9, such a neural network $N_{[\ell]}$ is a piecewise linear function, and $N_{[\ell]}(g)$ is well defined for every $g \in \mathbb{R}^n$. It is expected that $N_{[\ell]}$ maps every $g$ close to $T(\mu^{\flat}_{[\ell]})$ to $\mu_{[\ell]}$, that is, that Assumption 3.5 holds approximately.

Since Assumption 3.5 provides the existence of $T^{-1}_{[\ell]}$ only for $\ell = 1, \ldots, \bar{\bar{m}}$, Procedure 3.6 may not accurately estimate $\mu_{[\bar{\bar{m}}+1]}, \ldots, \mu_{[\bar{m}]}$ in general. If we attempt to estimate them, errors may occur. Moreover, the errors of the estimates made at the upper pieces are transmitted to the lower pieces.

3.4. Complexity/number of linear regions

Assume that the pieces $M_{[1]}, M_{[2]}, \ldots, M_{[\bar{m}]}$ satisfy piece monotonicity or weak piece monotonicity. Then, $\mu_{[\ell+1]}$ may be more difficult to estimate than $\mu_{[\ell]}$; that is, for $\mu, \mu' \in M$, the distinction between $\mu_{[\ell+1]}$ and $\mu'_{[\ell+1]}$ may be more difficult to identify than the distinction between $\mu_{[\ell]}$ and $\mu'_{[\ell]}$. Now, we sketch one of the reasons.

As an example, we consider the following case: $\mu^{(a)}, \mu^{(b)} \in M_{[1]}$, $\mu^{(c)}, \mu^{(d)} \in M_{[2]}$,
$$\|T(\mu^{(c)})\|_O \ll \|T(\mu^{(a)})\|_O, \quad \|T(\mu^{(c)})\|_O \ll \|T(\mu^{(b)})\|_O, \quad \|T(\mu^{(d)})\|_O \ll \|T(\mu^{(a)})\|_O, \quad \|T(\mu^{(d)})\|_O \ll \|T(\mu^{(b)})\|_O,$$
and $\|T(\mu^{(c)}) - T(\mu^{(d)})\|_O \ll \|T(\mu^{(a)}) - T(\mu^{(b)})\|_O$.

Figure 5 illustrates this case. Set $\mu^{(1)} := \mu^{(a)} + \mu^{(c)}$, $\mu^{(2)} := \mu^{(a)} + \mu^{(d)}$, $\mu^{(3)} := \mu^{(b)} + \mu^{(c)}$ and $\mu^{(4)} := \mu^{(b)} + \mu^{(d)}$. We consider identifying $\mu^{(a)}, \ldots, \mu^{(d)}$ from the (noiseless) observation data $T(\mu^{(1)}), \ldots, T(\mu^{(4)})$. If $N_{[\ell]}$ is a neural network whose activation function is the rectified linear function, the identification performance of $N_{[\ell]}$ depends on the number and shape of the linear regions in the input space (see [Citation13]). Therefore, we consider using straight lines to identify $\mu^{(1)}, \ldots, \mu^{(4)}$, or $\mu^{(a)}, \ldots, \mu^{(d)}$.

  1. In the input space of $N_{[1]}$, we can distinguish between $\{T(\mu^{(1)}), T(\mu^{(2)})\}$ and $\{T(\mu^{(3)}), T(\mu^{(4)})\}$ by one straight line (e.g. $L_1$). Therefore, we can distinguish between $\mu^{(a)}$ and $\mu^{(b)}$ by one straight line.

  2. Next, we can distinguish between $\mu^{(c)}$ and $\mu^{(d)}$ by one straight line using our sequential estimation-removal method, because we can obtain $T(\mu^{(c)})$ and $T(\mu^{(d)})$ via $T(\mu^{(c)}) = T(\mu^{(1)}) - T(\mu^{(a)}) = T(\mu^{(3)}) - T(\mu^{(b)})$ and $T(\mu^{(d)}) = T(\mu^{(2)}) - T(\mu^{(a)}) = T(\mu^{(4)}) - T(\mu^{(b)})$, respectively.

Figure 5. Complexity.

Thus, using our sequential estimation-removal method, we can identify $\mu^{(a)}, \ldots, \mu^{(d)}$ (or $\mu^{(1)}, \ldots, \mu^{(4)}$) by using one straight line twice. However, three straight lines are required to distinguish between $\{T(\mu^{(1)}), T(\mu^{(3)})\}$ and $\{T(\mu^{(2)}), T(\mu^{(4)})\}$ (e.g. $L_1$, $L_2$ and $L_3$). Therefore, only one straight line is required to distinguish between $\mu^{(a)}$ and $\mu^{(b)}$, but three straight lines are required to distinguish between $\mu^{(c)}$ and $\mu^{(d)}$ directly from the observation values. This means that $T(\mu^{(c)})$ and $T(\mu^{(d)})$ are 'buried' in $T(\mu^{(a)})$ and $T(\mu^{(b)})$.

4. Some numerical analysis of our discretized magnetic inverse problem

4.1. ‘Practical’ rank of matrix G

We consider the rank of the matrix $G$. As described below, the 'practical' rank of $G$ is significantly low. Since the values of (4) at distinct $(o_k, p_j)$ are generic and all components of $G$ will differ from one another, the 'exact' rank of $G$ will probably be full. However, in practice, two almost parallel row (or column) vectors should be considered exactly parallel. Thus we examine how parallel the row vectors of $G$ are by applying the following procedure. (The modified Gram–Schmidt algorithm was used in implementing this procedure.)

Procedure 4.1

Let $g_k$ ($k = 1, 2, \ldots, n_o$) denote the $k$th normalized row vector of $G$. (If $\bar{g}_k$ is the $k$th row vector of $G$, then $g_k = \bar{g}_k / \|\bar{g}_k\|$, where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^{n_\rho}$.)

1. Let $\epsilon > 0$ be a sufficiently small real number. Set $n = 1$. Let $L_1$ be the linear space spanned by $g_1$, and $u_1 := g_1$.

2. Using the basis $\{u_1, \ldots, u_n\}$ of $L_n$, calculate $u_r := g_r - \sum_{k=1}^{n} \langle g_r, v_k \rangle v_k$ for $r = n+1, \ldots, n_o$, where $v_k := u_k / \|u_k\|$ and $\langle \cdot, \cdot \rangle$ denotes the inner product.

3. Calculate $h := \operatorname{argmax}_{r = n+1, \ldots, n_o} \|u_r\|$. Then, swap $u_{n+1}$ and $u_h$, and swap $g_{n+1}$ and $g_h$.

4. If $\|u_{n+1}\| > \epsilon$, let $L_{n+1}$ be the linear space spanned by $\{u_1, \ldots, u_n, u_{n+1}\}$. If $\|u_{n+1}\| \le \epsilon$, stop the procedure and output $\{u_1, \ldots, u_n\}$.

5. If $n+1 = n_o$, stop the procedure and output $\{u_1, \ldots, u_{n_o}\}$. If $n+1 < n_o$, set $n \leftarrow n+1$ and go back to step 2.
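An equivalent implementation of Procedure 4.1 as a pivoted modified Gram–Schmidt pass (a sketch; the incremental update below produces the same residuals $u_r$ as recomputing them in step 2):

```python
import numpy as np

def practical_rank_norms(G, eps=1e-4):
    # Normalized rows g_k of G.
    g = G / np.linalg.norm(G, axis=1, keepdims=True)
    n_o = g.shape[0]
    norms = []
    for n in range(n_o):
        # Step 3: pivot the row with the largest residual norm to position n.
        h = n + int(np.argmax(np.linalg.norm(g[n:], axis=1)))
        g[[n, h]] = g[[h, n]]
        u_norm = np.linalg.norm(g[n])
        if u_norm <= eps:                  # step 4: stop at the threshold
            break
        norms.append(u_norm)
        v = g[n] / u_norm
        # Step 2 (modified Gram-Schmidt): orthogonalize the remaining rows.
        g[n + 1:] -= np.outer(g[n + 1:] @ v, v)
    return norms  # the recorded ||u_n||; their count is the 'practical' rank
```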

Remark 4.2

The distribution of $\{\|u_n\|\}$ given by Procedure 4.1 depends on the order of the $g_k$ specified by $k$. Steps 2 and 3 determine this order, except for the selection of $g_1$.

In our numerical experiments, we set $\epsilon = 0.0001$ and recorded the values of $\{\|u_n\|\}$ ($\{u_n\}$ is the output). These values represent a 'practical' rank of $G$. In fact, if the value of $\|u_n\|$ is below a certain threshold (for example 0.01), we consider in our numerical calculations that $g_n$ belongs to the linear space spanned by $g_1, \ldots, g_{n-1}$. Some results of experiments using this procedure are presented in Tables 1–4.

Table 1. ‘Practical’ rank of G (observation points: 20×20×1).

Table 2. 'Practical' rank of G (observation points: 20×20×4).

Table 3. ‘Practical’ rank of G (observation points: 40×40×1).

Table 4. 'Practical' rank of G (observation points: 40×40×2).

In these tables, 'charge: $20 \times 20 \times h$' means that $n_\rho$ ($n_\rho = 20 \times 20 \times h$) magnetic charge points $p_1, p_2, \ldots, p_{n_\rho}$ are arranged as follows: 20 points in the $x$ direction at intervals of 1, 20 points in the $y$ direction at intervals of 1, and $h$ points in the $z$ direction at intervals of 1. Furthermore, 'observation points: $w_1 \times w_2 \times h$' has almost the same meaning; however, when $w_1 = w_2 = 40$, the intervals in the $x$ and $y$ directions are 0.5. In Tables 1–4, the distances between the top plane of charge points and the bottom plane of observation points are 1, 5 and 10 when the number of charges is $20 \times 20 \times 10$, $20 \times 20 \times 6$ and $20 \times 20 \times 1$, respectively. For example, the first row of Table 1 means that the numbers of vectors $u_n$ with $\|u_n\| \ge 0.9$ are 400, 21 and 9 in the cases of $20 \times 20 \times 10$, $20 \times 20 \times 6$ and $20 \times 20 \times 1$, respectively.

These experimental results illustrate that the 'practical' rank of $G$ is significantly low. In the cases of Tables 1–4, the numbers of observation points (the numbers of row vectors) are 400 ($= n_x n_y$), 1600, 1600 and 3200, respectively. When the height $h$ of $M$ is 6 or less, that is, when the number of charges of $M$ is $20 \times 20 \times 6$ or less, the 'practical' rank of $G$ is almost the same irrespective of whether the number of observation points is 400, 1600 or 3200 (cf. [Citation3]). Moreover, based on the results of our experiments described in Section 5.3, even when the number of observation points was $n_x n_y$, the estimations were almost accurate up to the 5th piece. Therefore, it is hypothesized that having over 400 observation points is not considerably advantageous when using our sequential estimation-removal method. Our inverse problem is analogous to looking into the sea and identifying objects in it: increasing the number of observation points does not considerably increase the amount of information, and it is more difficult to identify objects at deeper positions (see Section 4.2).

4.2. Stability/instability

We discuss the stability/instability of our discretized inverse problem (6). Some results of our numerical experiments on the distribution of $r^{(\alpha\beta)}$ defined by (16) for $T = GR$ are shown in Figure 6. In each histogram of Figure 6, the horizontal and vertical axes represent the ratio $r^{(\alpha\beta)}$ and its frequency, respectively. As the value of $r^{(\alpha\beta)}$ decreases, the difficulty of distinguishing between $\mu^{(\alpha)}$ and $\mu^{(\beta)}$ increases. If and only if $r^{(\alpha\beta)} > 0$ can we distinguish between $\mu^{(\alpha)}$ and $\mu^{(\beta)}$.

Figure 6. Distributions of $r^{(\alpha\beta)}$.

As shown in Figure 1, all the observation points are placed on the same horizontal plane in these experiments, and the number of observation points is equal to the number of cells in the horizontal direction, that is, $n_o = n_x n_y$. The values of the parameters utilized are $n_x = n_y = 20$, $n_z = 5$ and $n_o = n_x n_y = 400$. Let $d(O, M)$ denote the distance between the horizontal plane $O$, where the observation points are placed, and the top surface of the magnetic sample $M$. In Figure 6, the values of $d(O, M)$ of panels (a), (b), …, (f) are 0.1, 1, 10, 20, 30 and 100, respectively. Note that the length of each side of every cubic cell is normalized to 1. Since the number of elements of $M$, $2^{n_\mu} = 2^{2000}$, is very large, we did not conduct experiments on all pairs $\{(\alpha, \beta)\}$ but on randomly selected pairs. To provide details, we randomly selected 1000 magnetic moment vectors, and we calculated $r^{(\alpha\beta)}$ for all (499,500 possible) pairs of these moment vectors, where the randomness means the following: the probability that the moment of each cell is $+1$ and the probability that it is $-1$ are both 0.5.
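The sampling scheme just described amounts to the following sketch (names ours; the full forward matrix $T = GR$ is assumed to be available):

```python
import numpy as np

def sampled_stability_ratios(T, n_mu, n_samples=1000, seed=0):
    # Random moment vectors with reversal rate 0.5.
    rng = np.random.default_rng(seed)
    mus = rng.choice([-1.0, 1.0], size=(n_samples, n_mu))
    ratios = []
    for a in range(n_samples):
        for b in range(a + 1, n_samples):
            d = mus[a] - mus[b]
            denom = np.linalg.norm(d)  # ||mu^(a) - mu^(b)||
            if denom > 0.0:
                ratios.append(np.linalg.norm(T @ d) / denom)  # r^(ab), eq. (16)
    return np.array(ratios)  # histogram these values, cf. Figure 6
```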

Let us decompose $\mu^{(\alpha)}$ and $\mu^{(\beta)}$ as $\mu^{(\alpha)} = \mu^{(\alpha)}_U + \mu^{(\alpha)}_L$, $\mu^{(\beta)} = \mu^{(\beta)}_U + \mu^{(\beta)}_L$, where $(\cdot)_U$ is an upper part and $(\cdot)_L$ is a lower part. Assume $\mu^{(\alpha)}_U = \mu^{(\beta)}_U$; then
(26) $r^{(\alpha\beta)} = \dfrac{\|T(\mu^{(\alpha)}) - T(\mu^{(\beta)})\|_O}{\|\mu^{(\alpha)} - \mu^{(\beta)}\|_M} = \dfrac{\|T(\mu^{(\alpha)}_L) - T(\mu^{(\beta)}_L)\|_O}{\|\mu^{(\alpha)}_L - \mu^{(\beta)}_L\|_M}$.
In such a case, we can also consider $d(O, M)$ in Figure 6 to be the sum of the depth of the upper pieces and the distance between the horizontal plane $O$ and the top surface of the magnetic sample $M$. Therefore, for example, when we can estimate nine pieces from the top correctly, the histogram of Figure 6(c) represents the difficulty of estimating the pieces below them.

Thus the experimental results numerically illustrate that the magnetic moment distribution down to a certain depth can be estimated from the MFM images, and that the estimation becomes difficult as the depth increases. This property is independent of the choice of reconstruction method. One of the reasons for this difficulty is explained below. We consider two cells $c_j$ and $c_{j+1}$ that are horizontally adjacent, and the ratio of the distances from $p_j$ and $p_{j+1}$ (the centre points of the top faces of $c_j$ and $c_{j+1}$) to an observation point $o_k$, i.e. $\|p_{j+1} - o_k\| / \|p_j - o_k\|$. As the points $p_j$ and $p_{j+1}$ become farther away from the point $o_k$, this ratio becomes closer to 1; consequently, the ratio $g_{k,j+1} / g_{kj}$ also becomes closer to 1, where $g_{kj}$ is the $kj$-component of $G$. Thus, using the observed value at the point $o_k$, it becomes difficult to distinguish between the case of $(\mu_j, \mu_{j+1}) = (+1, -1)$ and that of $(\mu_j, \mu_{j+1}) = (-1, +1)$.

The above-mentioned experimental results on the distribution of $r^{(\alpha\beta)}$ are for a given discretization. Making the cell discretization finer is the same as reducing the size (the length of each side) of each cell. Let the size of the cells, the interval between every pair of observation points, and the value of $d(O, M)$ be reduced by the same ratio $s$. Then, the shape of the histogram of the distribution of $\{r^{(\alpha\beta)}\}$ does not change. Therefore, the shape of the histogram when the size of the cells and the interval between the observation points are both multiplied by $s$ without changing $d(O, M)$ is the same as that of the histogram when only $d(O, M)$ is multiplied by $1/s$. For example, we can consider histogram (f) of Figure 6 to be the histogram for the case where $d(O, M) = 1$, the size of the cells is 1/100, and the interval between the observation points is 1/100. This case is the 1/100 subdivision of case (b) of Figure 6 (for a magnet sample of $1/100^3$ size). Similarly, we can consider cases (c), (d) and (e) in Figure 6 to be the 1/10, 1/20 and 1/30 subdivisions of case (b), respectively. Thus we have the following experimental claim.

Experimental claim 4.3

The inverse problem (6) is unstable under cell subdivision. Therefore, assumption (vi) in Section 2.2 should hold with sufficient accuracy for some cell discretization. In other words, only such magnets can be considered.

This claim is compatible with the property that the solution $\rho$ of integral Equation (2) is generally not unique for given data $F_z$.

Procedure 3.4 needs Assumption 3.5, and, from Lemma 3.8, Assumption 3.5 is related to $r^{(\alpha\beta)}_{[\ell]}$ defined by (21) and $r_{[\ell]}$ defined by (22). If $(\cdot)_L = (\cdot)^{\flat}_{[\ell]}$ in (26), then $r^{(\alpha\beta)}$ essentially coincides with $r^{(\alpha\beta)}_{[\ell]}$. Some further experimental consideration of $r^{(\alpha\beta)}_{[\ell]}$ is given in Section 4.3.

4.3. Properties of the pieces of M defined in Section 2.3

We consider some properties of the pieces $\{M_{[\ell]}\}$ of $M$ defined in Section 2.3 for the case where $n_x = n_y = 20$ (Tables 5, 7 and 9) or $n_x = n_y = 32$ (Tables 6, 8 and 10), $\bar{m} = n_z = 10$, $n_o = n_x n_y$ and $d(O, M) = 1$. We numerically calculated the minimum and average of $\{r^{(\alpha\beta)}_{[\ell]}\}$ (defined by (21)), the minimum and maximum of $\{\|GR\mu_{[\ell]}\|_O\}$, and the minimum and average of $\{\|GR\mu_{[\ell]}\|_O / \|GR\mu_{[\ell+1]}\|_O\}$, where $\mu_{[\ell]} = \mu_{[\ell+1]}$ as $n_x n_y$-dimensional vectors.

Table 5. Minimum and average of $\{1000\, r^{(\alpha\beta)}_{[\ell]}\}$ ($n_x = n_y = 20$).

Table 6. Minimum and average of $\{1000\, r^{(\alpha\beta)}_{[\ell]}\}$ ($n_x = n_y = 32$).

Table 7. Minimum and maximum of $\{1000\, \|GR\mu_{[\ell]}\|_O\}$ ($n_x = n_y = 20$).

Table 8. Minimum and maximum of $\{1000\, \|GR\mu_{[\ell]}\|_O\}$ ($n_x = n_y = 32$).

Table 9. Minimum and average of $\{\|GR\mu_{[\ell]}\|_O / \|GR\mu_{[\ell+1]}\|_O : \mu_{[\ell]} = \mu_{[\ell+1]}\}$ ($n_x = n_y = 20$).

Table 10. Minimum and average of $\{\|GR\mu_{[\ell]}\|_O / \|GR\mu_{[\ell+1]}\|_O : \mu_{[\ell]} = \mu_{[\ell+1]}\}$ ($n_x = n_y = 32$).

First, for each $\ell$ ($\ell = 1, 2, \ldots, 10$), we randomly selected 1000 magnetic moment vectors, and we calculated $r^{(\alpha\beta)}_{[\ell]}$ for all (499,500 possible) pairs of those moment vectors, as in the case of Figure 6. The experimental results for the minimum and average of $\{r^{(\alpha\beta)}_{[\ell]}\}$ are shown in Tables 5 and 6, where the values are multiplied by 1000 and the fourth digit from the top of each value is rounded off. In the cases of this section, the results of Tables 5 and 6 numerically imply $r_{[1]} > r_{[2]} > \cdots > r_{[10]} > 0$, where $r_{[\ell]} = \min\{r^{(\alpha\beta)}_{[\ell]}\}$. Therefore, from Lemma 3.8 (i) and (22), there exist $T^{-1}_{[\ell]}$ ($\ell = 1, \ldots, 10$) of Assumption 3.5 (i), and the inequalities (24) are satisfied for $\ell = 1, \ldots, 10$. However, if the true value of $r_{[\ell]}$ is smaller than the calculation error, then $T^{-1}_{[\ell]}$ may not be obtained by any method.

Next, for each $\ell$ ($\ell = 1, 2, \ldots, 10$), we randomly selected 10,000 magnetic moment vectors, and we calculated the minimum and maximum of $\{\|GR\mu_{[\ell]}\|_O\}$. The experimental results are shown in Tables 7 and 8, where the values are multiplied by 1000 and the fourth digit from the top of each value is rounded off. Tables 7 and 8 numerically illustrate that piece monotonicity holds for $(\ell, \ell+1) = (1,2), (2,3), (3,4)$ when $n_x = n_y = 20$ and for $(\ell, \ell+1) = (1,2), \ldots, (4,5)$ when $n_x = n_y = 32$.

When $x = y = 0$, the Green function $G(r)$ of (3) becomes $G(r) = 6/z^4$. Therefore, roughly speaking, the values in Tables 7 and 8 decrease in proportion to the inverse fourth power of the depth of the piece.

Next, for each $\ell$ ($\ell = 1, 2, \ldots, 9$), we randomly selected 10,000 magnetic moment vectors $\{\mu\}$ with $\mu_{[\ell]} = \mu_{[\ell+1]}$ as $n_x n_y$-dimensional vectors, and we calculated the minimum and average of $\{\|GR\mu_{[\ell]}\|_O / \|GR\mu_{[\ell+1]}\|_O\}$. The experimental results are shown in Tables 9 and 10, where the fourth digit from the top of each value is rounded off. Tables 9 and 10 numerically illustrate that weak piece monotonicity holds for $(\ell, \ell+1) = (1,2), \ldots, (9,10)$ in the cases considered in this subsection.

Thus, we have the following.

Experimental claim 4.4

As the value of $\ell$ increases, all the values in Tables 5–10 decrease. These tables show numerically that the above-mentioned selection of $M_{[\ell]}$ approximately satisfies the properties in step 1 of Procedure 3.9. Therefore, we consider the selection to be appropriate.

This will likely remain correct if the values of $n_x$, $n_y$ and $n_z$ are changed somewhat.

4.4. Conclusion of this section

Based on the experimental results of this section, it is possible that $\operatorname{Ker} \widetilde{GR} \neq \{0\}$ while $(GR)^{-1}$ nevertheless exists. Then, $(GR)^{-1}$ is not linear, and it is difficult to solve our inverse problem using a linear algebraic approach. In the following sections, we aim to estimate as many of the magnetic moments as possible using our sequential estimation-removal method. In Sections 5 and 6, we utilize assumptions (i)–(vii) of Section 2.2. However, the results of this section indicate that the performances of $N_{[\ell]} \approx T^{-1}_{[\ell]}$ (Section 5) and of the iterative method (Section 6) decrease as $\ell$ increases, and the results of our numerical experiments described in Sections 5 and 6 are compatible with this. One of the reasons is the performance limit of our personal computer (PC); however, accurately estimating magnetic moments at deep positions is impossible using any method. In general situations where assumptions (vi) and (vii) are not satisfied, the problem becomes even more difficult.

5. Deep learning and experimental results

To solve the discretized (nonlinear) binary inverse problem (6), we consider two numerical methods using Procedure 3.4 together with Procedure 3.9. (As described at the end of Section 3.2, we used Procedure 3.4 in the following experiments.) One method is to construct neural networks $N_{[\ell]} \approx T^{-1}_{[\ell]}$, while the other is to employ an iterative algorithm as an approximate $T^{-1}_{[\ell]}$. We study the former method in this section and the latter in the next section. Since $f$ can easily be calculated for any given $\mu$ using (6), we can obtain numerous training data. Deep learning of a neural network requires a high computational cost; however, once such a neural network is obtained, it can be applied with a lower computational cost than that of an iterative method. In this section and the next, we use assumptions (i)–(vii) of Section 2.2 and the following experimental setting.

Setting 5.1 As shown in Figure 1, all the observation points are placed on the same horizontal plane, and the number of observation points is equal to the total number of cells in the horizontal direction, that is, $n_o = n_x n_y$.

This setting is based on the results of Section 4.1. In the following experiments, the sizes of $n_x$, $n_y$, $n_z$ and of the data sets were limited by the performance of our PC, especially its memory size (see Table 11).

Table 11. Execution environment.

5.1. Structure of neural networks

Based on Setting 5.1, the number of nonzero components of $\mu_{[\ell]}$ is $n_x n_y = n_o = \dim \mathbb{R}^{n_o}$ for every $\ell$. Therefore, in a neural network $N_{[\ell]} \approx T^{-1}_{[\ell]}: \mathbb{R}^{n_o} \to M_{[\ell]}$, the number of input nodes and the number of output nodes are both equal to $n_o$. We consider our discretized inverse problem to be a type of image classification problem and construct each $N_{[\ell]}$ ($\ell = 1, 2, \ldots, n_z$) using a convolutional autoencoder provided by Keras/TensorFlow (see [Citation14]). (In our numerical experiments, we did not assume any structure for the magnetic moment configurations of the training data; however, if they had some geometrical structure, a convolutional autoencoder would be more effective. Furthermore, if we could construct a neural network suited to such a geometrical structure, it would be even more effective.)

In our numerical experiments, we set $n_x = n_y = 32$, $n_z = 10$ and $n_o = n_x n_y = 1024$. The structure of each $N_{[\ell]}$ is then as shown in Figure 7. This figure was drawn by the plot_model function of the keras.utils module. In our experiments, the values of the parameters are as follows:

  • Size of the kernel: 3×3 in all layers;

  • Strides: 1×1 in conv2d_4 and 2×2 in the other layers;

  • Padding: zero padding in all layers;

  • Number of filters: as depicted in Figure 7, that is, 256.

Figure 7. Convolutional autoencoder.

The activation function is the linear function (the identity function) in the output layer and the rectified linear function in the other layers. Each input node corresponds to the observation value at one observation point. Each output node corresponds to the magnetic moment of one cell. In the training, the loss function is the mean squared error function (25) (see [Citation15]). In the tests, the value of the magnetic moment is taken to be $+1$ if the output value is greater than 0, and $-1$ if it is less than 0.
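A hedged Keras sketch of one $N_{[\ell]}$ in the spirit of Figure 7: the paper fixes 3×3 kernels, 256 filters, ReLU hidden activations, a linear output layer and the MSE loss (25), while the exact number of layers and the down/upsampling schedule below are our assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_N(n_side=32, n_filters=256):
    # Input: one MFM observation value per point of the 32x32 grid.
    inp = keras.Input(shape=(n_side, n_side, 1))
    x = layers.Conv2D(n_filters, 3, strides=2, padding="same",
                      activation="relu")(inp)          # encoder
    x = layers.Conv2D(n_filters, 3, strides=1, padding="same",
                      activation="relu")(x)            # bottleneck
    x = layers.Conv2DTranspose(n_filters, 3, strides=2, padding="same",
                               activation="relu")(x)   # decoder
    # Output: one real-valued moment estimate per cell; its sign gives +-1.
    out = layers.Conv2D(1, 3, padding="same", activation="linear")(x)
    return keras.Model(inp, out)

model = build_N()
model.compile(optimizer="adam", loss="mse")  # mean squared error (25), Adam
```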

Remark 5.2

We consider $\{f^{\flat}_{[\ell]}\}$ for $\ell = 1, 2, \ldots, n_z$, where $f^{\flat}_{[\ell]} = T(\mu^{\flat}_{[\ell]})$ and $\{f^{\flat}_{[\ell]}\}$ denotes the set of all possible $f^{\flat}_{[\ell]}$. Note that the maximum possible number of values that $N_{[\ell]}$ must distinguish equals the number of all possible $\mu_{[\ell]}$, i.e. $2^{n_o} = 2^{n_x n_y}$, for every $\ell = 1, 2, \ldots, n_z$. For the matrix $G$ defined in Section 2.2, we put $G_{[\ell]} := (h_{kj})$, where the size of $G_{[\ell]}$ is the same as that of $G$ and
$$h_{kj} := \begin{cases} g_{kj} & \text{if } j = a(n_z+1) + b,\ a = 0, 1, \ldots, n_x n_y - 1,\ b = \ell, \ell+1, \ldots, n_z+1, \\ 0 & \text{otherwise.} \end{cases}$$
As described in Section 4.1, the 'practical' rank of $G$ is smaller than $n_\rho = n_x n_y (n_z + 1)$, and the 'practical' rank of $G_{[\ell]}$ decreases rapidly as $\ell$ increases. Therefore, the 'practical' dimension of the space spanned by $\{f^{\flat}_{[\ell]}\}$ decreases as $\ell$ increases. In general, if $A$ is a $d$-dimensional hyperplane arrangement consisting of $m$ hyperplanes in general position, the number of regions of $A$ is given by $\sum_{i=0}^{d} \binom{m}{i}$ (see Proposition 2.4 of [Citation16]). Therefore, when using hyperplanes to distinguish the elements of $\{f^{\flat}_{[\ell]}\}$, it is almost certain that more hyperplanes will be required as $\ell$ increases. In other words, the size of the neural network $N_{[\ell]}$ almost certainly has to increase as $\ell$ increases (cf. [Citation13]). However, we used the $N_{[\ell]}$ depicted in Figure 7 for every $\ell$ because of the performance limit of our PC.

5.2. Deep learning

In our numerical experiments, unless otherwise noted, we set $n_x = n_y = 32$, $\bar{m} = n_z = 10$, $n_o = n_x n_y = 1024$, the length of each side of every cubic cell of the (discretized) magnetic sample $M$ to 1, and $d(O, M)$ to 1. The number of elements of $M$ was $2^{n_\mu} = 2^{10240}$, and we randomly chose elements $\{\mu\}$ from $M$ to make the training data $\{(\mu, GR\mu)\}$. Such a moment vector $\mu$ can be obtained as follows.

  1. Set $\mu = [\mu_1, \mu_2, \ldots, \mu_{n_\mu}]^t = [1, 1, \ldots, 1]^t$.

  2. $i \leftarrow 1$.

  3. Change $\mu_i$ to $-1$ with probability $r$. (We call $r$ the reversal rate.)

  4. $i \leftarrow i + 1$; if $i \le n_\mu$, go back to step 3, otherwise stop.

Constructing a neural network becomes more difficult as the problem becomes more complex. Therefore, in our numerical experiments, we set $r$ to be a fixed constant instead of a variable. Neural networks trained with several reversal rates may then be used to estimate the magnetization, and the estimation result closest to the observed data may be adopted. (In actual experiments, we can apply a vibrating-sample magnetometer (VSM) to measure the magnitude of the overall magnetic moment of the sample, that is, to measure the reversal rate.) We consider the simple and fundamental case where it is randomly determined which cell moments are reversed. (In an ideal high-performance magnet, the orientations of the moments of the magnetic grains are approximately independent of each other.) Therefore, in our numerical experiments, we made the reversal rate common to all $\mu$. Its value was 0.1, 0.3 or 0.5 (we considered these three cases); we made training data comprising 8000 pairs for each reversal rate. In the deep learning, we employed Adam to update the weights, where the values of the parameters of Adam were the default values given in [Citation17] (cf. [Citation18]).
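Generating a batch of training pairs $(\mu, GR\mu)$ with a fixed reversal rate $r$ is straightforward; a sketch (names ours):

```python
import numpy as np

def make_dataset(GR, n_mu, n_pairs, r, seed=0):
    # Start from the all-upward state and flip each moment with probability r.
    rng = np.random.default_rng(seed)
    mus = np.ones((n_pairs, n_mu))
    mus[rng.random((n_pairs, n_mu)) < r] = -1.0
    return mus, mus @ GR.T  # each row of the second array is f = GR mu
```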

In the noiseless cases, we expected $\bar{\bar{m}} = \bar{m} = n_z$ based on the experimental analysis in Section 4 (see Remark 5.3). Let $N^s := (N^s_{[1]}, N^s_{[2]}, \ldots, N^s_{[n_z]})$ be the neural network obtained by using the training data of reversal rate $s$, where $s = 0.1, 0.3$ and 0.5. In our experiments, to examine the performances of these neural networks, we randomly generated sets of test data $\{(\mu, GR\mu)\}$, each consisting of 2000 pairs, with fixed reversal rates $r = 0.1, 0.2, 0.3, 0.4$ and 0.5.

Let $R$ be a set of some reversal rates (in our experiments, $R = \{0.1, 0.3, 0.5\}$) and assume that we have made $N^s$ ($s \in R$). When an observation value vector $f$ is given, we calculate
$$E_f(s) := \sum_{\ell=1}^{n_z} \Big\| f - \sum_{k=1}^{\ell} GR\, N^s_{[k]}(f) \Big\|^2 \quad (s \in R),$$
where the $N^s_{[k]}(f)$ are the pieces estimated sequentially by Procedure 3.4, and put $r_f := \operatorname{argmin}_{s \in R} E_f(s)$. Then, we consider $N^{r_f}$ to be the most appropriate neural network in $\{N^s : s \in R\}$ for estimating $\mu$ such that $GR\mu\,(+\,\delta) = f$ for the given $f$.

The execution environment of our experiments is presented in Table 11.

5.3. Experimental results of the noiseless cases

The relationship between the theoretical consideration in Section 3, the experimental analysis in Section 4, and the following experimental results is described in Remark 5.3.

For the generated noiseless test data $\{(\mu, f)\} = \{(\mu, GR\mu)\}$ with reversal rates $r = 0.1, 0.2, 0.3, 0.4$ and 0.5, the selected rates defined by $r' := \operatorname{argmin}_{s \in \{0.1, 0.3, 0.5\}} E_f(s)$ (computed for each $f$) are summarized in Table 12.

Table 12. Reversal rates.

The estimation performances of $N^s$ ($s = 0.1, 0.3$ and 0.5) are presented in Table 13. There were 32×32 cells in each piece, and each set of test data $\{(\mu, GR\mu)\}$ consisted of 2000 pairs. Therefore, $32 \times 32 \times 2000 = 2{,}048{,}000$ magnetic moments should be estimated per piece. In Table 13, rate $r$ is the reversal rate of the test data and rate $s$ is the reversal rate of $N^s$; the row labelled '0.1, 0.3, and 0.5' means that $s$ is the rate $r'$ of Table 12. The values in Table 13 illustrate what percentage of these 2,048,000 moments were correctly estimated. These values are rounded off to three decimal places; therefore, 1.000 means a value of 0.9995 or more. Table 13 shows that, up to the 5th piece, the estimations were almost accurate if $s = r$.

Table 13. Estimation performances of each rate.

Further, we conducted experiments for the cases d(O,M) = 0.1, 0.2, …, 0.9. The experimental results for r = r′ = 0.5 are presented in Table 14. In the cases d(O,M) = 0.1, 0.2, 0.3, and 0.4, the training of $N_{[1]}=N^{0.5}_{[1]}$ did not perform well. We consider the reason to be as follows. Let $o_k=(x_k,y_k,z_k)$ (k = 1, 2, …, $n_o$ = 1024) be the observation points and $p_j=(x_j,y_j,z_j)$ (j = 1, 2, …, $n_o$ = 1024) be the magnetic charge points on the top surface of the magnet sample described in Section 2.2. If $x_k=x_j$ and $y_k=y_j$, then $d(O,M)=z_k-z_j$. In this case, we have $G(o_k-p_j)=6/(z_k-z_j)^4$. Therefore, if d(O,M) is too small, the numerical calculation for estimating the top piece becomes unstable. Consequently, in the cases d(O,M) = 0.1, 0.2, 0.3, and 0.4, the performance of $N_{[1]}$ is considerably poor. If the estimation of the top piece fails, then the estimations of the second and lower pieces also fail. In such a case, it may be appropriate to change the unit of length.

Table 14. Estimation performances of each d(O,M).

In the case of d(O,M) = 0.5, the estimation performance was the highest, and the performance decreased as the value of d(O,M) increased beyond 0.5. However, even for d(O,M) = 0.5, the estimation performance for the 6th and lower pieces was low.

As described in Section 4.2, it is more difficult to estimate magnetic moments at deeper positions. To estimate such difficult-to-identify moments, a sufficiently large neural network and sufficient training data are required. Owing to the limitations of our execution environment, we could not enlarge the neural networks or increase the training data beyond those in the above-mentioned experiments. However, in connection with this, we conducted the following experiment: the case of $n_x=n_y=20$, $n_z=10$, $n_o=n_x n_y$, d(O,M) = 1, with the number of filters $N_f$ equal to 256 or 512. In the case $N_f=512$, the number of degrees of freedom of the neural network is higher than in the above-mentioned experiments. Both the reversal rate of the training data (8000 pairs) and that of the test data (2000 pairs) were 0.5. Since these neural networks are smaller than those in the above-mentioned experiments, the amount of data is relatively larger. The experimental results are presented in Table 15, where the results of the 32×32 case are listed again for reference; the three sets of results are similar. As described in Remarks 5.2 and 5.3, the deeper the magnetic charges, the larger the neural network that may be required.

Table 15. Estimation performance for the size of the problem and the number of filters.

In our sequential estimation-removal method, the errors of the estimates made at the upper pieces are transmitted to the lower pieces. Thus, the neural networks $N_{[\ell]}$ for the lower pieces cannot perform well. This is unavoidable, but if no error occurs in the upper pieces, the estimation results obtained using $N_{[\ell]}$ are better. For the case $n_x=n_y=32$, $n_z=10$, $n_o=n_x n_y$, and d(O,M) = 1, Table 16 shows the estimation performance of $N^r_{[\ell]}$ under the assumption that no error occurred in the upper pieces. In these experiments, the reversal rates of the training data and the test data were common (s = r = 0.1, 0.3, 0.5).

Table 16. Estimation performance when no error occurs in the upper pieces.
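
For illustration, the estimation-removal loop can be sketched as follows (illustrative names; `G[k]` denotes the forward matrix of piece k and `nets[k]` a trained network for piece k):

```python
import numpy as np

def sequential_estimate(f, G, nets):
    """Estimate the pieces from the top (ell = 1) downwards: estimate the
    moments of a piece from the current residual data, then remove that
    piece's contribution before moving to the next piece.  An error made
    at an upper piece is passed down through the residual."""
    residual, pieces = f.copy(), []
    for Gk, net in zip(G, nets):
        mu_k = np.sign(net(residual))         # binarize the network output
        pieces.append(mu_k)
        residual = residual - Gk @ mu_k       # remove this piece's contribution
    return pieces, residual
```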

Remark 5.3

According to the experimental results in the tables of Section 4.3, it is expected that $r^{[\ell]}(\alpha,\beta)>0$ holds for all possible pairs (α, β) and ℓ = 1, …, 10. If we assume this is true, then it is theoretically possible to estimate up to the piece ℓ = 10. However, the performance of the neural networks in our experiments is insufficient to estimate the pieces ℓ = 6, …, 10. It is known that an arbitrary function can be approximated with arbitrary accuracy using a sufficiently large neural network. Therefore, if sufficiently large neural networks were used, the pieces ℓ = 6, …, 10 could be expected to become estimable. However, the authors do not know how large such neural networks would have to be. Furthermore, even if the minimum value $r^{[\ell]}$ is positive, $T_{[\ell]}^{-1}$ may not be obtained if $r^{[\ell]}$ is smaller than the calculation error.

5.4. Experimental results on cases with noise

In this section, $n_x=n_y=32$, $n_z=10$, $n_o=n_x n_y=1024$, d(O,M) = 1, r = 0.5, and $N_f$ = 256. We used the neural networks of s = 0.5. Table 17 presents the experimental results for the cases where the data contain noise; the results for the 6th–10th pieces are omitted. Let $f=(f_1,\ldots,f_{n_o})$ be a noiseless observation values vector, $f=G_R\mu$, and set $\bar{f}:=\sum_{i=1}^{n_o}|f_i|/n_o$. Let $u_i$ denote a uniform random number in the interval $[-0.5, 0.5]$. Then, in Table 17, the noised value $\tilde{f}_i$ was made by
$$\tilde{f}_i = f_i + \nu \bar{f} u_i,$$
where ν = 0.001 or 0.01. (The actual MFM measurement error is typically estimated to be within 1%.) As can be seen from this table, the estimation is extremely difficult when noise is present. We discuss how to mitigate this difficulty in the remainder of this section and in Section 7.

Table 17. Estimation performance for cases with noise.
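
For illustration, this noise model can be written as follows (illustrative names):

```python
import numpy as np

def add_noise(f, nu, rng):
    """tilde f_i = f_i + nu * fbar * u_i, with u_i uniform on [-0.5, 0.5]
    and fbar the mean absolute value of the components of f."""
    fbar = np.mean(np.abs(f))
    u = rng.uniform(-0.5, 0.5, size=f.shape)
    return f + nu * fbar * u

rng = np.random.default_rng(1)
f = rng.normal(size=1024)                  # stand-in for a noiseless f = G_R mu
f_noisy = add_noise(f, nu=0.01, rng=rng)   # nu = 0.001 or 0.01 in Table 17
```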

We roughly estimate the size of $\eta \in \Delta^{[\ell]}$ that satisfies inequality (23). When $\mu^{[\ell]}(\alpha) \neq \mu^{[\ell]}(\beta)$, the right-hand side of (23) is minimized when there is only one difference between the components of $\mu^{[\ell]}(\alpha)$ and those of $\mu^{[\ell]}(\beta)$. Then, since the right-hand side of (23) equals $\sqrt{(1-(-1))^2}=2$, (23) becomes
(27) $\sup_{\eta \in \Delta^{[\ell]}} \|\eta\|_O < r^{[\ell]}$.
From Lemma 3.8 (ii), if (27) holds, then Assumption 3.5 (ii) is satisfied.

In the above numerical experiment, noise $\eta=\nu\bar{f}u$ whose components are at most 0.0005 times or 0.005 times the average of the components of f was added to $f=G_R\mu$. Therefore,
(28) $1000\|\eta\|_O \le 0.5\|G_R\mu\|_O$ or $5\|G_R\mu\|_O$.
Assume $\mu^{[1]},\ldots,\mu^{[\ell-1]}$ were estimated correctly. Then, for (27) to hold, the value of the right-hand side of (28) must be approximately smaller than $1000\,r^{[\ell]}$. The minimum of $1000\,r^{[\ell]}(\alpha,\beta)$ in the tables of Section 4.3 roughly approximates $1000\,r^{[\ell]}$, and from those tables, $\|G_R\mu\|_O \approx 200$. Therefore, $1000\,r^{[1]} > 5\|G_R\mu\|_O > 1000\,r^{[2]} > 0.5\|G_R\mu\|_O > 1000\,r^{[3]}$. In the case where the noise level is 0.01, (27) holds only for ℓ = 1, and in the case where the noise level is 0.001, (27) holds only for ℓ = 1, 2. For ℓ other than these, unique identification is not theoretically guaranteed. However, Lemma 3.8 (ii) gives only a sufficient condition, and from Table 17, it seems that the moments of a few more pieces can be identified uniquely.

6. Iterative method and experimental results

In this section, we apply an iterative method to approximate $T_{[\ell]}^{-1}$ of Procedure 3.4. This iterative method is a simple version of Murray–Ng's global smoothing algorithm proposed in [Citation19].

6.1. A simple version of Murray–Ng's global smoothing algorithm

For given data f, define $E(\mu):=\|G_R\mu - f\|^2$. Then, our inverse problem is as follows:

Problem 6.1

Minimize E(μ) subject to $\mu \in \{-1,1\}^{n_\mu}$.

A minimizer of E restricted to $\{-1,1\}^{n_\mu}$ is called a binary minimizer. The linear map $G_R$ can be naturally extended to a map on $[-1,1]^{n_\mu}$. We use this extension in our calculations to obtain a binary minimizer. This approach is called a continuation or continuous approach (see [Citation20,Citation21]). As such an algorithm, we utilize a simple version of the global smoothing algorithm (GSA) proposed in [Citation19] and [Citation21]. The GSA is an excellent Newton-type iteration algorithm that can be employed for binary optimization problems based on the continuation approach.

For the extended objective function E on $[-1,1]^{n_\mu}$, we define
(29) $E_{\eta,\gamma}(\mu) := E(\mu) - \eta \sum_{j=1}^{n_\mu} \bigl( \log(1+\mu_j) + \log(1-\mu_j) \bigr) + \gamma \sum_{j=1}^{n_\mu} (1+\mu_j)(1-\mu_j),$
where η and γ are positive real values. In this iterative algorithm, the starting point $\mu=r_0$ of the iteration must be selected from $(-1,1)^{n_\mu}$; first, we take a large η and a small γ. We then calculate an approximate minimizer of $E_{\eta,\gamma}$ in $[-1,1]^{n_\mu}$ by an iteration. Next, we make η slightly smaller ($\eta \leftarrow \theta_\eta \eta$) and γ slightly larger ($\gamma \leftarrow \theta_\gamma \gamma$), and, using the obtained approximate minimizer as the new starting point, we calculate an approximate minimizer of $E_{\eta,\gamma}$ again. The same calculation is repeated while decreasing η and increasing γ. In this iteration, the second term of (29) locks μ inside $[-1,1]^{n_\mu}$, and the third term of (29) pushes μ towards $\{-1,1\}^{n_\mu}$.

In the kth step of the GSA iteration to obtain an approximate minimizer of $E_{\eta,\gamma}$, we must solve the equation $\nabla^2 E_{\eta,\gamma}(\mu_k)\, p_k = -\nabla E_{\eta,\gamma}(\mu_k)$ to obtain a vector $p_k$ as a search direction. To solve this equation, we adopt the minor iteration of [Citation22], p. 194, as the conjugate gradient algorithm. It is simpler than Algorithm 5.1 of [Citation21].
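
The following is a minimal NumPy/SciPy sketch of this simplified GSA under the reconstructed objective (29) and the parameter values reported in Section 6.2; it calls a plain conjugate gradient routine in place of the minor iteration of [Citation22] and omits the stopping tests governed by ϵη and ϵF, so it is an illustration rather than the implementation used in our experiments:

```python
import numpy as np
from scipy.sparse.linalg import cg

def gsa_minimize(G, f, eta0=10.0, theta_eta=0.95, n_outer=50, n_inner=20):
    """Simplified GSA continuation loop for E(mu) = ||G mu - f||^2 over
    {-1, 1}^n, using the smoothed objective
    E_{eta,gamma}(mu) = E(mu)
                        - eta * sum_j (log(1 + mu_j) + log(1 - mu_j))
                        + gamma * sum_j (1 + mu_j) * (1 - mu_j)."""
    n = G.shape[1]
    mu = np.zeros(n)                              # starting point r_0 = 0
    eta = eta0
    gamma = 1e-2 * float(f @ f) / n               # gamma_0 = 10^-2 E(0) / n_mu
    theta_gamma = 1.0 / theta_eta
    GtG, Gtf = G.T @ G, G.T @ f
    for _ in range(n_outer):                      # continuation on (eta, gamma)
        for _ in range(n_inner):                  # inexact Newton iterations
            grad = (2.0 * (GtG @ mu - Gtf)
                    + 2.0 * eta * mu / (1.0 - mu**2)     # barrier gradient
                    - 2.0 * gamma * mu)                  # penalty gradient
            hdiag = 2.0 * eta * (1.0 + mu**2) / (1.0 - mu**2)**2 - 2.0 * gamma
            H = 2.0 * GtG + np.diag(hdiag)
            p, _ = cg(H, -grad, maxiter=200)      # search direction (cf. [22])
            t = 1.0                               # damp to stay inside (-1, 1)
            while np.max(np.abs(mu + t * p)) >= 1.0:
                t *= 0.5
            mu = mu + t * p
        eta *= theta_eta
        gamma *= theta_gamma
    return np.sign(mu)                            # round to {-1, 1}^n
```

Note that the Hessian may be indefinite because of the penalty term; the full GSA handles this with a modified Newton safeguard, which this sketch omits.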

6.2. Experimental results

As in Section 5, we utilize assumptions (i)–(vii) in Section 2.2 and Setting 5.1. In our numerical experiments, we set d(O,M) = 1 and either (a) $n_x=n_y=20$, $n_z=10$, $n_o=n_x n_y=400$, or (b) $n_x=n_y=32$, $n_z=10$, $n_o=n_x n_y=1024$; the data were noiseless. We conducted 30 experiments for case (a) and 10 experiments for case (b), for each of the reversal rates 0.1, 0.3 and 0.5; the experimental results are shown in Tables 18 and 19. The values in these tables show what percentage of the (a) 20×20×30 and (b) 32×32×10 magnetic moments were correctly estimated.

Table 18. Estimation performance of the iteration method (nx=ny=20).

Table 19. Estimation performance of the iteration method (nx=ny=32).

In our numerical experiments, the GSA parameters were as follows: $\epsilon_\eta=0.01$, N = 50, $\theta_\eta=0.95$, $\theta_\gamma=1/\theta_\eta$, $\eta_0=10$, $\gamma_0=10^{-2}E(0)/n_\mu$, and $\epsilon_F=0.001$. (The symbols other than η are the same as those in [Citation19], and η corresponds to μ of [Citation19].) The starting point was $r_0=0=(0,0,\ldots,0)^t$. These values of $\theta_\eta$, $\theta_\gamma$, and $r_0$ are the same as those in [Citation19] or Section 6.2.4 of [Citation21] (cf. Table 5.1 of [Citation21]); the values of the other parameters were determined through experiments on several small magnets. We compare the experimental results listed in Table 19 with those listed in Table 13. According to the results in Tables 12 and 13, the value of s that minimizes $E_f(s)$ for the observed value f is probably r. Therefore, for the comparison, we use $N^r_{[k]}$ in Table 13. The results in Table 19 are not better than those in Table 13. However, the performance of GSA depends considerably on the values of its parameters.

7. Results and discussion

To solve the discretized magnetization-distribution identification problem, we used neural networks and an iterative method, combined with our sequential estimation-removal method. The results of the numerical experiments showed that the discretized magnetization can be roughly estimated up to a certain depth by applying our method, as described in Sections 5 and 6. We aimed to estimate as many of the magnetic moments as possible, and therefore the difficulty of this ill-posed estimation problem was highlighted, as described in Sections 4–6. In our experiments, we used randomly generated artificial training data, and we did not apply any regularization or any Bayesian prior distribution. To obtain higher-accuracy estimates from actual (noisy) data, training data that approximate the actual magnetic moment configurations should be used; further, some regularization or Bayesian prior distribution that represents the actual magnetic moment configurations should be applied. To achieve this, we must consider the following.

  1. It is important to have a deep knowledge of real magnets.

  2. Based on Basic idea 3.2, we divided the target magnet virtually and consequently localized the problem. On the other hand, regularization/Bayesian prior distribution and realistic magnet data are related to the global structure of the target magnets. We must combine these two perspectives.

These are future issues for us. However, we expect that our methods and ideas described in Sections 2 and 3 can be applied to solve our magnetization-distribution identification problem and some other inverse problems of the form (Equation1), e.g. the inverse gravimetric problem (gravity prospecting problem). This problem is to determine the location and shape of subterranean bodies from gravity measurements at the earth's surface, and in general, it is an ill-posed inverse problem (cf. [Citation23,Citation24]).

Here, we describe a two-dimensional model of an inverse gravimetric problem. Let the x-axis be the horizontal axis, the z-axis be the vertical axis, and the line z = 0 be the ground surface. Let a, b, and c be real numbers with a < b and c < 0, and put D = [a,b]×[c,0]. Let ρ(x,z) ((x,z) ∈ D) denote the distribution of mass. Then, the gravitational force F(x,z) (z ≥ 0) produced by the mass in D is given by
$$F(x,z)=\int_D \frac{\gamma\,(z-\zeta)}{\bigl[(x-\xi)^2+(z-\zeta)^2\bigr]^{3/2}}\,\rho(\xi,\zeta)\,d\xi\,d\zeta,$$
where γ represents the gravitational constant. An inverse gravimetric problem is to estimate ρ(x,z) from observation data {F(x,z)} in an observation area $E=[a',b']\times[c',d']$, where $a' \le a$, $b \le b'$ and $0 \le c'$, under some appropriate conditions. Assume that D comprises cells, as in Figure , and that ρ(x,z) is constant in each cell. Then, it is easily shown that this inverse problem satisfies weak piece monotonicity. We expect that our methods can be applied to solve this problem.
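
For illustration, a midpoint-rule discretization of this forward map can be sketched as follows (illustrative names; each cell carries a constant density):

```python
import numpy as np

def gravity_forward(rho, a, b, c, obs_pts, gamma_const=6.674e-11):
    """Midpoint-rule discretization of
    F(x, z) = int_D gamma (z - zeta) / ((x - xi)^2 + (z - zeta)^2)^{3/2}
              * rho(xi, zeta) dxi dzeta
    for D = [a, b] x [c, 0], with rho given as an (n_x, n_z) array of
    cell-wise constant densities."""
    n_x, n_z = rho.shape
    xi = a + (b - a) * (np.arange(n_x) + 0.5) / n_x      # cell centres, x
    zeta = c - c * (np.arange(n_z) + 0.5) / n_z          # cell centres, z (c < 0)
    dA = ((b - a) / n_x) * (-c / n_z)                    # cell area
    XI, ZETA = np.meshgrid(xi, zeta, indexing="ij")
    F = np.empty(len(obs_pts))
    for i, (x, z) in enumerate(obs_pts):
        kern = gamma_const * (z - ZETA) / ((x - XI)**2 + (z - ZETA)**2)**1.5
        F[i] = np.sum(kern * rho) * dA
    return F

# Example: a buried dense block observed on the ground surface z = 0.
rho = np.zeros((20, 10))
rho[8:12, 3:6] = 1000.0
obs = [(x, 0.0) for x in np.linspace(-1.0, 3.0, 41)]
F = gravity_forward(rho, 0.0, 2.0, -1.0, obs)
```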

Acknowledgments

The authors are most grateful to Professor Hitoshi Saito (Akita University) for his valuable discussion and advice. The authors also appreciate the helpful comments from the anonymous reviewers.

Data availability

All the data sets used in this study were randomly generated.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Leyva-Cruz JA, Ferreira ES, Miltão MSR, et al. Reconstruction of magnetic source images using the Wiener filter and a multichannel magnetic imaging system. Rev Sci Instrum. 2014;85:074701.
  • Mallinson J. On the properties of two-dimensional dipoles and magnetized bodies. IEEE Trans Magn. 1981;17:2453–2460.
  • Vellekoop B, Abelmann L, Porthun S, et al. On the determination of the internal magnetic structure by magnetic force microscopy. J Magn Magn Mater. 1998;190:148–151.
  • Saito H, Chen J, Ishio S. Description of magnetic force microscopy by three-dimensional tip Green's function for sample magnetic charges. J Magn Magn Mater. 1999;191:153–161.
  • Cao Y, Kumar P, Zhao Y, et al. Active magnetic force microscopy of Sr-ferrite magnet by stimulating magnetization under an AC magnetic field: direct observation of reversible and irreversible magnetization processes. Appl Phys Lett. 2018;112:223103.
  • Cao Y, Zhao Y, Kumar P, et al. Magnetic domain imaging of a very rough fractured surface of Sr ferrite magnet without topographic crosstalk by alternating magnetic force microscopy with a sensitive FeCo–GdOx superparamagnetic tip. J Appl Phys. 2018;123:224503.
  • Saito H, Ikeya H, Egawa G, et al. Magnetic force microscopy of alternating magnetic field gradient by frequency modulation of tip oscillation. J Appl Phys. 2009;105:07D524.
  • Chang T, Lagerquist M, Zhu JG, et al. Deconvolution of magnetic force images by Fourier analysis. IEEE Trans Magn. 1992;28:3138–3140.
  • Hsu CC, Miller CT, Indeck RS, et al. Magnetization estimation from MFM images. IEEE Trans Magn. 2002;38:2444–2446.
  • Madabhushi R, Gomez RD, Burke ER, et al. Magnetic biasing and MFM image reconstruction. IEEE Trans Magn. 1996;32:4147–4149.
  • Rawlings C, Durkan C. The inverse problem in magnetic force microscopy (MFM) – inferring sample magnetization from MFM images. Nanotechnology. 2013;24:305705.
  • Zhao T, Fujiwara H, Mankey J, et al. Reconstruction of in-plane magnetization distributions from magnetic force microscope images. J Appl Phys. 2001;89:7230–7232.
  • Montúfar G, Pascanu R, Cho K, et al. On the number of linear regions of deep neural networks. In: Ghahramani Z, Welling M, Cortes C, et al., editors. NIPS'14 Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol 2. Cambridge (MA): MIT Press; 2014. p. 2924–2932.
  • Keras Google group. Keras Documentation. https://keras.io/ (accessed on 11 December 2019).
  • Keras Google group. Keras losses source. https://github.com/keras-team/keras/blob/master/keras/losses.py (accessed on 9 October 2018).
  • Stanley RP. An Introduction to Hyperplane Arrangements. 2006. https://www.cis.upenn.edu/∼cis610/sp06stanley.pdf (accessed on 28 October 2019).
  • Keras Google group. Keras optimizer source. https://github.com/keras-team/keras/blob/master/keras/optimizers.py#L392 (accessed on 9 October 2018).
  • Kingma DP, Ba J. Adam: a method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015); San Diego, USA, 2015. p. 1–15.
  • Murray W, Ng KM. An algorithm for nonlinear optimization problems with binary variables. Comput Optim Appl. 2010;47:257–288.
  • Mitchell JE, Pardalos PM, Resende MGC. Interior point methods for combinatorial optimization. In: Du DZ, Pardalos PM, editors. Handbook of Combinatorial Optimization, Vol 1. Dordrecht: Kluwer Academic Publishers; 1999. p. 189–298.
  • Ng KM. A continuation approach for solving nonlinear optimization problems with discrete variables [PhD thesis]. Stanford: Dept. Management Science and Engineering of Stanford Univ., 2002.
  • Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math Program. 1983;26:190–212.
  • Groetsch CW. Inverse problems in the mathematical sciences. Braunschweig-Wiesbaden: Friedr. Vieweg & Sohn; 1993.
  • Sampietro D, Sansò F. Uniqueness theorems for inverse gravimetric problems. In: Sneeuw N, et al., editors. VII Hotine-Marussi Symposium on Mathematical Geodesy, International Association of Geodesy Symposia 137. Berlin, Heidelberg: Springer-Verlag; 2012. p. 111–115.
