
A blocking scheme for dimension-robust Gibbs sampling in large-scale image deblurring

Pages 1789-1810 | Received 09 Sep 2020, Accepted 13 Jan 2021, Published online: 05 Feb 2021

ABSTRACT

Among the most significant challenges with using Markov chain Monte Carlo (MCMC) methods for sampling from the posterior distributions of Bayesian inverse problems is the rate at which the sampling becomes computationally intractable, as a function of the number of estimated parameters. In image deblurring, there are many MCMC algorithms in the literature, but few attempt reconstructions for images larger than 512×512 pixels ($10^5$ parameters). In quantitative X-ray radiography, used to diagnose dynamic materials experiments, the images can be much larger, leading to problems with millions of parameters. We address this issue and construct a Gibbs sampler via a blocking scheme that leads to a sparse and highly structured posterior precision matrix. The Gibbs sampler naturally exploits the special matrix structure during sampling, making it ‘dimension-robust’, so that its mixing properties are nearly independent of the image size, and generating one sample is computationally feasible. The dimension-robustness enables the characterization of posteriors for large-scale image deblurring problems on modest computational platforms. We demonstrate applicability of this approach by deblurring radiographs of size 4096×4096 pixels ($10^7$ parameters) taken at the Cygnus Dual Beam X-ray Radiography Facility at the U.S. Department of Energy's Nevada National Security Site.


1. Introduction

As opposed to most digital photography – where the goal is to produce pictures that look as good as possible – the primary goal of quantitative imaging is to compute features in images for use in quantitative scientific studies. For example, in digital X-ray radiography, it is common to pulse X-rays to penetrate a scene, to convert the non-attenuated X-rays into visible light for capture on a CCD camera, and to compute features such as object locations or material densities from the resulting images [Citation1–4]. Image blur, caused in particular by the finite extent of the X-ray source (it is not a point source), optical aberrations introduced by the lens system, and the conversion of X-ray photons to light that is then digitally recorded on a CCD, makes computing scene features challenging [Citation5–7]. There is an extensive literature on using convolution as a forward model for image blur (see [Citation8] and the references therein), and a great deal of work has been done to formulate image deblurring within Bayesian frameworks and to develop Markov chain Monte Carlo (MCMC) methods for numerically describing the corresponding posteriors by a set of samples, so that expected values with respect to the posterior can be approximated by averages over the samples [Citation9–16].

Most of the MCMC methods in image deblurring are based on Metropolis-Hastings samplers, or Gibbs samplers, or a combination of the two. The large-scale nature of many image deblurring problems, caused by the large number of pixels, makes MCMC computationally difficult, but several methods have been developed to keep the computational requirements of MCMC manageable, in particular for linear inverse problems and image deblurring. Fast and scalable methods for sampling high-dimensional Gaussians can be constructed by using analogies of Gibbs samplers with linear solvers. Examples include samplers that use matrix splitting to accelerate convergence [Citation12, Citation13, Citation17] or that use (preconditioned) Krylov subspace methods [Citation18–20]. We add to this line of work and devise a blocking scheme for a Gibbs sampler that makes it a practical tool for large-scale image deblurring.

1.1. Blocking schemes in Gibbs sampling

A fundamental property of Gibbs sampling, which is important in high-dimensional sampling problems, can be illustrated by the following example of a distribution $p(x)$ over a three-component parameter vector $x=[x_1,x_2,x_3]$. Specifically, the Gibbs sampler can use any of the following iterations to produce samples from $p(x)$:
$$
\begin{array}{lll}
\text{Iteration (a)} & \text{Iteration (b)} & \text{Iteration (c)}\\[2pt]
x_1^k,x_2^k,x_3^k \sim p(x_1,x_2,x_3) & x_1^k,x_2^k \sim p(x_1,x_2\,|\,x_3^{k-1}) & x_1^k \sim p(x_1\,|\,x_2^{k-1},x_3^{k-1})\\
 & x_3^k \sim p(x_3\,|\,x_1^k,x_2^k) & x_2^k \sim p(x_2\,|\,x_1^k,x_3^{k-1})\\
 & & x_3^k \sim p(x_3\,|\,x_1^k,x_2^k),
\end{array}
$$
where $k$ is the Markov chain iterate, and $x_1^0$, $x_2^0$, $x_3^0$ are assumed given. That is to say that we can group $x_1$, $x_2$, $x_3$ together and sample directly from the target distribution (independence sampler, Iteration (a)), sample each of $x_1$, $x_2$, and $x_3$ separately from their respective conditional distributions in any order (represented by Iteration (c)), or sample any combination of two and one or one and two parameters (represented by Iteration (b)). The way the parameters are grouped together within a Gibbs sampler is called its blocking scheme, and, while different blocking schemes for a Gibbs sampler will always result in a Markov chain whose stationary distribution is the correct target (see [Citation21, Citation22]), not all blocking schemes are created equal with respect to the computational efficiency of each step in the Gibbs iteration nor with respect to the convergence properties of the Markov chain [Citation23, Citation24].
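To make the three iteration types concrete, the following minimal numpy sketch implements iteration (c), the single-site sweep, for a Gaussian target; the 3×3 precision matrix is an arbitrary illustrative choice, not taken from the paper. Iterations (a) and (b) differ only in which components are grouped and drawn jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-component Gaussian target p(x) = N(0, H^{-1}); H is an arbitrary
# symmetric positive definite precision matrix chosen for illustration.
H = np.array([[2.0, 0.8, 0.3],
              [0.8, 2.0, 0.8],
              [0.3, 0.8, 2.0]])

def gibbs_single_site(x, n_iter=5000):
    """Iteration (c): update x1, x2, x3 one at a time from the scalar
    conditionals x_i | x_{-i} ~ N(-(sum_{j != i} H_ij x_j) / H_ii, 1 / H_ii)."""
    chain = []
    for _ in range(n_iter):
        for i in range(3):
            cond_mean = -(H[i] @ x - H[i, i] * x[i]) / H[i, i]
            x[i] = cond_mean + rng.standard_normal() / np.sqrt(H[i, i])
        chain.append(x.copy())
    return np.array(chain)

samples = gibbs_single_site(np.zeros(3))
print(np.cov(samples.T))          # approaches inv(H) as n_iter grows
print(np.linalg.inv(H))
```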

1.2. Motivation and main result

The goal of the work presented here is to create an MCMC sampler that can compute samples from the Gaussian posterior of a Bayesian-formulated image deblurring problem that arises in quantitative X-ray radiography. The main obstacle is the large dimension of the problem. For a radiograph with 4096×4096 pixels, the dimension of the problem is $4096^2\approx 1.6\times 10^7$, since each pixel is an unknown. For a high-dimensional problem, one should use a ‘dimension-robust’ algorithm, one satisfying the following requirements:

  1. the mixing properties of the Markov chain do not degrade with dimension;

  2. the computational requirements for generating one sample are feasible

(see [Citation25] for the original source of the term ‘dimension-robustness’). Achieving dimension-robustness is not straightforward. Typical MCMC samplers, such as random walk Metropolis (RWM), the Metropolis adjusted Langevin algorithm (MALA), or Hamiltonian Monte Carlo (HMC), are not dimension-robust, because their (optimal) step size decreases with dimension [Citation26–28]. Gibbs iterations of type (c) are not dimension-robust because their mixing properties degrade with dimension (unless the problem is a collection of independent random variables). Gibbs iterations of type (a) are not dimension-robust because the computational requirements for generating one sample are large, even when the posterior is Gaussian [Citation13].

We construct a Gibbs sampler of type (b), with a blocking scheme that groups parameters together strategically to achieve dimension-robustness. The blocking scheme is chosen to exploit local properties of 2D convolution (representing blur) and to lead to a sparse and highly structured posterior precision matrix. Essentially, the blocking scheme groups together only those pixels that are ‘close’ to each other, where closeness depends on the size of the convolution kernel and length scales in the prior (which are often small compared to the size of the kernel). Because the blocking scheme results in blocks of strongly correlated parameters, the Markov chain generated by the Gibbs sampler has good mixing properties, even when the number of pixels is large. Such good mixing of the chain in high dimensions is one of the key concepts for dimension-robustness. A second key idea is finding the balance between working with a small number of large blocks vs. a large number of small blocks, which is a competition between a small number of expensive Gibbs iterates vs. a large number of inexpensive iterates.

Taken together, our ideas result in a Gibbs sampler that is capable of drawing samples from Gaussian posterior distributions with more than $10^7$ parameters using relatively modest computational platforms. The robustness of the ideas presented here is demonstrated by deblurring radiographs taken at the Cygnus Dual Beam X-ray Radiography Facility at the U.S. Department of Energy's Nevada National Security Site, a system whose images are significantly larger than those that typically appear in the literature [Citation12, Citation29–34].

2. Bayesian convolution model with data-driven boundary conditions

A standard one-dimensional model for discrete convolution of a vector $x\in\mathbb{R}^{n_x}$ with kernel $a\in\mathbb{R}^{n_a}$ is a vector $b\in\mathbb{R}^{n_b}$ given by
$$
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n_b} \end{bmatrix}
=
\begin{bmatrix}
a_{n_a} & \cdots & a_1 & & & \\
 & a_{n_a} & \cdots & a_1 & & \\
 & & \ddots & & \ddots & \\
 & & & a_{n_a} & \cdots & a_1
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n_x} \end{bmatrix},
\tag{1}
$$
and we denote the convolution matrix by $A\in\mathbb{R}^{n_b\times n_x}$, where $n_x=n_b+n_a-1$.

In imaging applications, $b$ is the measured, blurry image, $a$ is the blur kernel (the so-called point spread function), and $x$ is the ideal, or deblurred, image. Despite (1) being a common model for image blur, it is actually somewhat unnatural in imaging applications, because it implies that the domain of the blurry image $b\in\mathbb{R}^{n_b}$, also called the ‘field of view’, is not the same as the domain of the ideal image $x\in\mathbb{R}^{n_x}$. To fix that discrepancy, boundary conditions, like periodic, zero, or reflective, are often imposed to provide formulas for the parameters outside of the field of view, based on those within the field of view. This approach effectively shrinks the unknown $x$ to have the same dimension as the blurry image ($n_x=n_b$). Thus, $A$ is a square convolution matrix, and the linear system that defines $x$ given $b$ and $A$ can be solved using computationally efficient methods. Standard boundary conditions, however, are also unnatural impositions in imaging applications: a picture taken of an experiment will never exist on a torus (as periodic boundaries imply) or be perfectly dark outside of the field of view (as zero boundaries imply).

Rather than shrink the problem to a smaller dimension, an alternate approach – introduced in [Citation35, Citation36] and fully detailed in [Citation10] – is to instead solve the larger problem of reconstructing the full vector $x$ of size $n_x=n_b+n_a-1$. Assuming periodic boundary conditions on $x$ and extending the point spread function by padding on the front and back with $(n_b-1)/2$ zeros (see footnote 1) gives
$$\hat{a}=[0,\dots,0,a_{n_a},\dots,a_1,0,\dots,0],$$
resulting in $\hat{a}\in\mathbb{R}^{n_x}$ with a corresponding convolution matrix $\hat{A}\in\mathbb{R}^{n_x\times n_x}$, taking into account periodic boundary conditions on this extended domain. Note that assuming periodic boundary conditions on this extended domain is no longer unnatural, since boundary artefacts can only impact the values of $x$ outside of the field of view. Since the size of the data (the blurry image) is still $n_b$, we then define the matrix $D\in\mathbb{R}^{n_b\times n_x}$ to have the middle $n_b$ rows of the identity of size $n_x\times n_x$. This matrix has the effect of picking out the centre $n_b$ elements of a vector of length $n_x$, so that $b=D\hat{A}x$. Throughout the remainder of this work, these ‘data-driven’ boundary conditions are applied. It is straightforward to extend these ideas to two dimensions (actual images). In 2D, we denote the number of rows of pixels of an image by $m$ and the number of columns of pixels by $n$ (often with sub-indices; see Table A1 in the appendix for details). The blurring kernel, $a$, is also a two-dimensional object with $m_a$ rows and $n_a$ columns. This leads to a convolution matrix, $A$, with $m_x$ rows and $n_x$ columns, where $m_x$ and $n_x$ are the number of rows and columns of the reconstructed image.
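The construction can be verified numerically. The following is a minimal 1D numpy sketch with illustrative sizes (both $n_b$ and $n_a$ odd, to keep the index arithmetic simple): an FFT-based circular convolution plays the role of $\hat{A}$, and cropping the middle $n_b$ entries plays the role of $D$. The kernel embedding below accounts for the flip performed by circular convolution, so its ordering differs slightly from the displayed $\hat{a}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes; n_b and n_a are both odd so the padding arithmetic
# below stays integer-valued (cf. footnote 1).
n_b, n_a = 11, 5
n_x = n_b + n_a - 1              # the unknown x lives on the extended domain
a = rng.random(n_a)              # blur kernel [a_1, ..., a_{n_a}]
x = rng.random(n_x)

# Banded convolution matrix A of equation (1): row i holds [a_{n_a}, ..., a_1].
A = np.zeros((n_b, n_x))
for i in range(n_b):
    A[i, i:i + n_a] = a[::-1]

# Periodic model on the extended domain: embed the kernel in a length-n_x
# array, apply circular convolution via FFT (the action of A-hat), and crop
# the middle n_b entries (the action of D).  The periodic wrap-around only
# touches entries outside the field of view, so the result matches A @ x.
s = (n_a - 1) // 2
a_hat = np.roll(np.concatenate([a, np.zeros(n_x - n_a)]), -s)
y_full = np.fft.ifft(np.fft.fft(a_hat) * np.fft.fft(x)).real
b = y_full[s:s + n_b]
assert np.allclose(b, A @ x)
```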

A Bayesian formulation of the deblurring problem requires that we model errors by random variables. Here, we use a simple (but popular) additive Gaussian error model and write
$$b=Ax+\varepsilon, \tag{2}$$
where $\varepsilon\sim N(0,\lambda^{-1}I)$ with precision parameter $\lambda>0$ and, throughout the paper, $N(y,\Sigma)$ denotes a Gaussian with mean $y$ and covariance matrix $\Sigma$. We further simplify the notation and write $A$ for $D\hat{A}$, because we exclusively work with data-driven boundary conditions. Equation (2) defines the likelihood
$$p(b\,|\,x,\lambda)\propto\exp\left(-\frac{\lambda}{2}\|Ax-b\|^2\right), \tag{3}$$
where $\|\cdot\|$ is the Euclidean norm, i.e. $\|y\|=\sqrt{y_1^2+\cdots+y_{n_y}^2}$. To complete the Bayesian problem formulation, we define a Gaussian Markov random field prior [Citation37, Citation38] by
$$p(x\,|\,\delta)\propto\exp\left(-\frac{\delta}{2}\|L^{1/2}x\|^2\right), \tag{4}$$
where $L$ is the discrete Laplacian operator, $L^{1/2}$ is a matrix square root, and $\delta>0$ is a scalar. While we focus on using the Laplacian, other choices are also possible and easy to incorporate into our overall approach.

The posterior distribution is proportional to the product of the prior and the likelihood,
$$p(x\,|\,b,\lambda,\delta)\propto\exp\left(-\frac{\lambda}{2}\|Ax-b\|^2-\frac{\delta}{2}\|L^{1/2}x\|^2\right). \tag{5}$$
The goal is thus to draw samples from the Gaussian posterior distribution (5), $x\,|\,b,\lambda,\delta\sim N(m,H^{-1})$, with the posterior precision and mean
$$H=\lambda A^{\top}A+\delta L, \tag{6}$$
$$m=\lambda H^{-1}A^{\top}b. \tag{7}$$
Note that we assume the precision parameters $\lambda$ and $\delta$ are given. It is also possible to use the Bayesian framework to estimate these parameters on the fly, via hierarchical models as in [Citation37], but that is not done here.
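Continuing the 1D sketch above, the posterior precision and mean of (6)–(7) can be assembled directly for small problems; $\lambda$, $\delta$, and the 1D second-difference matrix standing in for $L$ are illustrative choices, not values from the paper.

```python
import numpy as np

# Continues the 1D sketch above (reuses A, x, n_x, n_b, rng).
lam, delta = 100.0, 0.1                      # illustrative precision parameters
L = 2 * np.eye(n_x) - np.eye(n_x, k=1) - np.eye(n_x, k=-1)  # 1D stand-in for L
b = A @ x + rng.standard_normal(n_b) / np.sqrt(lam)         # data, model (2)

H = lam * A.T @ A + delta * L                # posterior precision, equation (6)
m = lam * np.linalg.solve(H, A.T @ b)        # posterior mean, equation (7)

# One direct (independence) sample from N(m, H^{-1}) via a Cholesky factor:
R = np.linalg.cholesky(H)                    # H = R R^T, R lower triangular
sample = m + np.linalg.solve(R.T, rng.standard_normal(n_x))
```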

2.1. Notation

We denote vectors by lower case bold face and matrices by upper case bold face and use $n$ and $m$ to denote dimensions. We stack the columns of pixels for an image as follows: for an image $Y$ of size $m_y\times n_y$, the corresponding column stack vector is $y\in\mathbb{R}^{m_y n_y}$. The proposed Gibbs sampler is notationally intensive, so a list of frequently used matrices, vectors, and sizes is included in the Appendix (Table A1).

3. Gibbs sampling for large-scale image deblurring

Applying a Gibbs sampler in large-scale image deblurring at a reasonable computational cost requires that

  1. a single sample can be generated in a short time, even if the image size is large;

  2. the mixing properties of the Markov chain, generated by the sampler, do not degrade catastrophically with the size of the image.

In this section, we describe how to construct and implement a sub-image based blocking scheme that leads to a Gibbs sampler satisfying both of the above requirements. We start with a brief review of the basics of Gibbs sampling in Section 3.1, then describe the sub-image based blocking scheme in Section 3.2. This blocking scheme is constructed to exploit the local nature of convolution and regularization, resulting in well-structured and sparse convolution, prior precision, and posterior precision matrices. Exploiting this special matrix structure within the 2D environment characteristic of imaging problems is key for the applicability of the sampler in large-scale problems.

Section 3.3 details how the size of the sub-images needs to be chosen to balance the effort required to compute a single sample in the Markov chain with the mixing properties of the Markov chain, i.e. how many samples in the chain are required to characterize the posterior. Section 3.4 discusses why it is expected that Gibbs samplers with the sub-image based blocking scheme are ‘dimension-robust’, which is to say that their mixing properties depend only mildly on the size of the image. Section 3.5 provides details of the implementation of the sampler. The images in our target application in X-ray radiography have $16\times 10^6$ pixels, which results in posterior precision matrices of size $(16\times 10^6)\times(16\times 10^6)$. Assembly of such large matrices is prohibitively expensive, even if we exploit their sparsity, so a matrix-free implementation of the sampler is described.

3.1. Review of Gibbs sampling

The basic idea of a Gibbs sampler in the context of drawing samples from an $n_x$-dimensional Gaussian (posterior) distribution $x\sim N(m,H^{-1})$ is to partition $x$ into $n_p$ blocks,
$$x=[x_1,\dots,x_{n_p}], \tag{8}$$
where each $x_i$ is an $n_d=n_x/n_p$ dimensional vector, and to subsequently use the conditional distributions
$$p(x_i\,|\,x_1,\dots,x_{i-1},x_{i+1},\dots,x_{n_p}) \tag{9}$$
for sampling [Citation21, Citation38–40]. For a Gaussian posterior, the conditional distributions are also Gaussians,
$$x_i\,|\,x_1,\dots,x_{i-1},x_{i+1},\dots,x_{n_p}\sim N\Big(m_i-H_{ii}^{-1}\sum_{j\neq i}H_{ij}(x_j-m_j),\ H_{ii}^{-1}\Big), \tag{10}$$
where $H_{ij}$, $i,j=1,\dots,n_p$, are $n_d\times n_d$ dimensional sub-matrices of the posterior precision matrix $H$ and $m_i$ are blocks of the posterior mean $m$. Given a sample $x^{k-1}$, the Gibbs sampler generates a new sample $x^k$ by cycling through the $n_p$ blocks and sampling the conditional distributions
$$x_i^k\sim N\Big(m_i-H_{ii}^{-1}\Big(\sum_{j>i}H_{ij}(x_j^{k-1}-m_j)+\sum_{j<i}H_{ij}(x_j^k-m_j)\Big),\ H_{ii}^{-1}\Big). \tag{11}$$
Note that the subscript denotes the block index and the superscript denotes the iteration number in the Markov chain. Thus, the Gibbs sampler uses the current iterate $x^k$ for blocks that have already been sampled in the current cycle and the previous iterate, $x^{k-1}$, for blocks that have not yet been updated. The full Gibbs sampler is summarized in Algorithm 1.
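For small problems, the block Gibbs iteration (11) can be written down directly with dense matrices. The following sketch is a generic implementation for an arbitrary symmetric positive definite precision matrix split into equally sized blocks; it is meant to illustrate Algorithm 1, not the matrix-free sampler developed below, and the test matrix is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

def block_gibbs(H, m, n_p, n_sweeps):
    """Dense sketch of Algorithm 1: block Gibbs sampling of N(m, H^{-1}),
    with H split into n_p x n_p equally sized sub-blocks (equation (11))."""
    n_d = len(m) // n_p
    blk = lambda i: slice(i * n_d, (i + 1) * n_d)
    x, samples = m.copy(), []
    for _ in range(n_sweeps):
        for i in range(n_p):
            # sum over all other blocks; already-updated blocks of x enter
            # with their iteration-k values, the rest with iteration k-1
            r = sum(H[blk(i), blk(j)] @ (x[blk(j)] - m[blk(j)])
                    for j in range(n_p) if j != i)
            Hii = H[blk(i), blk(i)]
            R = np.linalg.cholesky(Hii)                 # Hii = R R^T
            mu = m[blk(i)] - np.linalg.solve(Hii, r)
            x[blk(i)] = mu + np.linalg.solve(R.T, rng.standard_normal(n_d))
        samples.append(x.copy())
    return np.array(samples)

# Quick check on a generic SPD precision matrix:
n = 12
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)
chain = block_gibbs(H, np.zeros(n), n_p=4, n_sweeps=4000)
print(np.cov(chain.T)[0, 0], np.linalg.inv(H)[0, 0])    # should be close
```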

3.2. Efficient blocking via sub-images

How the image X is vectorized is arbitrary in theory, but important in practice, because it defines the structure of the blurring matrix A, the prior precision L and, thus, the posterior precision H. For example, the Laplacian L is sparse for any method of constructing x from X, but it is banded if x is a column stack. Thus the way the vector x is defined from the image X determines the sparsity patterns of the matrices A, L, and H. The sparsity pattern of H is important for the Gibbs sampler, because H encodes conditional independencies among the blocks, which in turn defines the mixing properties of the Markov chain [Citation23, Citation24].

Rather than using the column stack, we define the vector $x$ based on column stacks of sub-images. Specifically, we partition the image $X$ into a set of $n_p=m_B n_B$ sub-images $X_i$, each of size $m_d\times n_d$,
$$X=\begin{bmatrix}
X_1 & X_{1+m_B} & \cdots & X_{1+m_B(n_B-1)}\\
X_2 & X_{2+m_B} & \cdots & X_{2+m_B(n_B-1)}\\
\vdots & \vdots & & \vdots\\
X_{m_B} & X_{2m_B} & \cdots & X_{m_B n_B}
\end{bmatrix}. \tag{12}$$
The $N$-dimensional vector $x$ is now formed by column stacks of the sub-images. It is assumed that the sub-images are of equal size and that they are indexed top to bottom, left to right, but neither is critical.

To see why a blocking based on sub-images is advantageous in Gibbs sampling, recall that blurring and the Laplacian prior precision define ‘local’ actions – only pixels that are spatially nearby are blurred or used for computing derivatives. The posterior precision matrix $H$ is defined in terms of the blurring and prior precision matrices (see (6)) and thus also defines local actions. Because a precision matrix encodes conditional independencies, the local nature of $H$ implies that far-away sub-images are conditionally independent. Since we define the blocks of $x$ by column stacks of the sub-images, this implies that each block $x_i$ is conditionally independent of most other blocks.

3.2.1. Illustration of the sub-image based blocking scheme

The sub-image based blocking scheme can be illustrated by an example of a 100×100 image $X$, blurred by a kernel of size 21×21. We partition the image into $m_B\times n_B=10\times 10$ sub-images, each of size $m_d\times n_d=10\times 10$. Because the sub-image size is greater than half the kernel size minus one ($10>9.5=21/2-1$), each sub-image conditionally depends on only eight surrounding sub-images. This means that the sums in the Gibbs sampler (see (11)) extend over blocks $x_j$ corresponding to these eight sub-images only, as is illustrated in Figure 1(a). Here, the yellow sub-image corresponds to the block $x_i^k$ that is currently being updated by the Gibbs sampler during the $k$th iteration. By our construction, $x_i^k$ conditionally depends only on the eight neighbouring sub-images highlighted in blue and green, but is conditionally independent of the 91 remaining sub-images. Assuming the indexing in (12), the Gibbs sampler cycles from top to bottom, left to right. Thus, the four blue sub-images correspond to blocks that have already been updated during iteration $k$; the four green sub-images have not been updated during iteration $k$.

Figure 1. Illustration of the sub-image based blocking scheme and comparison with column stack based blocking. Shown is the domain of a 100×100 image, blurred by a kernel of size 21×21. In panel (a), a block is defined by a column stack of a sub-image. The sub-image corresponding to the block that is currently being updated is highlighted in yellow; this block is conditionally independent of the 91 sub-images whose domains are shown in white, but is conditionally dependent on the eight sub-images highlighted in blue and green. In panel (b), a block is defined by a column of the image. The block, currently being updated, is highlighted in yellow; this block is conditionally independent of the 79 blocks shown in white, but is conditionally dependent on 20 other blocks, highlighted in blue and green. (a) Sub-image based blocking scheme, (b) column stack based blocking scheme.


We compare the sub-image based blocking scheme with the more common blocking that results from generating the vector $x$ by a column stack of the entire image $X$. In this case, the block $x_i$, which is a column of the image $X$, depends conditionally on the ten blocks to its left and the ten blocks to its right (due to the kernel size of 21×21). This is illustrated in Figure 1(b), where $x_i$ is highlighted in yellow and where the conditionally dependent blocks are highlighted in blue and green. Assuming that we generate $x$ by column-stacking the columns of $X$ left to right, the Gibbs sampler has visited the blue blocks to the left of $x_i$, but has not yet visited the green blocks to the right.

In summary, the blocking scheme based on sub-images requires conditioning each block on only eight nearby blocks of 100 pixels each (800 pixels in total). A blocking scheme based on a column stack of the image requires conditioning each block on 20 neighbouring blocks, each consisting of 100 pixels (2000 pixels in total).

3.3. Optimal and practical sub-image size

The sub-image based blocking scheme offers flexibility in the size of the sub-images, in that any sub-image size will lead to a Gibbs sampler that asymptotically (in the large sample limit) samples the target distribution. Nonetheless, some sub-image sizes may lead to more efficient samplers than others. A practically relevant sub-image size must resolve a trade-off characteristic of many MCMC schemes: the cost of computing a single sample in the Markov chain must be balanced against the overall mixing properties of the chain. For example, using the entire image as the single ‘sub-image’, the Gibbs sampler described above amounts to drawing independent samples from the Gaussian posterior distribution (perfect mixing of the chain). For large images, generating samples in this way becomes impractical (large cost per sample). At the other extreme, using a single pixel as a sub-image requires only scalar operations (small cost per sample), but there are large correlations between consecutive steps of the chain (slow mixing of the chain).

An optimal choice of the sub-image size would balance the cost-per-sample with the overall mixing properties of the chain. Such an optimal sub-image size is difficult to anticipate and to describe in generality, because it is problem dependent. Specifically, it is a function of the ratio of the kernel size to the overall image size and of the computer used to run the computations. For example, there is no need to use the proposed Gibbs sampler for images of size 1024×1024 or smaller. Given the computing power of a typical desktop or even laptop, one can draw direct samples (independence sampler; see Iteration (a) in Section 1) for such ‘small’ images.

The Gibbs sampler described above provides the greatest advantages for large images blurred by large kernels. In this case, the sub-images should be chosen larger than half of the kernel size, so that
$$m_d\geq\frac{m_a-1}{2},\qquad n_d\geq\frac{n_a-1}{2}. \tag{13}$$
With sub-images of this size (or larger), each block conditionally depends on only eight neighbouring blocks (see also the illustrative example above). The numerical experiments in Section 4 suggest that the sub-image size is not critical for the mixing properties of the chain (as long as (13) is satisfied). We thus define the ‘practical’ sub-image size as the sub-image size that satisfies (13) and minimizes the computation time for generating one sample.

3.4. Dimension-robustness

The goal is to use the Gibbs sampler for large-scale image deblurring, which requires that its mixing properties do not degrade with dimension or, equivalently, image size. This issue was discussed, in a more general setting, in [Citation23], where it was shown that the convergence rate of a Gibbs sampler for Gaussians with block tridiagonal covariance or precision matrices is independent of the dimension of the Gaussian. The dimension independent convergence rate makes the Gibbs sampler a suitable algorithm for high-dimensional problems. The block tridiagonal structure assumed in [Citation23] indeed occurs naturally in the one-dimensional model for discrete convolution (see Section 2). Thus, a Gibbs sampler has a dimension independent convergence rate when applied to 1D deblurring problems, but the 2D nature of image deblurring violates the assumption of a block tridiagonal precision matrix. Thus, previous results about dimension independent convergence rates cannot be directly transferred to our case of interest.

We do not attempt to extend the proof in [Citation23] to a 2D image deblurring problem, but rather demonstrate numerically that the mixing properties of the Gibbs sampler, with a carefully defined sub-image size, do not degrade with dimension. In the language of [Citation25], we thus show that the algorithm is ‘dimension-robust’. The fact that we have no formal proof of dimension independence can be interpreted as another instance of ‘bad theory for good algorithms’, as reported in [Citation25], where the case is made for practical, high-dimensional sampling algorithms being intrinsically difficult to analyse rigorously.

3.5. Matrix-based and matrix-free implementations

Recall that the sub-image based blocking scheme means that $x$ and its blocks $x_i$, $i=1,\dots,n_p$, are defined by column stacks of the sub-images $X_i$. This construction of $x$ determines the construction of the blurring matrix $A$ and the prior precision matrix $L$, which in turn determine the posterior precision matrix $H$ and its sub-matrices $H_{ij}$, $i,j=1,\dots,n_p$. Given the matrices $A$ and $L$ – and thus $H$ (see equation (6)) – the posterior mean can be computed by solving
$$Hm=\lambda A^{\top}b \tag{14}$$
for $m$, e.g. by using the conjugate gradient (CG) method. The definition of $x$ via sub-images thus also defines the blocks of the mean $m=[m_1,\dots,m_{n_p}]$. Assuming that the sub-images are large enough so that (13) is satisfied, the update equation for the $i$th block (see (11)) becomes
$$x_i^k\sim N\Big(m_i-H_{ii}^{-1}\Big(\sum_{j\in S_{\text{post}}}H_{ij}(x_j^{k-1}-m_j)+\sum_{j\in S_{\text{pre}}}H_{ij}(x_j^k-m_j)\Big),\ H_{ii}^{-1}\Big), \tag{15}$$
where the index sets
$$S_{\text{pre}}=\{i-m_B-1,\ i-m_B,\ i-m_B+1,\ i-1\}\cap\{1,\dots,n_p\}, \tag{16}$$
$$S_{\text{post}}=\{i+1,\ i+m_B-1,\ i+m_B,\ i+m_B+1\}\cap\{1,\dots,n_p\} \tag{17}$$
define the blocks corresponding to the eight neighbouring sub-images. Here, $S_{\text{pre}}$ refers to sub-images that have already been updated during iteration $k$, and $S_{\text{post}}$ refers to sub-images that have not been updated in iteration $k$. We can implement drawing samples from this Gaussian by
$$x_i^k=m_i+H_{ii}^{-1}\Big[H_{ii}^{1/2}z-\Big(\sum_{j\in S_{\text{post}}}H_{ij}(x_j^{k-1}-m_j)+\sum_{j\in S_{\text{pre}}}H_{ij}(x_j^k-m_j)\Big)\Big], \tag{18}$$
where $z\sim N(0,I)$. For small images, one can construct $A$, $L$ and $H$ and compute $x_i^k$ via (18), implementing the inversion of $H_{ii}$ by CG or similar algorithms.
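A short sketch of the index sets (16)–(17), assuming the 1-based, column-major sub-image ordering of (12). As written in the text, the intersection with $\{1,\dots,n_p\}$ trims the sets at the start and end of the sweep; blocks in the first or last row of the grid would need the additional masking treated in [Citation41].

```python
def neighbour_sets(i, m_B, n_p):
    """Index sets (16)-(17) for block i (1-based, column-major ordering of
    the m_B x n_B sub-image grid): the eight neighbouring sub-images, split
    into those visited before i (S_pre) and after i (S_post) in one sweep.
    Blocks in the first/last grid row need extra masking (see [Citation41])."""
    S_pre = {i - m_B - 1, i - m_B, i - m_B + 1, i - 1}
    S_post = {i + 1, i + m_B - 1, i + m_B, i + m_B + 1}
    valid = set(range(1, n_p + 1))
    return S_pre & valid, S_post & valid

# Example: an interior block of the 10 x 10 grid of Section 3.2.1
print(neighbour_sets(34, m_B=10, n_p=100))
# ({23, 24, 25, 33}, {35, 43, 44, 45})
```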

For large images, constructing the posterior precision matrix H or its sub-matrices Hij can be cumbersome and memory intensive. Instead of explicitly building the sub-matrices, we briefly describe how to exploit structure in the operators A and L to construct functions that perform the actions of the sub-matrices Hij on sub-images, without assembling H or its sub-matrices Hij.

There are three separate computational elements in (18) for which matrix-free implementations should be used:

  1. Performing a back-solve to compute the action of the inverse of the sub-matrices $H_{ii}$, i.e. build a function such that
$$f_{\text{diag}}(y,i)=H_{ii}^{-1}y. \tag{19}$$

  2. Compute the pre- and post-sums, i.e. build functions such that
$$f_{\text{pre}}(y,i)=\sum_{j\in S_{\text{pre}}}H_{ij}y_j\quad\text{and}\quad f_{\text{post}}(y,i)=\sum_{j\in S_{\text{post}}}H_{ij}y_j. \tag{20}$$

  3. Generate a vector distributed as $N(0,H_{ii})$.

Each of these functions requires a representation of the action of sub-matrices $H_{ij}$, which are defined by the blurring matrix $A$ and the prior precision $L$ (see (6)). Given the sub-image blocking scheme $x=[x_1,\dots,x_{n_p}]$, the matrices $A$ and $L$ consist of $n_p\times n_p$ blocks, each of size $m_d n_d\times m_d n_d$. The sub-matrices $H_{ij}$ thus can be expressed in terms of sub-matrices of $A$ and $L$:
$$H_{ij}=\lambda (A_{:,i})^{\top}A_{:,j}+\delta L_{ij}, \tag{21}$$
where
$$A_{:,j}=\begin{bmatrix}A_{1,j}\\ \vdots\\ A_{n_p,j}\end{bmatrix}. \tag{22}$$
An implementation of the action of $H_{ij}$ thus requires that we define the functions
$$f_{A_{:,j}}(y)=A_{:,j}\,y, \tag{23}$$
$$f_{(A_{:,i})^{\top}}(y)=(A_{:,i})^{\top}y, \tag{24}$$
$$f_{L_{ij}}(y)=L_{ij}\,y. \tag{25}$$
We provide a brief summary of how to construct these functions here, with more details in the appendix. We refer to [Citation41] for a description of all delicate details, especially with respect to a careful treatment of the data-driven boundary conditions.

In short, the function (23) is equivalent to the action of the convolution matrix $A$ acting on an image in which only the $j$th block is nonzero and equal to $x_j$. It is possible to compute just the convolution with the non-zero portion of the image, so that $f_{A_{:,j}}(\cdot)$ can be implemented by convolving a small image with a given kernel via FFT. The function $f_{(A_{:,i})^{\top}}(y)$ in (24) can be implemented in the same way, because the transpose essentially amounts to flipping the blurring kernel in the vertical and horizontal dimensions. The function $f_{L_{ij}}(y)$ in (25) implements the action of sub-matrices of the negative 2D Laplacian, which is equivalent to convolution with the kernel
$$\begin{bmatrix}0 & -1 & 0\\ -1 & 4 & -1\\ 0 & -1 & 0\end{bmatrix}.$$
We can thus use the same ideas as above to implement (25).
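The following sketch shows the FFT-based building block behind these functions: a sub-image is zero-padded by the kernel half-width and circularly convolved with the kernel (assumed odd-sized here for simplicity), so the periodic wrap-around never touches meaningful pixels. The function name is ours, not from [Citation41].

```python
import numpy as np

def conv2_fft(sub_image, kernel):
    """Circular 2D convolution via FFT of a sub-image padded by the kernel
    half-width.  Because of the padding, the periodic wrap-around never
    touches meaningful pixels; the caller keeps track of where the padded
    block sits inside the full image.  Kernel dimensions are assumed odd."""
    ma, na = kernel.shape
    pm, pn = ma // 2, na // 2
    padded = np.pad(sub_image, ((pm, pm), (pn, pn)))
    K = np.zeros_like(padded)
    K[:ma, :na] = kernel
    K = np.roll(K, (-pm, -pn), axis=(0, 1))    # centre the kernel at (0, 0)
    return np.fft.ifft2(np.fft.fft2(padded) * np.fft.fft2(K)).real

# The same routine applies the Laplacian blocks: convolve with the stencil.
lap = np.array([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]])
sub = np.zeros((10, 10)); sub[4, 4] = 1.0
print(conv2_fft(sub, lap).shape)               # (12, 12)
```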

Given the functions (23)–(25), the action of a sub-matrix $H_{ii}$ on a vector $x_i$ can be written as
$$f_{H_{ii}}(x_i)=\lambda f_{(A_{:,i})^{\top}}(f_{A_{:,i}}(x_i))+\delta f_{L_{ii}}(x_i), \tag{26}$$
and, given (26), we use CG to compute (19). To draw samples $v\sim N(0,H_{ii})$, note that $v=v_1+v_2$, where $v_1\sim N(0,\lambda (A_{:,i})^{\top}A_{:,i})$ and $v_2\sim N(0,\delta L_{ii})$ are two independent Gaussians. We can sample $v_1$ by using the function $f_{(A_{:,i})^{\top}}(\cdot)$ and writing $v_1=\sqrt{\lambda}\,f_{(A_{:,i})^{\top}}(z)$, where $z\sim N(0,I)$; $v_2$ can be sampled similarly (see Appendix). Similarly, the pre- and post-sums in (20) can be computed by $f_{(A_{:,i})^{\top}}(\cdot)$, convolution with a kernel, and slight modifications of the function $f_{L_{ii}}(\cdot)$. The key idea here is to carefully construct sub-images that are used by these functions in the computation of the pre- and post-sums.

Note that the Gibbs sampler (in either the matrix-based or the matrix-free implementation) requires that we compute the posterior mean before we start sampling. This means that we need to solve (14) once, before we start the sampler. This is done by using CG and the functions implementing the action of $H$ in the matrix-free implementation, or by simply solving (7), with $H$ given, in the matrix-based implementation.

Finally, we emphasize again that the Gibbs sampler should only be used for large images (larger than 1024×1024 pixels), because smaller images can be dealt with by drawing samples directly from the Gaussian defined by Equations (6)–(7). Thus, the matrix-based implementation we describe should be viewed as a conceptual tool to understand how the sampler works, rather than a practical algorithm.

4. Applications to X-ray radiography

Within the U.S. Department of Energy's research and development enterprise, X-ray radiography is a common approach to diagnosing dynamic material studies, in which in situ measurement systems would be destroyed. One such facility is the Cygnus Dual Beam X-ray Radiography Facility at the Nevada National Security Site. One of the primary applications for such imaging is to understand the evolution of material density during shock compression, which can be modelled as single-projection tomography for objects that are radially symmetric [Citation3, Citation42, Citation43]. This requires sufficiently high spatial resolution – more specifically, the ability to precisely measure distances in the imaging plane to the object's axis of symmetry – as well as a complete understanding of the X-ray spectrum and fluence, which is necessary to convert from units of pixel intensity to areal density (g/cm²) [Citation44–46]. Both of these aspects of the X-ray measurement are degraded by the imaging system blur.

The Cygnus radiographic system produces large images (4096×4096 pixels), and the fundamental difficulty in using prior work to deblur such large images has been the inability to sample from the corresponding high-dimensional posterior densities of type (5) in a computationally tractable manner. Figure 2 shows an example of a radiograph with three calibration objects. The first, marked (a), is a ‘step wedge’, which is a block of dense metal cut into steps of different thicknesses used to characterize the system response to the X-ray spectrum and fluence. The second, marked (b) in the figure, consists of concentric cylinders of different materials, which are used to calibrate the density reconstruction model. The third object, marked (c) in the figure, is called an ‘L rolled edge’, which can be used to compute the point spread function (see below).

Figure 2. Multi-calibration target consisting of three calibration objects: (a) the step wedge, (b) the Abel cylinder, and (c) the L-rolled edge.


Within the proposed framework, the deblurring problem is defined by (i) the convolution matrix $A$, (ii) the precision parameters, $\delta$ and $\lambda$, and (iii) the input data, $b$. The input data is simply the radiograph. The convolution matrix $A$ is defined by the deblur kernel, which can be obtained from an appropriate point spread function (PSF). The PSF can be computed from the L rolled edge in Figure 2 [Citation5, Citation14, Citation15], and we employ the non-parametric method outlined in [Citation15]. Essentially, by modelling the transition from black to white across the edge as a linear inverse problem, techniques similar to [Citation37] can be used to estimate the PSF's radial profile pixel-by-pixel. This technique has been employed on Cygnus data [Citation15], and that work has been tailored to capture the main components of the Cygnus imaging system's blur. We set small elements in the kernel to zero (by thresholding) to obtain a sparse structure in $A$, with a resulting blurring kernel of size 33×33.

The precision parameter $\lambda$ describes a pixel-wise noise variance, which is calculated as an average variance across the image in Figure 2. Standard deviations of $1.1\times 10^{-2}$ and $3.4\times 10^{-3}$ were computed for white and black portions of the image, respectively, so we choose a conservative estimate and set $\lambda=9\times 10^{3}$. Motivated by numerical experiments with small portions of the image, we set $\delta=2.5\times 10^{-3}\lambda$, based on estimates computed from the Discrepancy Principle [Citation8, Citation47] and the Unbiased Predictive Risk Estimator [Citation48–50].
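As an illustration, the noise-precision estimate amounts to the following short computation. The flat image patches are synthesized here with the standard deviations quoted above, since the radiograph itself is not reproduced; everything in this sketch is a stand-in, not the paper's calibration code.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical flat patches from white and black regions of the radiograph,
# synthesized with the standard deviations reported in the text.
white_patch = 0.9 + 1.1e-2 * rng.standard_normal((64, 64))
black_patch = 0.1 + 3.4e-3 * rng.standard_normal((64, 64))

sigma = max(white_patch.std(), black_patch.std())  # conservative: larger noise
lam = 1.0 / sigma**2        # noise precision; ~8.3e3 here, comparable to the
                            # lambda = 9e3 used in the text
delta = 2.5e-3 * lam        # prior precision, as chosen in the text
```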

4.1. Large-scale image deblurring

The Gibbs sampler described above, with a sub-image size of 512×512 (i.e. $8\times 8=64$ sub-images), is applied to deblur the radiograph of size 4096×4096 shown in Figure 2. All computations are performed on one node of the University of Arizona's High Performance Computing (HPC) cluster, which consists of a dual 14-core 2.3 GHz processor with 192 GB of memory. On this machine, generating one sample with our code, which does not make use of parallelism, takes about 12 minutes.

Recall that we need to compute the posterior mean to start sampling (see Section 3.5). It is then natural to initialize the Gibbs sampler with the posterior mean, which keeps the burn-in period short. See [Citation41] for additional details about the burn-in.

Integrated autocorrelation time (IACT) is used to assess the mixing properties of the Markov chain generated by the Gibbs sampler [Citation51]. The IACT defines an effective sample size of a Markov chain by
$$N_{\text{eff}}\approx N_e/\text{IACT}, \tag{27}$$
where $N_e$ is the length of the Markov chain. Thus, the number of samples required for solving a deblurring problem at a specified accuracy (number of effective samples) is proportional to the IACT. The IACT is computed using techniques described in [Citation52], based on 1000 samples from the posterior. At each pixel, the computed values are between 1 and 10, with an average value of 1.09. The small average IACT indicates that the chain is well-mixed and that nearly all samples are effective samples.
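For reference, a common way to estimate the IACT of a scalar chain combines an FFT-based autocorrelation with Sokal's adaptive truncation window, in the spirit of [Citation51, Citation52]. The sketch below is a standard estimator of this kind, not necessarily the exact one used for the results in this section.

```python
import numpy as np

def iact(chain, c=5.0):
    """Integrated autocorrelation time of a scalar chain: FFT-based
    autocorrelation plus Sokal's adaptive truncation window; returns
    tau such that N_eff = N_e / tau, as in equation (27)."""
    x = np.asarray(chain, float) - np.mean(chain)
    n = len(x)
    f = np.fft.fft(x, 2 * n)                     # zero-pad against wrap-around
    acf = np.fft.ifft(f * np.conj(f)).real[:n]
    acf /= acf[0]                                # normalized autocorrelation
    tau = 2.0 * np.cumsum(acf) - 1.0             # running IACT estimates
    window = np.argmax(np.arange(n) >= c * tau)  # first M with M >= c * tau_M
    return tau[window] if window > 0 else tau[-1]
```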

Figure 3 illustrates the results and shows the mean of the 1000 samples along with two zooms, labelled (a) and (b). The mean of the posterior has (i) sharper step edges and (ii) a sharper edge of the concentric cylinders, while transitions within the cylinder remain smooth, as expected and desired.

Figure 3. Mean of 1000 samples of the Gibbs sampler. The regions enclosed by red rectangles are shown enlarged on the right.


4.2. Dimension-robustness of the Gibbs sampler

The dimension-robustness of the Gibbs sampler is demonstrated by studying how IACT depends on dimension (putting issues of reaching stationarity to the side). The dimension of the problem is equal to the number of pixels and is thus proportional to the image size. Specifically, IACT defines the number of samples required for solving a deblurring problem at a specified accuracy, or, equivalently, the number of effective samples, via (27). This implies that the number of samples required to characterize the posterior is nearly independent of the size of the image when the IACT is also nearly independent of the size of the image. A dimension-robust algorithm thus must have the property that IACT has little dependence on image size.

To demonstrate dimension-robustness of the Gibbs sampler, we cut smaller images, ranging from 256×256 to 2048×2048, out of the radiograph. The smaller images are centred on a corner of the step wedge to ensure at least one interesting feature exists in each image. We apply the Gibbs sampler, with sub-image size 128×128, to each image and generate 1000 samples. Computing the IACT for each image then indicates the scaling of IACT with image size. The results are summarized in Table 1, which lists the mean and maximum (over all pixels) IACT as a function of image size. As the size of the image grows, the mean IACT remains nearly constant and the maximum IACT increases only modestly. Our numerical experiments thus support dimension-robustness of the Gibbs sampler.

Table 1. Pixel-wise mean and maximum IACTs of the Gibbs sampler as a function of image size.

The dimension-robustness is also nearly independent of the sub-image size, provided the sub-images are larger than the minimum defined in (13). We demonstrate this property of the Gibbs sampler in numerical experiments in which we vary the image size (from 256×256 to 2048×2048, as above) and the sub-image size (from 256×256 to 1024×1024). The results are summarized in Table 2, which shows the mean and maximum IACT for the various image and sub-image sizes. Both the mean and maximum IACT are nearly constant for all image and sub-image sizes considered.

Table 2. Average and maximum IACT (in parentheses) as a function of the image size and sub-image size.

4.3. Practical sub-image size

Given that the convergence efficiency of the Gibbs sampler is nearly independent of the image size, one may try to find a ‘practical’ sub-image size that leads to the smallest wall clock time required to generate one sample. This was done for the 4096×4096 radiograph in Figure 2 by trying sub-images of size 128×128 (resulting in a 32×32 grid of sub-images) to 2048×2048 (resulting in a 2×2 grid of sub-images). Three of the sub-image sizes we tried are illustrated in Figure 4. The average wall clock time required for a single posterior sample was computed by generating five samples and averaging, and the results are plotted in Figure 5. The lowest wall clock time occurs for sub-images of size 512×512, which is why we used this sub-image size in Section 4.1.

Figure 4. Three different sub-image partitions for an image of size 4096×4096. The sub-images have size 256×256 (left), 512×512 (center) and 1024×1024 (right).

Figure 4. Three different sub-image partitions for an image of size 4096×4096. The sub-images have size 256×256 (left), 512×512 (center) and 1024×1024 (right).

Figure 5 also shows average wall clock times per sample as a function of sub-image size for images of size 1024×1024 and 2048×2048. For the image with 2048×2048 pixels, the practical sub-image size is the same (512×512) as for the 4096×4096 image. The smaller image of size 1024×1024, however, behaves differently in the sense that the practical sub-image size is equal to the full image size. Thus, there is no advantage in using the Gibbs sampler on small images because drawing samples directly, i.e. setting the sub-image size equal to the image size, is feasible and in fact the fastest method. This is in line with our assessment that the proposed Gibbs sampler provides significant computational gains only for large images. Finally, we emphasize that the practical sub-image size also depends on the computing platform used, and a parallelized version of the proposed approach may lead to a different practical sub-image size.

Figure 5. Average wall clock time per sample as a function of sub-image size. Plus signs correspond to an image size of 4096×4096; Xs correspond to an image size of 2048×2048; squares correspond to an image size of 1024×1024.

Figure 5. Average wall clock time per sample as a function of sub-image size. Plus signs correspond to an image size of 4096×4096; Xs correspond to an image size of 2048×2048; squares correspond to an image size of 1024×1024.

5. Conclusions

The blocking scheme used to group variables together in a Gibbs sampler has a large effect on its efficiency, in particular in high-dimensional problems. We revisited these ideas in the context of large-scale image deblurring problems, which require drawing samples from Gaussian posterior distributions in $10^7$ unknowns.

We constructed a blocking scheme based on sub-images, which respects the 2D nature of imaging problems, and that leads to a sparse and highly structured posterior precision matrix. The Gibbs sampler then naturally exploits this matrix structure during sampling, which makes it dimension-robust, i.e., generating a single sample is feasible and an associated integrated autocorrelation time is nearly independent of the size of the image (the dimension of the problem) in numerical experiments. The dimension-robustness makes the sampler a practical tool for large-scale image deblurring problems.

We demonstrated the applicability of our ideas by implementing and testing the Gibbs sampler on images of size up to 4096×4096 pixels, taken at the Cygnus Dual Beam X-ray Radiography Facility at the Nevada National Security Site. Our implementation is ‘matrix-free’, which is essential, because building and storing precision matrices at this scale is impractical. Numerical tests demonstrate that the mean pixel-wise integrated autocorrelation time is nearly independent of the size of the image. We also investigated a practical sub-image size that leads to minimal wall clock time per sample on a given computational platform. With a practically chosen sub-image size, the sampler generates $O(100)$ samples in under one day for a 4096×4096 deblurring problem using only modest computational resources. Moreover, the mean reconstructions generated by the sampler display sharper features than the blurred image, while still preserving smooth features in the image.

Acknowledgments

The authors would like to thank Johnathan Bardsley for helpful discussions on the algorithm and its applications.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This manuscript has been authored by Mission Support and Test Services, LLC, under Contract No. DE-NA0003624 with the U.S. Department of Energy and supported by the Site-Directed Research and Development Program. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The U.S. Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government. DOE/NV/03624–0662. This manuscript was supported by (Pacific Northwest National Labs PNNL-SA-150477).

Notes

1 For the sake of simplicity in indexing, we assume nb is odd. This is not required, and the formulation holds when nb is even.

References

  • Halls BR, Roy S, Gord JR, et al. Quantitative imaging of single-shot liquid distributions in sprays using broadband flash X-ray radiography. Int J Multiphase Flow. 2016;87:241–249.
  • Hanson KM, Cunningham GS. The Bayes inference engine. In: Maximum entropy and Bayesian methods. Dordrecht, The Netherlands: Kluwer Academic; 1996. p. 125–134.
  • Howard M, Fowler M, Luttman A, et al. Bayesian Abel inversion in quantitative X-ray radiography. SIAM J Sci Comput. 2016;38:B396–B413.
  • Maire E, Withers PJ. Quantitative X-ray tomography. Int Mater Rev. 2014;59:1–43.
  • Fowler M, Howard M, Luttman A, et al. A stochastic approach to quantifying the blur with uncertainty estimation for high-energy X-ray imaging systems. Inverse Probl Sci Eng. 2016;24:353–371.
  • Nagesh SVS, Rana R, Russ M, Ionita CN, Bednarek DR, Rudin S. Focal spot deblurring for high resolution direct conversion x-ray detectors. Proc SPIE Int Soc Opt Eng. 2016;9783:97833R.
  • von Wittenau AES, Logan CM, Aufderheide MB, et al. Blurring artifacts in megavoltage radiography with a flat-panel imaging system: comparison of Monte Carlo simulations with measurements. Med Phys. 2002;29:2559–2570.
  • Hansen PC, Nagy JG, O'Leary DP. Deblurring images: matrices, spectra, and filtering. Philadelphia, PA: SIAM; 2006.
  • Bardsley JM. Computational uncertainty quantification for inverse problems. Philadelphia (PA): SIAM; 2018.
  • Bardsley JM, Luttman A. Dealing with boundary artifacts in MCMC-based deconvolution. Linear Algebra Appl. 2015;473:339–358.
  • Bardsley JM, Luttman A. A Metropolis-Hastings method for linear inverse problems with Poisson likelihood and Gaussian prior. Int J Uncertain Quantif. 2016;6:35–55.
  • Fox C, Norton RA. Fast sampling in a linear-Gaussian inverse problem. SIAM/ASA J Uncertain Quantif. 2016;4:1191–1218.
  • Fox C, Parker A. Accelerated Gibbs sampling of normal distributions using matrix splittings and polynomials. Bernoulli. 2017;23:3711–3743.
  • Howard M, Fowler M, Luttman A. Sampling-based uncertainty quantification in deconvolution of X-ray radiographs. J Comput Appl Math. 2014;270:43–51.
  • Joyce KT, Bardsley JM, Luttman A. Point spread function estimation in X-ray imaging with partially collapsed Gibbs sampling. SIAM J Sci Comput. 2018;40:B766–B787.
  • Wang Z, Bardsley J, Solonen A, et al. Bayesian inverse problems with ℓ1 priors: a randomize-then-optimize approach. SIAM J Sci Comput. 2017;39:S140–S166.
  • Parker A, Pitts B, Lorenz L, et al. Polynomial accelerated solutions to a large Gaussian model for imaging biofilms: in theory and finite precision. J Am Stat Assoc. 2018;113:1431–1442.
  • Chen J, Anitescu M, Saad Y. Computing f(A)b via least squares polynomial approximations. SIAM J Sci Comput. 2011;33:195–222.
  • Chow E, Saad Y. Preconditioned Krylov subspace methods for sampling multivariate Gaussian distributions. SIAM J Sci Comput. 2014;36:A588–A608.
  • Parker A, Fox C. Sampling Gaussian distributions in Krylov spaces with conjugate gradients. SIAM J Sci Comput. 2012;34:B312–B334.
  • Gilks W, Richardson S, Spiegelhalter D. Markov chain Monte Carlo in practice. Boca Raton, FL: Springer; 1996.
  • Rubinstein RY, Kroese DP. Simulation and the Monte Carlo method. 3rd ed. Hoboken, NJ: Wiley; 2017.
  • Morzfeld M, Tong X, Marzouk YM. Localization for MCMC: sampling high-dimensional posterior distributions with local structure. J Comput Phys. 2018;380:1–28.
  • Roberts GO, Sahu S. Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler. J R Stat Soc Ser B. 1997;59:291–317.
  • Chen V, Dunlop MM, Papaspiliopoulos O, et al. Dimension-robust MCMC in Bayesian inverse problems. Submitted; 2018.
  • Beskos A, Roberts G, Stuart A. Optimal scalings for local Metropolis-Hastings chains on nonproduct targets in high dimensions. Ann Appl Probab. 2009;19:863–898.
  • Roberts G, Gelman A, Gilks W. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann Appl Probab. 1997;7:110–120.
  • Roberts G, Rosenthal J. Optimal scaling of discrete approximations to Langevin diffusions. J R Stat Soc, Ser B (Stat Methodol). 1998;60:255–268.
  • Buccini A, Donatelli M, Reichel L. Iterated Tikhonov regularization with a general penalty term. Numer Linear Algebra Appl. 2017;24:1–19.
  • Chen H, Wang C, Song Y, et al. Split Bregmanized anisotropic total variation model for image deblurring. J Vis Commun Image Represent. 2015;31:282–293.
  • Jiao Y, Jin Q, Lu X, et al. Alternating direction method of multipliers for linear inverse problems. SIAM J Numer Anal. 2016;54:2114–2137.
  • Ma L, Xu L, Zeng T. Low rank prior and total variation regularization for image deblurring. J Sci Comput. 2017;70:1336–1357.
  • Tao S, Dong W, Xu Z, et al. Fast total variation deconvolution for blurred image contaminated by Poisson noise. J Vis Commun Image Represent. 2016;38:582–594.
  • Xu J, Chang HB, Qin J. Domain decomposition method for image deblurring. J Comput Appl Math. 2014;271:401–414.
  • Bertero M, Boccacci P. A simple method for the reduction of boundary effects in the Richardson-Lucy approach to image deconvolution. Astron Astrophys. 2005;437:369–374.
  • Vio R, Bardsley JM, Donatelli M, et al. Dealing with edge effects in least-squares image deconvolution problems. Astron Astrophys. 2005;442:397–403.
  • Bardsley JM. Gaussian Markov random field priors for inverse problems. Inverse Probl Imaging. 2013;7:397–416.
  • Rue H, Held L. Gaussian Markov random fields: theory and applications. New York, NY: Chapman & Hall/CRC; 2005.
  • Gelman A, Carlin JB, Stern HS, et al. Bayesian data analysis. 3rd ed. Boca Raton, FL: Chapman and Hall CRC Press; 2013.
  • Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell. 1984;PAMI-6:721–741.
  • Adams J. Scalable block-Gibbs sampling for image deblurring in X-ray radiography [PhD dissertation]. Tucson (Arizona): University of Arizona; 2019.
  • Asaki TJ, Chartrand R, Vixie KR, et al. Abel inversion using total-variation regularization. Inverse Probl. 2005;21:1895–1903.
  • Asaki TJ, Campbell PR, Chartrand R, et al. Abel inversion using total variation regularization: applications. Inverse Probl Sci Eng. 2006;14:873–885.
  • Davis G, Jain N, Elliott J. A modelling approach to beam hardening correction. Proc SPIE Int Soc Opt Eng. 2008;7078:70781E.
  • Kwan TJT, Berninger M, Snell C, et al. Simulation of the Cygnus rod-pinch diode using the radiographic chain model. IEEE Trans Plasma Sci. 2009;37:530–537.
  • Seeman HE, Roth B. New stepped wedges for radiography. Acta radiol. 1960;53:215–226.
  • Engl H, Hanke M, Neubauer A. Regularization for inverse problems. Dordrecht, The Netherlands: Kluwer Academic; 1996.
  • Golub GH, Heath M, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics. 1979;21:215–223.
  • Renaut RA, Helmstetter AW, Vatankhah S. Unbiased predictive risk estimation of the Tikhonov regularization parameter: convergence with increasing rank approximations of the singular value decomposition. BIT Numer Math. 2019;59:1031–1061.
  • Vogel CR. Computational methods for inverse problems. Philadelphia (PA): SIAM; 2002.
  • Sokal A. Monte Carlo methods in statistical mechanics: foundations and new algorithms. In: DeWitt-Morette C, Cartier P, Folacci A, editors. Functional Integration. NATO ASI Series (Series B: Physics). Vol. 361. Boston (MA): Springer; 1998.
  • Wolff U. Monte Carlo errors with less errors. Comput Phys Commun. 2004;156:145–153.

 

Appendix. Summary of notation

We provide a list of frequently used matrices and vectors and their dimensions in Table A1.

Table A1. List of frequently used matrices and vectors.

A.1. Matrix-free implementation

We present additional details of the matrix-free implementation of the Gibbs sampler (Section 3.5) and refer to [Citation41] for additional discussion in particular about a careful treatment of the boundaries.

We first discuss the basic functions $f_{A_{:,j}}(\cdot)$, $f_{(A_{:,i})^{\top}}(\cdot)$, $f_{L_{ij}}(\cdot)$ and $f_{L_{ii}^{1/2}}(\cdot)$ in (23)–(25). The function
$$f_{A_{:,j}}(y)=A_{:,j}\,y \tag{A1}$$
implements the action of the matrix $A_{:,j}$. This action is equivalent to the action of the convolution matrix $A$ acting on an image in which only the $j$th block is nonzero and equal to $x_j$:
$$A_{:,j}x_j=A\begin{bmatrix}0\\ \vdots\\ 0\\ x_j\\ 0\\ \vdots\\ 0\end{bmatrix}. \tag{A2}$$
Thus, we can write
$$f_{A_{:,j}}(x_j)=f_A\left(\begin{bmatrix}0 & \cdots & 0\\ \vdots & X_j & \vdots\\ 0 & \cdots & 0\end{bmatrix}_{m_x\times n_x}\right), \tag{A3}$$
where $f_A$ denotes the action of the convolution matrix $A$, and where the $0$ blocks are appropriately sized so that the $X_j$ block is in the $j$th position. It is possible to compute just the non-zero portion of $f_A$ and keep track of its location within the image. Implementing $f_A$ in this way means that the zero-padding can be reduced to the size of the kernel, rather than the size of the image. In summary, we implement $f_{A_{:,j}}(\cdot)$ by convolving a small image with a given kernel, which can be done efficiently via FFT.

The function
$$f_{(A_{:,i})^{\top}}(y)=(A_{:,i})^{\top}y \tag{A4}$$
can be implemented similarly. Recall that $A$ can be written as the product $D\hat{A}$. Here $\hat{A}\in\mathbb{R}^{N\times N}$ is the convolution matrix with data-driven boundary conditions, and $D\in\mathbb{R}^{M\times N}$ is the cropping matrix. The transpose, $A^{\top}=\hat{A}^{\top}D^{\top}$, first extends the image from size $M$ to size $N$, and then applies the convolution $\hat{A}^{\top}$, which is the same as the convolution with the blurring kernel $a$ flipped in the vertical and horizontal directions.

The function
$$f_{L_{ij}}(y)=L_{ij}\,y \tag{A5}$$
implements the action of sub-matrices of the negative 2D Laplacian, which is equivalent to convolution with the kernel
$$\begin{bmatrix}0 & -1 & 0\\ -1 & 4 & -1\\ 0 & -1 & 0\end{bmatrix}.$$
We can thus use the same ideas as above to implement $f_{L_{ij}}(\cdot)$.

Sampling from $v_2\sim N(0,\delta L_{ii})$ relies on writing the discrete Laplacian as
$$L=D_h^{\top}D_h+D_v^{\top}D_v, \tag{A6}$$
where $D_v=D_p\otimes I_{m_x}$ and $D_h=I_{n_x}\otimes D_p$, and where $\otimes$ represents the Kronecker product [Citation37]. Here, $D_p$ is the forward difference operator with periodic boundary conditions:
$$D_p=\begin{bmatrix}-1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1\\ 1 & & & & -1\end{bmatrix}, \tag{A7}$$
but similar expressions can be derived for zero boundary conditions [Citation41]. Since
$$L_{ii}=(D_h^{\top}D_h+D_v^{\top}D_v)_{ii}=(D_h(:,i))^{\top}D_h(:,i)+(D_v(:,i))^{\top}D_v(:,i),$$
sampling $v_2\sim N(0,\delta L_{ii})$ can be done by writing functions for $(D_h(:,i))^{\top}$ and $(D_v(:,i))^{\top}$, and constructing $v_2$ as a sum of two independent Gaussians, $v_2=v_{2,h}+v_{2,v}$, where $v_{2,h}\sim N(0,\delta(D_h(:,i))^{\top}D_h(:,i))$ and $v_{2,v}\sim N(0,\delta(D_v(:,i))^{\top}D_v(:,i))$. As before, we can sample $v_{2,h}=\sqrt{\delta}\,(D_h(:,i))^{\top}z$, where $z\sim N(0,I)$, and $v_{2,v}=\sqrt{\delta}\,(D_v(:,i))^{\top}w$, where $w\sim N(0,I)$, with $z$ and $w$ independent. Writing functions for matrix multiplication with $(D_h(:,i))^{\top}$ and $(D_v(:,i))^{\top}$ is straightforward because the operations (forward differencing and Kronecker products) are elementary (see [Citation41] for details).
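A small scipy sketch of this construction, using the Kronecker ordering written above on a tiny $m_x\times n_x$ grid (sizes are illustrative), verifies that $L=D_h^{\top}D_h+D_v^{\top}D_v$ reproduces the 5-point periodic Laplacian stencil.

```python
import numpy as np
from scipy.sparse import diags, identity, kron

def periodic_forward_diff(n):
    """Forward-difference operator D_p with periodic boundary conditions,
    equation (A7): -1 on the diagonal, +1 on the superdiagonal, and +1 in
    the bottom-left corner."""
    D = diags([-np.ones(n), np.ones(n - 1)], [0, 1]).tolil()
    D[n - 1, 0] = 1.0
    return D.tocsr()

# Tiny m_x x n_x grid; the ordering of the Kronecker factors follows (A6)
# as written and depends on the chosen column-stacking convention.
m_x, n_x = 4, 5
D_v = kron(periodic_forward_diff(n_x), identity(m_x))
D_h = kron(identity(n_x), periodic_forward_diff(m_x))
L = (D_h.T @ D_h + D_v.T @ D_v).toarray()

assert np.allclose(L.diagonal(), 4.0)   # centre of the 5-point stencil
assert np.allclose(L.sum(axis=1), 0.0)  # periodic Laplacian rows sum to zero
```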

We finally comment on the efficient implementation of the pre- and post-sums in (20), which implement the contributions of the action of $H$ on portions of the neighbouring sub-images to the $i$th sub-image (see [Citation41] for full details, especially with respect to the treatment of the boundaries). As above, an efficient implementation of these sums exploits the fact that the sum $\sum_{j\in S}H_{ij}y_j$ can be computed by carefully zero-padding the input and implementing a functional form of $H_{i,:}=\lambda(A_{:,i})^{\top}A+\delta L_{i,:}$ (see [Citation41] for details).
