Efficient and Convergent Algorithms for High-Fidelity Hyperspectral Image Fusion
Abstract
Hyperspectral (HS) imaging refers to acquiring images with hundreds of bands corresponding to different wavelengths of light. HS imaging has a wide range of applications, such as remote sensing, industrial inspection, and environmental monitoring. A fundamental constraint on multiband sensors is that the amount of incident energy is limited, which creates an intrinsic tradeoff between spatial resolution and the number of bands---current optical sensors can either generate images with high resolution but a small number of bands, or images with a large number of bands but reduced resolution. For example, HS images have hundreds of bands but low spatial resolution, whereas the opposite is true for multispectral (MS) images. An extreme case is a panchromatic (PAN) image with very high spatial resolution but just a single band. Image fusion refers to techniques where multiband images with high spatial resolution are synthetically generated using image processing algorithms. It includes pansharpening (MS+PAN), hyperspectral sharpening (HS+PAN), and HS-MS fusion (HS+MS). Reconstructing a fused image from the observed images is ill-posed and requires regularization. Diverse regularization methods have been proposed over the years for general imaging problems, many of which perform very well for fusion; these include vector total variation, sparsity and dictionary-based penalties, and generalized Gaussian- and GMM-based priors. This thesis proposes novel regularization models and algorithms that can outperform state-of-the-art image fusion techniques. We can broadly group these into two classes---explicit and implicit regularization.
Explicit regularization refers to the design of hand-crafted penalty functions that impose desirable properties (e.g., smoothness) on the reconstruction; this is used along with the observed data for fusion. We propose a convex regularizer that is motivated by nonlocal patch-based methods for image restoration. Our regularizer accounts for long-distance correlations in hyperspectral images, considers patch variation for capturing texture information, and uses the higher resolution image for guiding the fusion process. Unlike local pixel-based methods, where variations along just the horizontal and vertical directions are penalized, we use a wider search window in terms of nonlocality and directionality. This is shown to yield state-of-the-art results. The catch is that the resulting optimization problem is non-differentiable, so we cannot use simple gradient-based algorithms. However, by expressing patch variation as filtering operations and by judiciously splitting the original variables and introducing latent variables, we develop a provably convergent iterative algorithm whose subproblems can be solved efficiently using FFT-based convolution and soft-thresholding.
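To make this splitting idea concrete, the following is a minimal sketch (not the thesis algorithm itself) of an ADMM scheme for a filter-based l1 penalty with a simple denoising-type fidelity term; the actual fusion model involves blurring and downsampling operators that are omitted here. The filter list `filters` is a placeholder for the patch-variation filters: with circular boundary conditions, the quadratic subproblem diagonalizes in the Fourier domain, and the latent-variable update reduces to soft-thresholding.

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1_filters(y, filters, lam=0.1, rho=1.0, iters=50):
    """Minimize 0.5||x - y||^2 + lam * sum_k ||d_k * x||_1 via ADMM,
    where '*' is circular convolution, so the x-subproblem is solved
    exactly with FFTs and the z-subproblem with soft-thresholding."""
    H, W = y.shape
    # frequency responses of the analysis filters (zero-padded to image size)
    Df = [np.fft.fft2(d, s=(H, W)) for d in filters]
    denom = 1.0 + rho * sum(np.abs(df) ** 2 for df in Df)
    x = y.copy()
    z = [np.zeros_like(y) for _ in filters]  # latent (split) variables
    u = [np.zeros_like(y) for _ in filters]  # scaled dual variables
    for _ in range(iters):
        # x-update: (I + rho*sum D_k^T D_k) x = y + rho*sum D_k^T(z_k - u_k)
        rhs = np.fft.fft2(y) + rho * sum(
            np.conj(df) * np.fft.fft2(zk - uk)
            for df, zk, uk in zip(Df, z, u))
        x = np.real(np.fft.ifft2(rhs / denom))
        xf = np.fft.fft2(x)
        for k, df in enumerate(Df):
            Dx = np.real(np.fft.ifft2(df * xf))        # filtering via FFT
            z[k] = soft_threshold(Dx + u[k], lam / rho)  # prox of l1 term
            u[k] += Dx - z[k]                            # dual ascent step
    return x
```

With horizontal and vertical difference filters this reduces to anisotropic total variation; the thesis regularizer replaces these with a richer, nonlocal and directional filter bank.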
In the implicit approach, we rely on a recent paradigm known as plug-and-play (PnP) regularization, where powerful off-the-shelf denoisers are used for regularization. While PnP has been shown to give state-of-the-art results for general restoration tasks, it has not been explored as much for fusion. In fact, we faced a few technical challenges in applying PnP to hyperspectral fusion. Firstly, existing denoisers are slow when applied to multiband images, and the PnP framework requires applying the denoiser several times. Secondly, convergence is generally not guaranteed for PnP regularization since the mechanism is ad hoc. Along with efficiency and good denoising performance, we thus need a denoiser with specific properties that can guarantee convergence. We propose two approaches to solve this problem. In the first approach, we develop a high-dimensional kernel denoiser with low cost yet good denoising performance, which can guarantee PnP convergence. The overall algorithm is fast and competitive with state-of-the-art methods. In the second approach, we leverage the power of deep learning to develop a trained patch denoiser, which has a couple of advantages over conventional end-to-end learning:
(1) Unlike end-to-end networks, which require large amounts of ground-truth data for training, our denoiser can be trained on patches extracted from the observed images. For example, in HS+MS fusion, the MS image captures the same scene and has the same spatial resolution as the target image. We train the denoiser by sampling clean patches from the MS image and corrupting them with noise.
(2) Compared to end-to-end learning, where the training is done with a fixed forward model, our method can be deployed for different forward models. This is possible thanks to the decoupling of the inversion (of the forward model) and denoising steps in PnP.
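The training-data construction in point (1) can be sketched as follows; this is an illustrative simplification, not the thesis pipeline, and the function name `make_patch_pairs` and its parameters are hypothetical. Clean patches are sampled from the observed MS image and corrupted with synthetic noise, so no external ground truth is needed.

```python
import numpy as np

def make_patch_pairs(ms_image, patch=8, n=1000, sigma=0.05, rng=None):
    """Build (noisy, clean) training pairs by sampling patches from the
    observed MS image and corrupting them with Gaussian noise; this
    replaces the external ground truth that end-to-end training needs."""
    rng = np.random.default_rng(rng)
    H, W = ms_image.shape[:2]
    rows = rng.integers(0, H - patch + 1, size=n)
    cols = rng.integers(0, W - patch + 1, size=n)
    clean = np.stack([ms_image[r:r + patch, c:c + patch]
                      for r, c in zip(rows, cols)])
    noisy = clean + sigma * rng.standard_normal(clean.shape)
    return noisy, clean
```

A standard denoising network can then be fit on these pairs; because the MS image shares the scene and spatial resolution of the target, the learned patch statistics transfer to the fusion problem.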
We use the trained denoiser for PnP regularization and establish convergence of the PnP iterations under a technical assumption that we verify numerically. As far as the reconstruction quality is concerned, our method outperforms state-of-the-art variational and deep-learning fusion techniques.
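For intuition, a generic PnP iteration of the kind described above can be sketched as follows. This is a standard forward-backward PnP scheme, not the exact algorithm analyzed in the thesis, and `box_denoiser` is a toy stand-in for the kernel or trained patch denoiser; convergence of the real method rests on the specific denoiser properties established in the thesis.

```python
import numpy as np

def pnp_fbs(y, grad_f, denoiser, step=0.5, iters=100):
    """Plug-and-play forward-backward splitting:
        x_{k+1} = D(x_k - step * grad_f(x_k)),
    where the proximal map of the regularizer is replaced by a denoiser D."""
    x = y.copy()
    for _ in range(iters):
        x = denoiser(x - step * grad_f(x))
    return x

def box_denoiser(v, k=3):
    # toy moving-average "denoiser" used only to illustrate the mechanism
    pad = k // 2
    vp = np.pad(v, pad, mode="edge")
    out = np.zeros_like(v)
    for i in range(v.shape[0]):
        for j in range(v.shape[1]):
            out[i, j] = vp[i:i + k, j:j + k].mean()
    return out
```

For a quadratic fidelity f(x) = 0.5||x - y||^2, the gradient is simply x - y, and the iteration alternates a gradient step on the data term with a denoising step; in the actual fusion problem, the gradient involves the blur/downsampling forward model.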