Advances in High Dynamic Range Imaging Using Deep Learning
Natural scenes span a wide range of brightness, from dark starry nights to bright sunlit beaches. The human eye can perceive this vast range of illumination through various adaptation mechanisms, which allows us to enjoy such scenes. In contrast, digital cameras capture only a limited brightness range because of sensor limitations. Often, the dynamic range of the scene far exceeds the hardware limit of standard digital camera sensors. In such scenarios, the resulting photos contain saturated regions that are either too dark or too bright to comprehend visually. An easy-to-deploy and widely used algorithmic solution to this problem is to merge multiple Low Dynamic Range (LDR) images captured with varying exposures into a single High Dynamic Range (HDR) image. Such a fusion process is simple for static sequences that have no camera or object motion. However, in most practical situations, some camera and object motion is inevitable, leading to ghost-like artifacts in the final fused result. The process of fusing the LDR images without such ghosting artifacts is known as HDR deghosting.
In this thesis, we make several contributions to the literature on HDR deghosting. First, we present a novel method that uses auxiliary motion segmentation for efficient HDR deghosting. By segmenting the input LDR images into static and moving regions, we learn effective fusion rules for various challenging saturation and motion types. Additionally, we introduce a novel memory network that accumulates the features required to generate plausible details that were lost in the saturated regions. We also present a large-scale motion segmentation dataset of 3683 varying-exposure images to benefit the research community.
The lack of large and diverse data on exposure brackets is a critical problem for learning-based HDR deghosting methods. The main contribution of our next work is to generate dynamic bracketed exposure images with ground truth HDR from static sequences. We achieve this data augmentation by synthetically introducing motion through affine transformations. Through experiments, we show that the proposed method generalizes well to other datasets with real-world motion.
Next, we explore data-efficient image fusion techniques for HDR imaging. Convolutional Neural Networks (CNNs) have shown tremendous success in many image reconstruction problems. However, CNN-based Multi-Exposure Fusion (MEF) and HDR imaging methods require collecting large datasets with ground truth, which is a tedious and time-consuming process. To address this issue, we propose novel zero-shot and few-shot HDR image fusion methods. First, we introduce an unsupervised deep learning framework for static MEF that uses a no-reference quality metric as the loss function. In our approach, we modify the Structural Similarity Index Metric (SSIM) to generate expected ground truth statistics and compare them with the predicted output. Second, we propose an approach for training a deep neural network for HDR image deghosting with few labeled and many unlabeled images. The training is done in two stages. In the first stage, the network is trained on a set of dynamic and static images with corresponding ground truth. In the second stage, the network is trained on artificial dynamic sequences and corresponding ground truth generated from stage one. The proposed approach performs comparably to existing methods with only five labeled images.
Despite their impressive performance, existing CNN-based HDR deghosting methods are rigid in terms of the number of images they can fuse. They do not scale to LDR sequences of arbitrary length during validation. We address this issue by proposing two scalable HDR deghosting algorithms. First, we propose a modular fusion technique that uses mean-max feature aggregation to fuse an arbitrary number of LDR images. Second, we propose a recurrent neural network using a novel Self-Gated Memory (SGM) cell for scalable HDR deghosting. In the SGM cell, the information flow is controlled by multiplying the gate's output by a function of itself. Additionally, we use two SGM cells in a bidirectional setting to improve the output quality. The promising experimental results demonstrate the effectiveness of the proposed recurrent model in HDR deghosting.
Finally, there are many successful deep learning-based approaches for HDR deghosting, but their high computational cost prevents us from generating high spatial resolution images. We address this problem by performing motion compensation at low resolution and using Bilateral Guided Upsampling (BGU) to generate a sharp high-resolution HDR image. The guide image for BGU is synthesized in the weight-map domain with bicubic upsampling. The proposed method outperforms existing methods in terms of computational efficiency while remaining accurate. It can fuse a sequence of three 16-megapixel images in about 10 seconds.
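As an illustration of why the mean-max feature aggregation mentioned above scales to an arbitrary number of LDR images, the following is a minimal NumPy sketch, not the thesis implementation. It assumes each input image has already been encoded into a feature map of shape (C, H, W); the function name mean_max_aggregate and the toy feature sizes are hypothetical. The element-wise mean and maximum are taken across the image axis and concatenated, so the fused tensor has the same shape regardless of how many images are supplied.

```python
import numpy as np


def mean_max_aggregate(features):
    """Fuse per-image feature maps of shape (N, C, H, W) into (2C, H, W).

    Hypothetical helper: the encoder that produces `features` is assumed
    and not shown here.
    """
    stacked = np.asarray(features, dtype=np.float32)
    if stacked.ndim != 4 or stacked.shape[0] == 0:
        raise ValueError("expected a non-empty array of shape (N, C, H, W)")
    mean_feat = stacked.mean(axis=0)  # average response over the N images
    max_feat = stacked.max(axis=0)    # strongest response over the N images
    # Concatenate along the channel axis; the result does not depend on N.
    return np.concatenate([mean_feat, max_feat], axis=0)


# Toy usage: random "features" for sequences of 2, 3, and 7 images.
rng = np.random.default_rng(0)
for n in (2, 3, 7):
    fused = mean_max_aggregate(rng.standard_normal((n, 8, 4, 4)))
    print(n, fused.shape)
```

Because both the mean and the maximum are symmetric in their arguments, the fused features are also independent of the order in which the images are supplied, which is one plausible reason such pooling suits variable-length exposure stacks.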