Deep Residual Transform for Multi-scale Image Decomposition

Yuhao Chen; Alexander Wong; Yuan Fang; Yifan Wu; Linlin Xu

doi:10.15353/jcvis.v6i1.3537

Vol. 6 No. 1 (2020)
Special Issue: Proceedings of CVIS 2020

Published 2021-01-15

Yuhao Chen
University of Waterloo

Alexander Wong
University of Waterloo

Yuan Fang
University of Waterloo

Yifan Wu
University of Waterloo

Linlin Xu
University of Waterloo

How to Cite

Chen, Y., Wong, A., Fang, Y., Wu, Y., & Xu, L. (2021). Deep Residual Transform for Multi-scale Image Decomposition. Journal of Computational Vision and Imaging Systems, 6(1), 1–5. https://doi.org/10.15353/jcvis.v6i1.3537

Abstract

Multi-scale image decomposition (MID) is a fundamental task in computer vision and image processing that transforms an image into a hierarchical representation comprising different levels of visual granularity, from coarse structures to fine details. A well-engineered MID disentangles the image signal into meaningful components that can be used in a variety of applications such as image denoising, image compression, and object classification. Traditional MID approaches such as wavelet transforms tackle the problem through carefully designed basis functions under rigid decomposition structure assumptions. However, because the information distribution varies from one type of image content to another, rigid decomposition assumptions lead to inefficient representation, i.e., some scales can contain little to no information. To address this issue, we present Deep Residual Transform (DRT), a data-driven MID strategy in which the input signal is transformed into a hierarchy of non-linear representations at different scales, with each representation being independently learned as the representational residual of previous scales at a user-controlled detail level. As such, the proposed DRT progressively disentangles scale information from the original signal by sequentially learning residual representations. The decomposition flexibility of this approach allows for highly tailored representations that cater to specific types of image content, and results in greater representational efficiency and compactness. In this study, we realize the proposed transform by leveraging a hierarchy of sequentially trained autoencoders. To explore the efficacy of the proposed DRT, we leverage two datasets comprising very different types of image content: 1) CelebFaces and 2) Cityscapes. Experimental results show that the proposed DRT achieved highly efficient information decomposition on both datasets despite their very different visual granularity characteristics.
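
The sequential residual idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the network architecture, training loss, or how the detail level is controlled. As a loudly stated assumption, each "autoencoder" below is a closed-form linear autoencoder fitted by SVD, standing in for a trained non-linear network, and the hypothetical `code_sizes` parameter stands in for the user-controlled detail level. All function names are illustrative.

```python
import numpy as np


def fit_linear_autoencoder(x, code_size):
    """Fit a rank-`code_size` linear autoencoder in closed form via SVD.

    ASSUMPTION: a stand-in for a trained non-linear autoencoder.
    Returns (mean, basis), where basis has shape (code_size, n_features).
    """
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:code_size]


def encode(x, model):
    mean, basis = model
    return (x - mean) @ basis.T


def decode(code, model):
    mean, basis = model
    return code @ basis + mean


def residual_decompose(x, code_sizes):
    """Fit one autoencoder per scale, each on the residual left by earlier scales."""
    models, codes = [], []
    residual = x.copy()
    for code_size in code_sizes:
        model = fit_linear_autoencoder(residual, code_size)
        code = encode(residual, model)
        # Whatever this scale fails to reconstruct is passed to the next scale.
        residual = residual - decode(code, model)
        models.append(model)
        codes.append(code)
    return models, codes, residual


def reconstruct(models, codes):
    """Sum the per-scale reconstructions."""
    return sum(decode(c, m) for c, m in zip(codes, models))


# Toy data: 200 flattened "images" with 64 pixels and low-rank structure.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 16)) @ rng.normal(size=(16, 64))

models, codes, residual = residual_decompose(x, code_sizes=[2, 4, 8])

# Reconstruction error when using only the first 1, 2, or 3 scales.
errors = [
    np.linalg.norm(x - reconstruct(models[:i], codes[:i]))
    for i in range(1, len(models) + 1)
]
```

In this sketch, the per-scale reconstructions plus the final residual sum back to the input exactly, and each added scale lowers the reconstruction error. A faithful version would replace `fit_linear_autoencoder` with autoencoders trained one after another on image data.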
