IEEE International Conference on Image Processing, pp. 1238-1242
Abstract
The discrete wavelet transform (DWT) has been widely used in many image compression applications, such as JPEG2000 and compressive sensing MRI. Although the lifting scheme [1] has been widely adopted to accelerate the DWT, only a handful of studies have addressed its efficient implementation on many-core accelerators such as graphics processing units (GPUs). Moreover, we observe that rearranging the spatial locations of wavelet coefficients at every level of the DWT significantly degrades the performance of memory transactions on the GPU. To address these problems, we propose a mixed-band lifting wavelet transform that reduces uncoalesced global memory accesses on the GPU and maximizes on-chip memory bandwidth by performing in-place operations in registers. We assess the performance of the proposed method by comparing it with state-of-the-art DWT libraries, and demonstrate its usability in a compressive sensing (CS) MRI application.
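To make "in-place lifting without rearranging coefficients" concrete, the following is a minimal 1-D sketch, not the authors' method. It assumes the simple integer Haar wavelet (swapped in for illustration; the paper's filters, 2-D layout, and GPU register implementation are not reproduced) and one plausible reading of a "mixed-band" layout: after each level, low-pass and high-pass coefficients stay interleaved in the same array at stride 2**k instead of being gathered into separate subbands. The function names are hypothetical, and this CPU code only illustrates the data layout, not GPU memory coalescing.

```python
def haar_lifting_inplace(x, levels):
    """Forward multi-level integer Haar lifting, in place, interleaved layout.

    Hypothetical illustration: at level k the active samples sit at stride
    2**k. Each level overwrites an (even, odd) pair with (approx, detail)
    at the same positions, so coefficients are never moved between levels.
    """
    n = len(x)
    if n % (1 << levels):
        raise ValueError("length must be divisible by 2**levels in this sketch")
    stride = 1
    for _ in range(levels):
        step = 2 * stride
        for i in range(0, n, step):
            even, odd = x[i], x[i + stride]
            d = odd - even           # predict step: detail coefficient
            s = even + (d >> 1)      # update step: floor((even + odd) / 2)
            x[i], x[i + stride] = s, d
        stride = step
    return x


def haar_lifting_inverse_inplace(x, levels):
    """Inverse of haar_lifting_inplace: undo the lifting steps in reverse order."""
    n = len(x)
    if n % (1 << levels):
        raise ValueError("length must be divisible by 2**levels in this sketch")
    stride = 1 << levels
    for _ in range(levels):
        stride //= 2                 # coarsest level first
        step = 2 * stride
        for i in range(0, n, step):
            s, d = x[i], x[i + stride]
            even = s - (d >> 1)      # undo update
            odd = d + even           # undo predict
            x[i], x[i + stride] = even, odd
    return x


if __name__ == "__main__":
    coeffs = haar_lifting_inplace(list(range(1, 9)), 3)
    print(coeffs)  # the coarsest approximation sits at index 0
    print(haar_lifting_inverse_inplace(coeffs, 3))
```

Because each lifting step only reads and writes the pair it transforms, the integer transform is exactly reversible, and each level touches a fixed set of positions. In this sketch the coarsest approximation ends up at index 0 and the details stay interleaved at their original positions, rather than being copied into contiguous subbands after every level.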