A data colocation grid framework for big data medical image processing: Backend design

Bao, S.; Huo, Y.; Parvathaneni, P.; Plassard, A.J.; Bermudez, C.; Yao, Y.; Lyu, Ilwoo; Gokhale, A.; Landman, B.A.

doi:10.1117/12.2293694




Issued Date: 2018-02-13

URI: https://scholarworks.unist.ac.kr/handle/201301/50139

Citation: Medical Imaging 2018: Imaging Informatics for Healthcare, Research, and Applications

Abstract: When processing large medical imaging studies, adopting high-performance grid computing resources rapidly becomes important. We recently presented a medical image processing-as-a-service grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet been validated across the variety of multi-level analyses used in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population-based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL queries. In this paper, we present a heuristic backend application program interface (API) design for Hadoop and HBase for Medical Image Processing (HadoopBase-MIP). The API includes Upload, Retrieve, Remove, a load balancer (for heterogeneous clusters), and MapReduce templates. A dataset summary statistic model is discussed and implemented with the MapReduce paradigm. We introduce an HBase table scheme for fast data queries to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university's secure, shared web database and used to empirically assess an in-house grid with 224 heterogeneous CPU cores. Three empirical experiments are presented and discussed: (1) the load balancer improves wall time 1.5-fold compared with a framework using the built-in data allocation strategy; (2) the summary statistic model is empirically verified on the grid framework and compared with the same cluster deployed with a standard Sun Grid Engine (SGE), reducing wall-clock time 8-fold and resource time 14-fold; and (3) the proposed HBase table scheme improves MapReduce computation with a 7-fold reduction in wall time compared with a naïve scheme when datasets are relatively small. The source code and interfaces have been made publicly available. © 2018 SPIE.
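
The abstract describes computing population-based summary statistics with the MapReduce paradigm. As a rough illustration of that idea only, the sketch below averages images voxel by voxel in plain Python: a map step turns each image into a partial aggregate, and an associative reduce step merges the aggregates. It is not the HadoopBase-MIP code and uses no Hadoop or HBase API; the function names `map_image`, `combine`, and `population_mean`, and the representation of an image as a flat list of floats, are assumptions made for this example.

```python
from functools import reduce


def map_image(voxels):
    """Map step (hypothetical): emit a partial aggregate for one image.

    The aggregate is (per-voxel running sum, number of images seen).
    """
    return (list(voxels), 1)


def combine(a, b):
    """Reduce step (hypothetical): merge two partial aggregates.

    The merge is associative, so partial results can be combined in any
    grouping, which is what lets a MapReduce-style job split the work.
    """
    sums_a, n_a = a
    sums_b, n_b = b
    if len(sums_a) != len(sums_b):
        raise ValueError("images must have the same number of voxels")
    return ([x + y for x, y in zip(sums_a, sums_b)], n_a + n_b)


def population_mean(images):
    """Voxel-wise mean across all images, via map then reduce."""
    partials = [map_image(img) for img in images]
    if not partials:
        raise ValueError("no images to summarize")
    sums, n = reduce(combine, partials)
    return [s / n for s in sums]
```

For example, `population_mean([[1.0, 2.0], [3.0, 6.0]])` averages two two-voxel images. Because each partial aggregate carries its own image count, aggregates computed on different nodes could be merged without revisiting the raw images.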

Publisher: SPIE

ISSN: 1605-7422
