Minimizing Task Initialization Overhead of Hadoop via HDFS Block Coalescing

Author(s)
Kim, Wonbae
Advisor
Nam, Beomseok
Issued Date
2017-02
URI
https://scholarworks.unist.ac.kr/handle/201301/72131
http://unist.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002333292
Abstract
In this work, we present a novel HDFS block coalescing scheme that mitigates YARN container overhead. YARN is designed as a generic resource manager that decouples programming models from the resource management infrastructure. We show that this generic design incurs significant overhead because each container must perform several initialization steps, including authentication.
To reduce the container overhead without making significant changes to the existing YARN framework, we propose to leverage the input split, which is the logical representation of physical HDFS blocks. The HDFS block coalescing scheme creates large input splits so that the map phase can complete in a single wave, which reduces the number of containers and their initialization overhead. Our experimental study shows that the block coalescing scheme significantly reduces the container overhead while achieving good load balancing and job scheduling fairness, without impairing the degree of overlap between the map phase and the reduce phase.
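The idea of coalescing blocks into one split per available container can be illustrated with a minimal sketch. This is not the thesis's algorithm: the greedy size-balancing policy, the function names `map_waves` and `coalesce_blocks`, and the example cluster (40 blocks of 128 MB, 8 container slots) are all assumptions made for illustration, and data locality is ignored.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Split:
    """A hypothetical input split: a logical group of HDFS block indices."""
    blocks: list = field(default_factory=list)
    size: int = 0  # total bytes covered by this split


def map_waves(num_splits: int, num_slots: int) -> int:
    """Number of map waves when each split needs one container slot."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return math.ceil(num_splits / num_slots)


def coalesce_blocks(block_sizes: list, num_slots: int) -> list:
    """Group blocks into at most num_slots splits of similar total size.

    Assumed policy (not from the thesis): place the largest remaining
    block into the split that is currently smallest.
    """
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    num_splits = min(num_slots, len(block_sizes))
    splits = [Split() for _ in range(num_splits)]
    # Visit block indices from largest to smallest block.
    order = sorted(range(len(block_sizes)),
                   key=lambda i: block_sizes[i], reverse=True)
    for i in order:
        target = min(splits, key=lambda s: s.size)
        target.blocks.append(i)
        target.size += block_sizes[i]
    return splits


# Hypothetical example: 40 blocks of 128 MB on a cluster with 8 slots.
MB = 1024 * 1024
blocks = [128 * MB] * 40
slots = 8

waves_before = map_waves(len(blocks), slots)  # one container per block
splits = coalesce_blocks(blocks, slots)
waves_after = map_waves(len(splits), slots)   # one container per split

print(f"containers: {len(blocks)} -> {len(splits)}")
print(f"map waves:  {waves_before} -> {waves_after}")
```

Under these assumptions the container count drops from one per block to one per slot, so the per-container initialization cost is paid once per slot rather than once per block.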
Publisher
Ulsan National Institute of Science and Technology (UNIST)
Degree
Master
Major
Department of Computer Engineering

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.