In this paper, we investigate techniques to effectively manage HDFS in-memory caching for Hadoop. We first revisit the current implementation of Hadoop with HDFS in-memory caching to understand its limitations in the effective use of in-memory caching. For various representative MapReduce applications, we also evaluate the degree of benefit each application gains from in-memory caching, i.e., its cache affinity. We then propose an adaptive cache-local scheduling algorithm that adaptively computes how long a MapReduce job waits to be scheduled on a cache-local node, in proportion to the percentage of the job's input data that is cached. In addition, we propose a block-goodness-aware cache replacement algorithm that decides which blocks to cache and which to evict based on the access rate of each block and the cache affinity of applications. Using various workloads consisting of multiple MapReduce applications, we conduct an extensive experimental study to demonstrate the effects of the proposed in-memory orchestration techniques. Our experimental results show that our enhanced Hadoop in-memory caching scheme improves the performance of MapReduce workloads.
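The two proposed policies can be illustrated with a minimal sketch. This is a hypothetical reading of the abstract, not the paper's actual implementation: the function names, the linear wait-time formula, and the product form of the block goodness score are all assumptions made for illustration.

```python
# Hypothetical sketch of the two policies described in the abstract.
# All names and formulas here are illustrative assumptions, not the
# paper's actual algorithms.

def cache_local_wait(max_wait_s, cached_blocks, total_blocks):
    """Adaptive cache-local scheduling: a job's scheduling delay is
    proportional to the fraction of its input data already cached."""
    if total_blocks == 0:
        return 0.0
    return max_wait_s * (cached_blocks / total_blocks)

def block_goodness(access_rate, cache_affinity):
    """Block goodness score combining how often a block is accessed
    with the cache affinity of the application reading it (assumed
    here to be a simple product)."""
    return access_rate * cache_affinity

def choose_eviction_victim(blocks):
    """Evict the block with the lowest goodness score.
    blocks: dict mapping block_id -> (access_rate, cache_affinity)."""
    return min(blocks, key=lambda b: block_goodness(*blocks[b]))
```

Under this sketch, a job whose input is fully cached waits the maximum delay for a cache-local slot, while a job with no cached input is scheduled immediately; replacement favors keeping hot blocks read by cache-affine applications.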
Publisher: Ulsan National Institute of Science and Technology (UNIST)