In this paper, we study the effects of HDFS in-memory caching on various MapReduce applications. We first evaluate the performance of seven MapReduce applications to characterize their different resource usage patterns. We then modify the centralized cache management system in HDFS so that individual blocks of a file can be cached. Using the modified system, we compare the performance of MapReduce applications with and without in-memory caching, for workloads consisting of a single MapReduce application and for workloads consisting of multiple MapReduce applications. In each experiment, the same workload is executed multiple times to observe the effects of in-memory caching. Our experimental results show that in-memory caching can benefit workloads of multiple I/O-intensive MapReduce applications. However, it does not improve the performance of non-I/O-intensive MapReduce applications and may even degrade it because of the overhead of in-memory caching.
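To make the difference in caching granularity concrete, the following is a minimal Python sketch under a toy model. In stock HDFS, a cache directive (added with `hdfs cacheadmin -addDirective -path <path> -pool <pool-name>`) names a path, and the NameNode asks DataNodes to cache the blocks of the files at that path; the modified system described above additionally allows a chosen subset of a file's blocks to be cached. The `CacheDirective` class, its `block_indices` field, the `blocks_to_cache` function, and the block IDs are all hypothetical illustrations, not the HDFS API or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

MiB = 1024 * 1024

# Hypothetical toy namespace: file path -> ordered list of (block_id, size_in_bytes).
Namespace = Dict[str, List[Tuple[str, int]]]


@dataclass(frozen=True)
class CacheDirective:
    """Toy cache directive (hypothetical, not the real HDFS class).

    A plain directive names only a path, as in stock HDFS. The assumed
    block_indices field models block-level caching; None means every block.
    """
    path: str
    block_indices: Optional[FrozenSet[int]] = None


def blocks_to_cache(namespace: Namespace,
                    directives: List[CacheDirective]) -> Tuple[List[str], int]:
    """Resolve directives to a deduplicated list of block IDs and the bytes they need."""
    selected: List[str] = []
    seen = set()
    total_bytes = 0
    for d in directives:
        for index, (block_id, size) in enumerate(namespace.get(d.path, [])):
            # A block-level directive skips blocks it does not name.
            if d.block_indices is not None and index not in d.block_indices:
                continue
            # Count each block once even if several directives request it.
            if block_id in seen:
                continue
            seen.add(block_id)
            selected.append(block_id)
            total_bytes += size
    return selected, total_bytes


if __name__ == "__main__":
    ns: Namespace = {
        "/data/input.txt": [(f"blk_{i}", 128 * MiB) for i in range(4)],
    }
    whole = blocks_to_cache(ns, [CacheDirective("/data/input.txt")])
    partial = blocks_to_cache(
        ns, [CacheDirective("/data/input.txt", frozenset({0, 1}))])
    print(whole[0], whole[1] // MiB)      # ['blk_0', 'blk_1', 'blk_2', 'blk_3'] 512
    print(partial[0], partial[1] // MiB)  # ['blk_0', 'blk_1'] 256
```

In this toy model, a whole-file directive on a four-block file of 128 MiB blocks asks for 512 MiB of cache, while a block-level directive naming two blocks asks for only 256 MiB, which illustrates why finer granularity can matter when memory is limited.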
Publisher: Ulsan National Institute of Science and Technology (UNIST)