IEEE International Parallel and Distributed Processing Symposium, pp. 804-813
Abstract
Hardware transactional memory (HTM) is supported by widely used commodity processors. While the effectiveness of HTM has been evaluated on small-scale multi-core systems, its performance and energy efficiency for scientific workloads on large-scale NUMA systems, which are increasingly adopted in high-performance computing, remain unquantified. To bridge this gap, this work investigates the performance and energy-efficiency impact of HTM on scientific applications on large-scale NUMA systems. We first quantify the performance and energy efficiency of HTM for scientific workloads using the widely used CLOMP-TM benchmark. We then discuss a set of generic software optimizations that can effectively improve the performance and energy efficiency of transactional scientific workloads on large-scale NUMA systems. Finally, we present case studies in which we apply a subset of these optimizations to representative transactional scientific applications and significantly improve their performance and energy efficiency on large-scale NUMA systems.