JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, v.127, pp.1 - 17
Abstract
Hardware transactional memory (HTM) is widely supported by commodity processors. While the effectiveness of HTM has been evaluated on small-scale multi-core systems, its performance and energy efficiency for scientific workloads on large-scale NUMA systems, which are increasingly adopted in high-performance computing, have not yet been quantified. To bridge this gap, this work investigates the performance and energy-efficiency impact of HTM on scientific applications running on large-scale NUMA systems. Specifically, we quantify the performance and energy efficiency of HTM for scientific workloads using the widely-used CLOMP-TM benchmark. We then discuss a set of generic software optimizations that effectively improve the performance and energy efficiency of transactional scientific workloads on large-scale NUMA systems. Further, we present case studies in which we apply a set of these optimizations to representative transactional scientific applications and investigate the potential for high-performance and energy-efficient runtime support.