TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services

Jeon, Myeongjae; He, Yuxiong; Kim, Hwangju; Elnikety, Sameh; Rixner, Scott; Cox, Alan L.

doi:10.1145/2872362.2872370




Issued Date: 2016-04-02


URI: https://scholarworks.unist.ac.kr/handle/201301/35431

Fulltext: https://dl.acm.org/citation.cfm?doid=2872362.2872370

Citation: 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2016), pp. 129-141

Abstract: In interactive services such as web search, recommendations, games, and finance, reducing tail latency is crucial to providing a fast response to every user. Using web search as a driving example, we systematically characterize the interactive workload to identify the opportunities and challenges for reducing tail latency. We find that the workload consists mainly of short requests that do not benefit from parallelism and a few long requests that significantly affect the tail but achieve high speedup from parallelism. This motivates using a predictor to estimate request execution time, identify long requests, and parallelize them. Prediction, however, is not perfect: a long request mispredicted as short is likely to contribute to the server's tail latency, which sets a ceiling on the achievable tail latency. We propose TPC, an approach that judiciously combines prediction information with dynamic correction of inaccurate predictions. Dynamic correction increases parallelism to accelerate a long request that was mispredicted as short. TPC carefully selects appropriate target latencies based on system load and parallelism efficiency to reduce tail latency.

We implement TPC and several prior approaches and compare them experimentally on a single search server and on a cluster of 40 search servers. The experimental results show that TPC reduces the 99th- and 99.9th-percentile latency by up to 40% compared with the best prior work. Moreover, we evaluate TPC on a finance server, demonstrating its effectiveness in reducing the tail latency of interactive services beyond web search.
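The core mechanism described in the abstract (parallelize requests predicted to be long, and raise parallelism for a request predicted short that keeps running) can be illustrated with a minimal Python sketch. It is not TPC's algorithm: the fixed long-request threshold, the fixed correction point, the single maximum degree of parallelism, and the Amdahl's-law speedup model are all hypothetical stand-ins, whereas the paper selects target latencies from system load and parallelism efficiency. The names `Request`, `speedup`, `latency_ms`, `long_threshold_ms`, `correction_ms`, `max_degree`, and `parallel_fraction` are illustrative only, and the sketch ignores queueing and contention between requests.

```python
from dataclasses import dataclass


@dataclass
class Request:
    work_ms: float       # true sequential execution time (unknown to the scheduler)
    predicted_ms: float  # predictor's estimate of work_ms


def speedup(degree: int, parallel_fraction: float) -> float:
    """Amdahl's-law speedup: an assumed stand-in for measured parallelism efficiency."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / degree)


def latency_ms(req: Request, *, long_threshold_ms: float, correction_ms: float,
               max_degree: int, parallel_fraction: float,
               correct: bool = True) -> float:
    """Execution time of one request under a hypothetical prediction-plus-correction policy."""
    boosted = speedup(max_degree, parallel_fraction)
    if req.predicted_ms >= long_threshold_ms:
        # Predicted long: run at max_degree from the start.
        return req.work_ms / boosted
    if not correct or req.work_ms <= correction_ms:
        # Prediction only, or a genuinely short request that finishes
        # sequentially before the correction point is reached.
        return req.work_ms
    # Long request mispredicted as short: it ran sequentially (speedup 1) for
    # correction_ms, then correction raises parallelism for the remaining work.
    remaining = req.work_ms - correction_ms
    return correction_ms + remaining / boosted
```

With illustrative settings `long_threshold_ms=50`, `correction_ms=20`, `max_degree=4`, and `parallel_fraction=0.9`, a 200 ms request mispredicted as 10 ms finishes in about 78.5 ms with correction, versus 200 ms with prediction alone. This shows why, without correction, mispredicted long requests bound the achievable tail latency.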

Publisher: 21st International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2016
