File Download

There are no files associated with this item.

Related Researcher

Jeon, Myeongjae (전명재)


Full metadata record

DC Field Value Language
dc.citation.conferencePlace US -
dc.citation.conferencePlace Georgia Tech, Atlanta -
dc.citation.endPage 141 -
dc.citation.startPage 129 -
dc.citation.title 21st International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2016 -
dc.contributor.author Jeon, Myeongjae -
dc.contributor.author He, Yuxiong -
dc.contributor.author Kim, Hwangju -
dc.contributor.author Elnikety, Sameh -
dc.contributor.author Rixner, Scott -
dc.contributor.author Cox, Alan L. -
dc.date.accessioned 2023-12-19T21:07:30Z -
dc.date.available 2023-12-19T21:07:30Z -
dc.date.created 2018-08-27 -
dc.date.issued 2016-04-02 -
dc.description.abstract In interactive services such as web search, recommendations, games and finance, reducing the tail latency is crucial to provide fast response to every user. Using web search as a driving example, we systematically characterize interactive workload to identify the opportunities and challenges for reducing tail latency. We find that the workload consists of mainly short requests that do not benefit from parallelism, and a few long requests which significantly impact the tail but exhibit high parallelism speedup. This motivates estimating request execution time, using a predictor, to identify long requests and to parallelize them. Prediction, however, is not perfect; a long request mispredicted as short is likely to contribute to the server tail latency, setting a ceiling on the achievable tail latency. We propose TPC, an approach that combines prediction information judiciously with dynamic correction for inaccurate prediction. Dynamic correction increases parallelism to accelerate a long request that is mispredicted as short. TPC carefully selects the appropriate target latencies based on system load and parallelism efficiency to reduce tail latency.

We implement TPC and several prior approaches to compare them experimentally on a single search server and on a cluster of 40 search servers. The experimental results show that TPC reduces the 99th- and 99.9th-percentile latency by up to 40% compared with the best prior work. Moreover, we evaluate TPC on a finance server, demonstrating its effectiveness on reducing tail latency of interactive services beyond web search. -
dc.identifier.bibliographicCitation 21st International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2016, pp.129 - 141 -
dc.identifier.doi 10.1145/2872362.2872370 -
dc.identifier.scopusid 2-s2.0-84975230468 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/35431 -
dc.identifier.url https://dl.acm.org/citation.cfm?doid=2872362.2872370 -
dc.language English -
dc.publisher 21st International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2016 -
dc.title TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services -
dc.type Conference Paper -
dc.date.conferenceDate 2016-04-02 -
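The abstract above describes a predict-then-correct scheduling policy: requests predicted to be long are parallelized up front, while a predicted-short request that overruns a target latency is dynamically "corrected" by adding parallelism. The following is a minimal illustrative sketch of that idea only, not the authors' implementation; all function names, the efficiency factor, and the numeric values are hypothetical.

```python
def finish_time(work, degree, efficiency=0.9):
    """Completion time of `work` units on `degree` workers.

    efficiency < 1 models sub-linear parallelism speedup
    (a toy stand-in for the parallelism-efficiency tradeoff
    the abstract mentions).
    """
    if degree == 1:
        return work
    return work / (degree * efficiency)

def schedule(work, predicted_long, target, max_degree=4):
    """Simulated latency of one request under predict-then-correct.

    predicted_long -- the predictor's verdict (it may be wrong)
    target         -- latency budget; exceeding it triggers correction
    """
    if predicted_long:
        # Predicted long: parallelize immediately.
        return finish_time(work, max_degree)
    if work <= target:
        # Predicted short and actually short: run sequentially.
        return work
    # Misprediction (long request predicted short): run sequentially
    # until the target is reached, then correct by parallelizing the
    # remaining work.
    remaining = work - target
    return target + finish_time(remaining, max_degree)

# A mispredicted long request is corrected mid-flight, so it still
# finishes far sooner than fully sequential execution would allow:
lat_corrected = schedule(work=100.0, predicted_long=False, target=10.0)
assert lat_corrected < 100.0
```

In this toy model, dynamic correction bounds the damage of a misprediction: the sequential portion is capped at the target latency, after which the request gets the same parallelism a correctly predicted long request would have received, which is the mechanism the abstract credits for lifting the ceiling that imperfect prediction alone would impose on tail latency.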


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.