
Predictive parallelization: taming tail latencies in web search

Author(s)
Jeon, Myeongjae; Kim, Saehoon; Hwang, Seung-won; He, Yuxiong; Elnikety, Sameh; Cox, Alan L.; Rixner, Scott
Issued Date
2014-07-06
DOI
10.1145/2600428.2609572
URI
https://scholarworks.unist.ac.kr/handle/201301/35587
Fulltext
https://dl.acm.org/citation.cfm?id=2609572
Citation
37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014, pp. 253-262
Abstract
Web search engines are optimized to reduce the high-percentile response time to consistently provide fast responses to almost all user queries. This is a challenging task because the query workload exhibits large variability, consisting of many short-running queries and a few long-running queries that significantly impact the high-percentile response time. With modern multicore servers, parallelizing the processing of an individual query is a promising solution to reduce query execution time, but it gives limited benefits compared to sequential execution since most queries see little or no speedup when parallelized. The root of this problem is that short-running queries, which dominate the workload, do not benefit from parallelization. They incur a large parallelization overhead, taking scarce resources from long-running queries. On the other hand, parallelization substantially reduces the execution time of long-running queries with low overhead and high parallelization efficiency. Motivated by these observations, we propose a predictive parallelization framework with two parts: (1) predicting long-running queries, and (2) selectively parallelizing them. For the first part, prediction should be accurate and efficient. For accuracy, we study a comprehensive feature set covering both term features (reflecting dynamic pruning efficiency) and query features (reflecting query complexity). For efficiency, to keep overhead low, we avoid expensive features that have excessive requirements such as large memory footprints. For the second part, we use the predicted query execution time to parallelize long-running queries and process short-running queries sequentially. We implement and evaluate the predictive parallelization framework in Microsoft Bing search. Our measurements show that under moderate to heavy load, the predictive strategy reduces the 99th-percentile response time by 50% (from 200 ms to 100 ms) compared with prior approaches that parallelize all queries.
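The selective-parallelization policy the abstract describes (predict each query's cost, parallelize only the predicted long-running ones, run the rest sequentially) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names (`predict_cost`, `THRESHOLD_MS`, `dispatch`) and the toy predictor are assumptions; the paper's actual predictor uses term features (dynamic pruning efficiency) and query features (query complexity).

```python
# Hypothetical sketch of predictive parallelization: a cost predictor
# routes queries either to sequential processing or to a thread pool.
from concurrent.futures import ThreadPoolExecutor

THRESHOLD_MS = 50.0  # assumed cutoff separating short from long queries


def predict_cost(query: str) -> float:
    # Stand-in predictor: the paper trains on term and query features;
    # here we fake a cost estimate from the number of query terms.
    return 10.0 * len(query.split())


def process_sequential(query: str) -> str:
    # Short-running queries skip parallelization overhead entirely.
    return "seq:" + query


def process_parallel(query: str, pool: ThreadPoolExecutor) -> str:
    # Long-running queries are split into per-term work items that the
    # pool processes concurrently; results are merged afterward.
    shards = query.split()
    results = list(pool.map(str.upper, shards))
    return "par:" + " ".join(results)


def dispatch(query: str, pool: ThreadPoolExecutor) -> str:
    # Core policy: only queries predicted to exceed the threshold pay
    # the parallelization cost.
    if predict_cost(query) >= THRESHOLD_MS:
        return process_parallel(query, pool)
    return process_sequential(query)
```

The design point the paper makes is that the predictor must itself be cheap; a predictor that is expensive (e.g. one needing a large memory footprint) would erase the gains for the short queries that dominate the workload.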
Publisher
37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014

