Locality-Aware Fair Scheduling in the Distributed Query Processing Framework

Author(s)
Eom, Youngmoon
Advisor
Nam, Beomseok
Issued Date
2015-02
URI
https://scholarworks.unist.ac.kr/handle/201301/71884 http://unist.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001925183
Abstract
Utilizing caching facilities in modern query processing systems is becoming more important as main memory capacity continues to grow. In data-intensive applications especially, caching yields significant performance gains by avoiding disk I/O, which is far more expensive than memory access. Data must therefore be distributed carefully across back-end application servers to benefit from caching as much as possible.
Load balance across back-end application servers is another concern the scheduler must address. Serious load imbalance may result in poor performance even if the cache hit ratio is high. Scheduling is made harder still by the fact that a decision that raises the cache hit ratio sometimes causes load imbalance. We therefore need a scheduling algorithm that balances the trade-off between load balance and cache hit ratio.
To account for both cache hits and load balance, we propose two semantic caching mechanisms, DEMB and EM-KDE, which balance the load while keeping the cache hit ratio high by analyzing and predicting trends in query arrival patterns.
Another concern discussed in this paper is an environment with multiple front-end schedulers. Each scheduler can observe a different query arrival pattern from its users. To reflect these differences, we compare and evaluate three algorithms that aggregate the query arrival pattern information from the front-end schedulers.
To further increase the cache hit ratio in semantic caching scheduling, we propose migrating cache contents to nearby servers. The cache hit count can be increased if data is dynamically migrated to the server to which subsequent requests for that data are expected to be forwarded. Several migration policies and their pros and cons are discussed.
Finally, we introduce a MapReduce framework called Eclipse, which takes full advantage of the semantic caching scheduling algorithms mentioned above. We show that Eclipse outperforms other MapReduce frameworks in most evaluations.
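As a rough illustration of the trade-off between cache hit ratio and load balance described above, the following sketch picks a back-end server by estimated completion time. It is not the DEMB or EM-KDE algorithm from the thesis; the `Server` class, the `pick_server` and `dispatch` functions, and the cost constants are all hypothetical and chosen only to make the trade-off visible.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical back-end server state tracked by a front-end scheduler."""
    name: str
    cached: set = field(default_factory=set)  # keys currently held in memory
    queued: int = 0                           # queries waiting on this server


def pick_server(servers, key, hit_cost=1.0, miss_cost=10.0):
    """Return the server with the lowest estimated completion time for `key`.

    Assumption: every queued query costs about one cache miss, so a long
    queue can outweigh the benefit of a cache hit.
    """
    def estimate(server):
        own_cost = hit_cost if key in server.cached else miss_cost
        return server.queued * miss_cost + own_cost

    return min(servers, key=estimate)


def dispatch(servers, key):
    """Assign `key` to the chosen server and update its queue and cache."""
    server = pick_server(servers, key)
    server.queued += 1
    server.cached.add(key)
    return server
```

With an idle server that already caches the requested key, the sketch prefers locality; once that server has a backlog, the same query is sent to an idle server even though it will miss the cache there. A real scheduler would also need to drain queues and evict cache entries, which this sketch leaves out.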
Publisher
Ulsan National Institute of Science and Technology (UNIST)
Degree
Master
Major
Department of Computer Engineering
