File Download

There are no files associated with this item.

Related Researcher

전명재

Jeon, Myeongjae
OMNIA

Full metadata record

DC Field Value Language
dc.citation.conferencePlace US -
dc.citation.conferencePlace Virtual -
dc.citation.title USENIX Annual Technical Conference -
dc.contributor.author Gangmuk Lim -
dc.contributor.author Jeongseob Ahn -
dc.contributor.author Wencong Xiao -
dc.contributor.author Youngjin Kwon -
dc.contributor.author Jeon, Myeongjae -
dc.date.accessioned 2024-01-31T21:38:25Z -
dc.date.available 2024-01-31T21:38:25Z -
dc.date.created 2021-11-29 -
dc.date.issued 2021-07-15 -
dc.description.abstract GPUs are the workhorse in modern server infrastructure, fueling advances in a number of compute-intensive workloads such as deep neural network (DNN) training. Several recent works propose solutions for sharing GPU resources across multiple concurrent DNN training jobs, but none of them addresses the rapidly increasing memory footprint introduced by such job co-locations, which greatly limits the effectiveness of sharing GPU resources. In this paper, we present Zico, the first DNN system that aims at reducing the system-wide memory consumption for concurrent training. Zico keeps track of the memory usage pattern of each individual training job by monitoring its progress on GPU computations and makes memory reclaimed from the job globally sharable. Based on this memory management scheme, Zico automatically decides a strategy to share memory among concurrent jobs with minimum delay on training while not exceeding a given memory budget such as GPU memory capacity. Our evaluation shows that Zico outperforms existing GPU sharing approaches and delivers benefits over a variety of job co-location scenarios. -
dc.identifier.bibliographicCitation USENIX Annual Technical Conference -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/77162 -
dc.language English -
dc.publisher USENIX -
dc.title Zico: Efficient GPU Memory Sharing for Concurrent DNN Training -
dc.type Conference Paper -
dc.date.conferenceDate 2021-07-14 -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.