Related Researcher

이종은

Lee, Jongeun
Intelligent Computing and Codesign Lab.

Full metadata record

DC Field Value Language
dc.citation.endPage 897 -
dc.citation.number 5 -
dc.citation.startPage 888 -
dc.citation.title IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS -
dc.citation.volume 38 -
dc.contributor.author Lee, Sugil -
dc.contributor.author Kim, Daewoo -
dc.contributor.author Nguyen, Dong -
dc.contributor.author Lee, Jongeun -
dc.date.accessioned 2023-12-21T19:12:07Z -
dc.date.available 2023-12-21T19:12:07Z -
dc.date.created 2018-06-12 -
dc.date.issued 2019-05 -
dc.description.abstract Deep learning workloads such as Convolutional Neural Networks (CNNs) are important and increasingly demand high-performance hardware acceleration. One distinguishing feature of deep learning workloads is that they are inherently resilient to small numerical errors and work very well with low-precision hardware. Thus we propose a novel method, called Double MAC, to theoretically double the computation rate of CNN accelerators by packing two multiply-and-accumulate (MAC) operations into one DSP block of off-the-shelf FPGAs. There are several technical challenges, which we overcome by exploiting the mode of operation in the CNN accelerator. We have validated our method through FPGA synthesis and Verilog simulation, and evaluated it by applying it to a state-of-the-art CNN accelerator. We find that our Double MAC approach can double the computation throughput of a CNN layer. At the network level (all convolution layers combined), the performance improvement varies with the CNN application and FPGA size, from 14% to more than 80% over a highly optimized state-of-the-art accelerator solution, without significantly sacrificing output quality. -
dc.identifier.bibliographicCitation IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, v.38, no.5, pp.888 - 897 -
dc.identifier.doi 10.1109/TCAD.2018.2824280 -
dc.identifier.issn 0278-0070 -
dc.identifier.scopusid 2-s2.0-85045193712 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/24220 -
dc.identifier.url https://ieeexplore.ieee.org/document/8332524/ -
dc.identifier.wosid 000466037700009 -
dc.language English -
dc.publisher IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC -
dc.title Double MAC on a DSP: Boosting the Performance of Convolutional Neural Networks on FPGAs -
dc.type Article -
dc.description.isOpenAccess FALSE -
dc.relation.journalWebOfScienceCategory Computer Science, Hardware & Architecture; Computer Science, Interdisciplinary Applications; Engineering, Electrical & Electronic -
dc.relation.journalResearchArea Computer Science; Engineering -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor Accelerator architectures -
dc.subject.keywordAuthor Convolution -
dc.subject.keywordAuthor Convolutional neural network -
dc.subject.keywordAuthor DSP (Digital Signal Processing) block -
dc.subject.keywordAuthor Field programmable gate arrays -
dc.subject.keywordAuthor FPGA -
dc.subject.keywordAuthor Hardware -
dc.subject.keywordAuthor MAC (Multiply-and-Accumulate) -
dc.subject.keywordAuthor reduced precision -
dc.subject.keywordAuthor SIMD (Single-Instruction Multiple-Data) -
dc.subject.keywordAuthor Table lookup -
dc.subject.keywordAuthor Throughput -
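
The abstract describes packing two MAC operations into one DSP block but does not give the details. As a rough illustration of the general arithmetic idea only (not the paper's actual scheme), the Python sketch below packs two signed 8-bit operands into one wide operand, performs a single multiplication by a shared third operand, and recovers both products. The function names, the 8-bit operand width, and the 16-bit lane offset are all assumptions chosen for this example; real DSP blocks have fixed port widths, and the accumulation step is not modeled here.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        return value - (1 << bits)
    return value


def double_mul(a, b, c, shift=16):
    """Compute a*c and b*c with one wide multiplication (illustrative only).

    Assumes a, b, c are signed 8-bit values, so each product fits in a
    signed 16-bit lane. `shift` is the assumed lane offset.
    """
    packed = a * (1 << shift) + b      # a sits in the upper lane, b in the lower
    product = packed * c               # equals (a*c << shift) + b*c
    low = to_signed(product, shift)    # lower lane holds b*c
    # A negative b*c borrows from the upper lane; subtracting `low`
    # before shifting undoes that borrow.
    high = (product - low) >> shift    # upper lane holds a*c
    return high, low


if __name__ == "__main__":
    print(double_mul(3, -5, 7))
```

The correction in the last step reflects one kind of difficulty that lane packing runs into with signed operands. Whether this matches any of the technical challenges mentioned in the abstract is not stated in this record.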

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.