High Throughput Data Mapping for Coarse-Grained Reconfigurable Architectures

Author
Kim, Yongjoo; Lee, Jongeun; Shrivastava, Aviral; Yoon, Jonghee W.; Cho, Doosan; Paek, Yunheung
Keywords
Array mapping; bank conflict; coarse-grained reconfigurable architecture; compilation; multi-bank memory
Issue Date
2011-11
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, v.30, no.11, pp.1599 - 1609
Abstract
Coarse-grained reconfigurable arrays (CGRAs) are a very promising platform, providing both software programmability and power efficiency of up to 10-100 MOps/mW. However, this promise critically hinges on the effectiveness of application mapping onto CGRA platforms. While previous solutions have greatly improved the computation speed, they have largely ignored the impact of the local memory architecture on the achievable power and performance. This paper motivates the need for memory-aware application mapping for CGRAs and proposes an effective mapping solution that considers the effects of various memory architecture parameters, including the number of banks, the local memory size, and the communication bandwidth between the local memory and the external main memory. Further, we propose efficient methods to handle dependent data on a double-buffering local memory, which is necessary for recurrent loops. Our proposed solution achieves a 59% reduction in the energy-delay product, which factors into about 47% and 22% reductions in energy consumption and runtime, respectively, compared to memory-unaware mapping for realistic local memory architectures. We also show that our scheme scales across a range of applications and memory parameters, and that the runtime overhead of handling recurrent loops with our proposed methods can be less than 1%.
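
The relationship between the three reported figures can be checked with a short calculation. The sketch below assumes the usual definition of the energy-delay product (energy multiplied by runtime) and treats the rounded values from the abstract as exact; the function name `edp_reduction` is hypothetical and not taken from the paper.

```python
def edp_reduction(energy_reduction: float, runtime_reduction: float) -> float:
    """Fractional reduction in energy-delay product (EDP = energy * runtime).

    Assumption: both inputs are fractional reductions relative to the same
    baseline, here the memory-unaware mapping.
    """
    remaining_energy = 1.0 - energy_reduction    # fraction of baseline energy still consumed
    remaining_runtime = 1.0 - runtime_reduction  # fraction of baseline runtime still needed
    return 1.0 - remaining_energy * remaining_runtime


# Rounded figures from the abstract: about 47% energy and 22% runtime reduction.
reduction = edp_reduction(0.47, 0.22)
print(f"EDP reduction: {reduction:.1%}")  # prints "EDP reduction: 58.7%"
```

With the rounded inputs, the product 0.53 × 0.78 ≈ 0.413 leaves roughly 41% of the baseline energy-delay product, which is consistent with the reported 59% reduction.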
DOI
http://dx.doi.org/10.1109/TCAD.2011.2161217
ISSN
0278-0070
Appears in Collections:
ECE_Journal Papers
