
Detailed Information


Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Author(s)
Erol, Mehmet Hamza; Senocak, Arda; Feng, Jiu; Chung, Joon Son
Issued Date
2024-10
DOI
10.1109/LSP.2024.3483009
URI
https://scholarworks.unist.ac.kr/handle/201301/87863
Citation
IEEE SIGNAL PROCESSING LETTERS, v.31, pp.2975 - 2979
Abstract
Transformers have rapidly become the preferred choice for audio classification, surpassing CNN-based methods. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention, so removing this quadratic self-attention cost is an appealing direction. Recently, state space models (SSMs), such as Mamba, have demonstrated potential in this regard on language and vision tasks. In this study, we explore whether reliance on self-attention is necessary for audio classification. To address this question, we introduce Audio Mamba (AuM), the first self-attention-free, purely SSM-based model for audio classification. We evaluate AuM on various audio datasets, comprising six different benchmarks, where it achieves performance comparable to or better than the well-established AST model.
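The core idea in the abstract is that a linear recurrence over the patch sequence costs O(T) in sequence length, whereas self-attention costs O(T^2). A minimal sketch of a bidirectional scan in that spirit (not the authors' AuM code; the scalar state, fixed coefficients, and function names are illustrative assumptions):

```python
# Illustrative sketch of a bidirectional linear state space scan
# over a sequence of (here, scalar) patch embeddings.
# Coefficients a, b, c are fixed for simplicity; real SSMs such as
# Mamba learn them (and make them input-dependent).

def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    """Linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    One pass over the sequence: O(T), vs O(T^2) for self-attention."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def bidirectional_ssm(xs):
    """Combine a forward and a backward scan so every output sees
    context from both directions, as in bidirectional SSM blocks."""
    fwd = ssm_scan(xs)
    bwd = ssm_scan(xs[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```

Each output still depends on the whole sequence (through the two scans), but the cost stays linear in the number of spectrogram patches, which is the motivation for replacing self-attention.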
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
ISSN
1070-9908
Keyword (Author)
Transformers; Spectrogram; Computational modeling; Training; Context modeling; Adaptation models; Standards; Graphics processing units; Feature extraction; Complexity theory; Audio classification; audio spectrogram transformers; state space models


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.