Related Researcher


Senocak, Arda


Full metadata record

DC Field Value Language
dc.citation.endPage 2979 -
dc.citation.startPage 2975 -
dc.citation.title IEEE SIGNAL PROCESSING LETTERS -
dc.citation.volume 31 -
dc.contributor.author Erol, Mehmet Hamza -
dc.contributor.author Senocak, Arda -
dc.contributor.author Feng, Jiu -
dc.contributor.author Chung, Joon Son -
dc.date.accessioned 2025-09-03T14:00:01Z -
dc.date.available 2025-09-03T14:00:01Z -
dc.date.created 2025-09-03 -
dc.date.issued 2024-10 -
dc.description.abstract Transformers have rapidly become the preferred choice for audio classification, surpassing methods based on CNNs. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention. The removal of this quadratic self-attention cost presents an appealing direction. Recently, state space models (SSMs), such as Mamba, have demonstrated potential in language and vision tasks in this regard. In this study, we explore whether reliance on self-attention is necessary for audio classification tasks. By introducing Audio Mamba (AuM), the first self-attention-free, purely SSM-based model for audio classification, we aim to address this question. We evaluate AuM on various audio datasets, comprising six different benchmarks, where it achieves comparable or better performance than the well-established AST model. -
dc.identifier.bibliographicCitation IEEE SIGNAL PROCESSING LETTERS, v.31, pp.2975 - 2979 -
dc.identifier.doi 10.1109/LSP.2024.3483009 -
dc.identifier.issn 1070-9908 -
dc.identifier.scopusid 2-s2.0-85207707217 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/87863 -
dc.identifier.wosid 001346118800003 -
dc.language English -
dc.publisher IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC -
dc.title Audio Mamba: Bidirectional State Space Model for Audio Representation Learning -
dc.type Article -
dc.description.isOpenAccess FALSE -
dc.relation.journalWebOfScienceCategory Engineering, Electrical & Electronic -
dc.relation.journalResearchArea Engineering -
dc.type.docType Article -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor Transformers -
dc.subject.keywordAuthor Spectrogram -
dc.subject.keywordAuthor Computational modeling -
dc.subject.keywordAuthor Training -
dc.subject.keywordAuthor Context modeling -
dc.subject.keywordAuthor Adaptation models -
dc.subject.keywordAuthor Standards -
dc.subject.keywordAuthor Graphics processing units -
dc.subject.keywordAuthor Feature extraction -
dc.subject.keywordAuthor Complexity theory -
dc.subject.keywordAuthor Audio classification -
dc.subject.keywordAuthor audio spectrogram transformers -
dc.subject.keywordAuthor state space models -
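The abstract above contrasts the quadratic self-attention cost of Audio Spectrogram Transformers with the linear-time recurrence of state space models such as Mamba. The following is a minimal, purely illustrative sketch of that scaling difference (it is not the authors' implementation; the operation counts, dimensions, and function names are hypothetical):

```python
# Toy comparison of pairwise self-attention cost vs. a linear SSM scan.
# Illustrative sketch only: AuM itself is a bidirectional, selective
# SSM architecture; here we only count how work grows with sequence
# length L for the two mechanisms the abstract contrasts.

def attention_ops(seq_len: int, dim: int) -> int:
    """Attention forms an L x L score matrix: every token attends to
    every other token, so cost grows as O(L^2 * d)."""
    return seq_len * seq_len * dim

def ssm_scan_ops(seq_len: int, dim: int, state: int) -> int:
    """A state-space scan applies a recurrence h_t = A h_{t-1} + B x_t
    and reads out y_t = C h_t, so cost grows as O(L * d * N) for a
    hidden state of size N."""
    return seq_len * dim * state * 2  # one update + one readout per step

for L in (256, 1024, 4096):
    a = attention_ops(L, dim=768)
    s = ssm_scan_ops(L, dim=768, state=16)
    print(f"L={L:5d}  attention={a:.2e}  ssm={s:.2e}  ratio={a / s:.1f}x")
```

Under these toy counts the attention-to-SSM ratio is L / (2N), so it widens linearly with sequence length, which is the motivation the abstract gives for removing self-attention.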


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.