LanSER: Language-Model Supported Speech Emotion Recognition

Gong, Taesik; Belanich, J.; Somandepalli, K.; Nagrani, A.; Eoff, B.; Jou, B.

doi:10.21437/Interspeech.2023-1832

Scholarworks@UNIST

Related Researcher

공태식

Gong, Taesik: Ubiquitous AI Lab

Full metadata record

DC Field	Value	Language
dc.citation.conferencePlace	IE	-
dc.citation.endPage	2412	-
dc.citation.startPage	2408	-
dc.citation.title	Conference of the International Speech Communication Association	-
dc.contributor.author	Gong, Taesik	-
dc.contributor.author	Belanich, J.	-
dc.contributor.author	Somandepalli, K.	-
dc.contributor.author	Nagrani, A.	-
dc.contributor.author	Eoff, B.	-
dc.contributor.author	Jou, B.	-
dc.date.accessioned	2024-11-08T17:05:06Z	-
dc.date.available	2024-11-08T17:05:06Z	-
dc.date.created	2024-11-08	-
dc.date.issued	2023-08-20	-
dc.description.abstract	Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech. © 2023 International Speech Communication Association. All rights reserved.	-
dc.identifier.bibliographicCitation	Conference of the International Speech Communication Association, pp.2408 - 2412	-
dc.identifier.doi	10.21437/Interspeech.2023-1832	-
dc.identifier.issn	2308-457X	-
dc.identifier.scopusid	2-s2.0-85171526188	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/84404	-
dc.language	English	-
dc.publisher	International Speech Communication Association	-
dc.title	LanSER: Language-Model Supported Speech Emotion Recognition	-
dc.type	Conference Paper	-
dc.date.conferenceDate	2023-08-20	-
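
The weak-labeling step described in the abstract (choose the emotion label whose hypothesis receives the highest entailment score for an ASR transcript) can be illustrated with a minimal sketch. This is not the authors' implementation: the hypothesis template, the function names, and the keyword-based `toy_entailment_score` are all assumptions made for illustration. In practice the scorer would be a pre-trained language model performing textual entailment, which the record does not specify.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical hypothesis template; the abstract does not give the wording.
HYPOTHESIS_TEMPLATE = "The speaker feels {label}."


def infer_weak_label(
    transcript: str,
    taxonomy: Sequence[str],
    entailment_score: Callable[[str, str], float],
    template: str = HYPOTHESIS_TEMPLATE,
) -> Tuple[str, Dict[str, float]]:
    """Return the taxonomy label with the highest entailment score.

    `entailment_score(premise, hypothesis)` is any callable that returns a
    larger number when the premise more strongly entails the hypothesis.
    """
    scores = {
        label: entailment_score(transcript, template.format(label=label))
        for label in taxonomy
    }
    best = max(scores, key=scores.get)  # ties resolve to the first label
    return best, scores


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer for demonstration only; NOT a language model.

    It counts hand-picked cue words, so the sketch runs without any model.
    """
    cues = {
        "happy": {"great", "love", "wonderful"},
        "sad": {"miss", "lost", "cry"},
        "angry": {"hate", "furious", "unfair"},
    }
    words = set(premise.lower().replace(".", " ").replace(",", " ").split())
    for label, cue_words in cues.items():
        if label in hypothesis:
            return len(words & cue_words) / len(cue_words)
    return 0.0


if __name__ == "__main__":
    label, scores = infer_weak_label(
        "I love this, it is wonderful.",
        ["happy", "sad", "angry"],
        toy_entailment_score,
    )
    print(label, scores)
```

Passing the scorer as a parameter keeps the label-selection logic independent of the model, so the toy scorer can be replaced by a real entailment model without changing `infer_weak_label`. The selected labels would then serve as weak supervision for pre-training a speech model, as the abstract describes.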
