Related Researcher

공태식

Gong, Taesik
Ubiquitous AI Lab


Full metadata record

DC Field Value Language
dc.citation.conferencePlace IE -
dc.citation.endPage 2412 -
dc.citation.startPage 2408 -
dc.citation.title Conference of the International Speech Communication Association -
dc.contributor.author Gong, Taesik -
dc.contributor.author Belanich, J. -
dc.contributor.author Somandepalli, K. -
dc.contributor.author Nagrani, A. -
dc.contributor.author Eoff, B. -
dc.contributor.author Jou, B. -
dc.date.accessioned 2024-11-08T17:05:06Z -
dc.date.available 2024-11-08T17:05:06Z -
dc.date.created 2024-11-08 -
dc.date.issued 2023-08-20 -
dc.description.abstract Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech. © 2023 International Speech Communication Association. All rights reserved. -
dc.identifier.bibliographicCitation Conference of the International Speech Communication Association, pp. 2408-2412 -
dc.identifier.doi 10.21437/Interspeech.2023-1832 -
dc.identifier.issn 2308-457X -
dc.identifier.scopusid 2-s2.0-85171526188 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/84404 -
dc.language English -
dc.publisher International Speech Communication Association -
dc.title LanSER: Language-Model Supported Speech Emotion Recognition -
dc.type Conference Paper -
dc.date.conferenceDate 2023-08-20 -

