Related Researcher

공태식

Gong, Taesik
Ubiquitous AI Lab

Detailed Information


LanSER: Language-Model Supported Speech Emotion Recognition

Author(s)
Gong, Taesik; Belanich, J.; Somandepalli, K.; Nagrani, A.; Eoff, B.; Jou, B.
Issued Date
2023-08-20
DOI
10.21437/Interspeech.2023-1832
URI
https://scholarworks.unist.ac.kr/handle/201301/84404
Citation
Conference of the International Speech Communication Association (Interspeech 2023), pp. 2408-2412
Abstract
Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making it difficult to scale methods to large speech datasets and nuanced emotion taxonomies. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects the emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Although the models are pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech. © 2023 International Speech Communication Association. All rights reserved.
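The textual-entailment labeling step described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the hypothesis template `"The speaker feels {}."`, the helper names `infer_weak_label` and `build_weak_dataset`, and the `transcribe` and `score` callables are all hypothetical. In a real pipeline `score` would presumably wrap a pre-trained natural language inference (NLI) model and `transcribe` an ASR system; here a toy keyword scorer stands in for the NLI model so the example runs without any model download.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical interface: returns an entailment score for (premise, hypothesis).
# In practice this would wrap a pre-trained NLI model (assumption).
EntailmentScorer = Callable[[str, str], float]


def infer_weak_label(
    transcript: str,
    taxonomy: Sequence[str],
    score: EntailmentScorer,
    template: str = "The speaker feels {}.",  # hypothetical hypothesis template
) -> str:
    """Pick the taxonomy label whose hypothesis is most entailed by the transcript."""
    if not taxonomy:
        raise ValueError("taxonomy must contain at least one label")
    # Restricting the argmax to `taxonomy` keeps every weak label inside the label set.
    return max(taxonomy, key=lambda label: score(transcript, template.format(label)))


def build_weak_dataset(
    clips: Sequence[Hashable],
    transcribe: Callable[[Hashable], str],  # hypothetical ASR function
    taxonomy: Sequence[str],
    score: EntailmentScorer,
) -> List[Tuple[Hashable, str]]:
    """Pair each unlabeled clip with a weak emotion label inferred from its transcript."""
    return [(clip, infer_weak_label(transcribe(clip), taxonomy, score)) for clip in clips]


# Toy stand-in for an NLI model (NOT an entailment model): counts cue words
# for the label named in the hypothesis. For illustration only.
TOY_CUES = {
    "joy": {"great", "happy", "love"},
    "anger": {"hate", "furious", "unfair"},
    "sadness": {"miss", "lost", "alone"},
}


def toy_scorer(premise: str, hypothesis: str) -> float:
    words = set(premise.lower().replace(",", " ").replace(".", " ").split())
    for label, cues in TOY_CUES.items():
        if label in hypothesis:
            return float(len(words & cues))
    return 0.0


if __name__ == "__main__":
    taxonomy = ["joy", "anger", "sadness"]
    # Hypothetical transcripts standing in for ASR output.
    fake_asr = {"clip_a": "I miss her and feel so alone.", "clip_b": "This is great."}
    print(build_weak_dataset(["clip_a", "clip_b"], fake_asr.__getitem__, taxonomy, toy_scorer))
    # [('clip_a', 'sadness'), ('clip_b', 'joy')]
```

Because the argmax runs only over the taxonomy, every inferred label is a valid class, which is what lets the weak labels be "constrained to a taxonomy"; changing the taxonomy only changes the `taxonomy` argument and the hypotheses generated from the template.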
Publisher
International Speech Communication Association
ISSN
2308-457X


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.