LanSER: Language-Model Supported Speech Emotion Recognition

Author(s): Gong, Taesik; Belanich, J.; Somandepalli, K.; Nagrani, A.; Eoff, B.; Jou, B.

Citation: Conference of the International Speech Communication Association, pp. 2408-2412

Abstract: Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech. © 2023 International Speech Communication Association. All rights reserved.
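
The entailment-based labeling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the hypothesis template, the function names, and the keyword-based `toy_entailment_score` are all assumptions made for illustration. In practice the scorer would be a pre-trained entailment model applied to an ASR transcript; here a toy stand-in keeps the example self-contained.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical prompt wording; the abstract does not specify the template.
HYPOTHESIS_TEMPLATE = "The speaker is feeling {label}."


def infer_weak_label(
    transcript: str,
    taxonomy: Sequence[str],
    entailment_score: Callable[[str, str], float],
    template: str = HYPOTHESIS_TEMPLATE,
) -> Tuple[str, Dict[str, float]]:
    """Pick the taxonomy label whose hypothesis is most entailed by the transcript."""
    if not taxonomy:
        raise ValueError("taxonomy must contain at least one label")
    # Score one hypothesis per label, using the transcript as the premise.
    scores = {
        label: entailment_score(transcript, template.format(label=label))
        for label in taxonomy
    }
    # The weak label is the one with the highest entailment score.
    best = max(scores, key=scores.get)
    return best, scores


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (assumption): counts cue words, NOT a real entailment model."""
    cues = {
        "joy": ("great", "love", "wonderful"),
        "anger": ("hate", "furious"),
        "sadness": ("miss", "cry", "lost"),
    }
    text = premise.lower()
    for label, words in cues.items():
        if label in hypothesis:
            return float(sum(word in text for word in words))
    return 0.0


if __name__ == "__main__":
    label, scores = infer_weak_label(
        "I love this, it is wonderful!",
        ["joy", "anger", "sadness"],
        toy_entailment_score,
    )
    print(label, scores)
```

Passing the scorer as a parameter keeps the label-selection logic independent of any particular language model, so the toy function can be swapped for a real entailment model without changing `infer_weak_label`.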
