There are no files associated with this item.
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.citation.conferencePlace | IE | - |
| dc.citation.endPage | 2412 | - |
| dc.citation.startPage | 2408 | - |
| dc.citation.title | Conference of the International Speech Communication Association | - |
| dc.contributor.author | Gong, Taesik | - |
| dc.contributor.author | Belanich, J. | - |
| dc.contributor.author | Somandepalli, K. | - |
| dc.contributor.author | Nagrani, A. | - |
| dc.contributor.author | Eoff, B. | - |
| dc.contributor.author | Jou, B. | - |
| dc.date.accessioned | 2024-11-08T17:05:06Z | - |
| dc.date.available | 2024-11-08T17:05:06Z | - |
| dc.date.created | 2024-11-08 | - |
| dc.date.issued | 2023-08-20 | - |
| dc.description.abstract | Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech. © 2023 International Speech Communication Association. All rights reserved. | - |
| dc.identifier.bibliographicCitation | Conference of the International Speech Communication Association, pp.2408 - 2412 | - |
| dc.identifier.doi | 10.21437/Interspeech.2023-1832 | - |
| dc.identifier.issn | 2308-457X | - |
| dc.identifier.scopusid | 2-s2.0-85171526188 | - |
| dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/84404 | - |
| dc.language | English | - |
| dc.publisher | International Speech Communication Association | - |
| dc.title | LanSER: Language-Model Supported Speech Emotion Recognition | - |
| dc.type | Conference Paper | - |
| dc.date.conferenceDate | 2023-08-20 | - |
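
The weak-labeling step described in the abstract turns each emotion in the taxonomy into a textual-entailment hypothesis, scores it against the ASR transcript, and keeps the highest-scoring label. It can be sketched as follows. This is a minimal illustration under stated assumptions, not the LanSER implementation. The hypothesis template, the function names, and the keyword-based `toy_entailment_score` are hypothetical stand-ins. In practice the scorer would be the entailment score from a pre-trained language model.

```python
import re
from typing import Callable, Sequence

# Hypothetical hypothesis template; the abstract does not give the exact prompt wording.
HYPOTHESIS_TEMPLATE = "This person is feeling {}."


def infer_weak_label(
    transcript: str,
    taxonomy: Sequence[str],
    entailment_score: Callable[[str, str], float],
    template: str = HYPOTHESIS_TEMPLATE,
) -> str:
    """Return the taxonomy label whose hypothesis receives the highest entailment score.

    transcript: ASR output, used as the entailment premise.
    entailment_score(premise, hypothesis): assumed to return higher values when the
    premise more strongly entails the hypothesis (e.g. an NLI entailment probability).
    """
    if not taxonomy:
        raise ValueError("taxonomy must contain at least one label")
    scores = {label: entailment_score(transcript, template.format(label)) for label in taxonomy}
    # max() returns the first maximal item, so ties resolve to the earliest taxonomy label.
    return max(taxonomy, key=lambda label: scores[label])


# Toy stand-in for a pre-trained entailment model (hypothetical, illustration only):
# counts how many of a label's cue words appear in the premise.
TOY_CUES = {
    "joy": {"love", "great", "wonderful"},
    "anger": {"hate", "furious", "unacceptable"},
    "sadness": {"miss", "lost", "sorry"},
}
TAXONOMY = ["joy", "anger", "sadness"]


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    words = set(re.findall(r"[a-z']+", premise.lower()))
    for label, cues in TOY_CUES.items():
        # Identify which label this hypothesis was built from.
        if label in hypothesis:
            return float(len(words & cues))
    return 0.0


if __name__ == "__main__":
    print(infer_weak_label("I love this, it is wonderful!", TAXONOMY, toy_entailment_score))
```

Keeping the scorer as a parameter mirrors the abstract's separation between the pre-trained language model (which supplies entailment scores) and the taxonomy constraint (which restricts the output to a fixed label set). Any model exposing a premise/hypothesis entailment score could be plugged in without changing the selection logic.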
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.