File Download

There are no files associated with this item.

Related Researcher

Kim, Taehwan (김태환)


Full metadata record

dc.citation.conferencePlace: AU
dc.citation.conferencePlace: Graz
dc.citation.endPage: 708
dc.citation.startPage: 704
dc.citation.title: Annual Conference of the International Speech Communication Association
dc.contributor.author: Mohammadi, Seyed Hamidreza
dc.contributor.author: Kim, Taehwan
dc.date.accessioned: 2024-01-31T23:40:51Z
dc.date.available: 2024-01-31T23:40:51Z
dc.date.created: 2021-09-01
dc.date.issued: 2019-09
dc.description.abstract: We propose a voice conversion model that maps an arbitrary source speaker to an arbitrary target speaker using disentangled representations. Voice conversion is the task of converting a spoken utterance from the voice of a source speaker to that of a target speaker. Most prior work requires knowing the source speaker, the target speaker, or both at training time, using either a parallel or a non-parallel corpus. Instead, we study voice conversion on non-parallel speech corpora in a one-shot learning setting: we convert arbitrary sentences from an arbitrary source speaker to a target speaker given only one or a few training utterances of that target speaker. To achieve this, we propose to use disentangled representations of speaker identity and linguistic content. We use a recurrent neural network (RNN) encoder to produce the speaker embedding and phonetic posteriorgrams to encode the linguistic content, together with an RNN decoder that generates the converted utterance. Ours is a simpler model, without adversarial training or a hierarchical design, and is therefore more efficient. In subjective tests, our approach achieved significantly better similarity to the target speaker than the baseline. (A minimal architectural sketch follows this record.)
dc.identifier.bibliographicCitation: Annual Conference of the International Speech Communication Association, pp. 704-708
dc.identifier.doi: 10.21437/Interspeech.2019-1798
dc.identifier.issn: 2308-457X
dc.identifier.scopusid: 2-s2.0-85074730037
dc.identifier.uri: https://scholarworks.unist.ac.kr/handle/201301/79313
dc.language: English
dc.publisher: International Speech Communication Association
dc.title: One-shot voice conversion with disentangled representations by leveraging phonetic posteriorgrams
dc.type: Conference Paper
dc.date.conferenceDate: 2019-09-15
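
The abstract describes a three-part pipeline: an RNN speaker encoder, phonetic posteriorgrams (PPGs) for linguistic content, and an RNN decoder. Below is a minimal sketch of that architecture in PyTorch. It is not the authors' implementation; all dimensions (80 mel bins, 144-dimensional PPGs, a 128-dimensional speaker embedding) and layer choices are illustrative assumptions, and the PPG extractor and vocoder are omitted.

```python
# Minimal sketch (not the authors' code) of the architecture in the abstract:
# RNN speaker encoder + PPG linguistic features + RNN decoder.
# All dimensions and layer choices below are illustrative assumptions.
import torch
import torch.nn as nn


class SpeakerEncoder(nn.Module):
    """Summarizes a reference utterance (mel frames) into one speaker embedding."""

    def __init__(self, n_mels: int = 80, hidden: int = 256, emb_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        # mels: (batch, time, n_mels); the final hidden state serves as a
        # fixed-length encoding of speaker identity.
        _, h = self.rnn(mels)
        return self.proj(h[-1])  # (batch, emb_dim)


class Decoder(nn.Module):
    """Generates target-speaker mel frames from PPGs plus the speaker embedding."""

    def __init__(self, ppg_dim: int = 144, emb_dim: int = 128,
                 hidden: int = 512, n_mels: int = 80):
        super().__init__()
        self.rnn = nn.GRU(ppg_dim + emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, ppg: torch.Tensor, spk: torch.Tensor) -> torch.Tensor:
        # Broadcast the speaker embedding across all PPG frames so every
        # decoding step sees both linguistic content and target identity.
        spk_seq = spk.unsqueeze(1).expand(-1, ppg.size(1), -1)
        h, _ = self.rnn(torch.cat([ppg, spk_seq], dim=-1))
        return self.out(h)  # (batch, time, n_mels)


# One-shot conversion: PPGs come from the *source* utterance (content), the
# speaker embedding from one or a few *target* utterances (identity).
encoder, decoder = SpeakerEncoder(), Decoder()
src_ppg = torch.randn(1, 200, 144)   # PPG frames of the source utterance
tgt_ref = torch.randn(1, 120, 80)    # a single reference utterance of the target
converted = decoder(src_ppg, encoder(tgt_ref))
print(converted.shape)               # torch.Size([1, 200, 80])
```

Because speaker identity enters only through the embedding and content only through speaker-independent PPGs, swapping the reference utterance is, under these assumptions, all that is needed to convert to a new, unseen speaker.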


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.