IEIE - Journal of the Institute of Electronics and Information Engineers

Title	Wav2vec2.0 Hidden Representations for Voice Conversion
Authors	임재민(Jaemin Lim) ; 김기연(Kiyeon Kim) ; 조성현(Sunghyun Cho) ; 이석복(Suk-Bok Lee)
DOI	https://doi.org/10.5573/ieie.2024.61.11.141
Page	pp.141-149
ISSN	2287-5026
Keywords	Voice conversion; Disentanglement; Self-supervised learning; Wav2vec2.0
Abstract	Current machine-learning-based voice conversion typically uses only the final-layer output representations of speech learning models. In this work, we report that an aggregate of hidden-layer representations is more useful for voice conversion than the conventional last-layer representations alone. We demonstrate this by applying our method to the wav2vec2.0 model, which improves on state-of-the-art (SOTA) voice conversion (VC) models in both voice similarity and speech intelligibility.
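
The abstract does not say how the hidden-layer representations are aggregated. As an illustration only, the sketch below assumes one common choice, a softmax-weighted sum over layers, and contrasts it with the conventional last-layer baseline. The names `aggregate_layers`, `softmax`, `hidden`, `uniform`, and `last_only` are hypothetical, and the toy nested lists stand in for real wav2vec2.0 hidden states; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Frame = List[float]   # one feature vector for one time step
Layer = List[Frame]   # [time][feature] for one hidden layer


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax so the layer weights sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layers: Sequence[Layer], scores: Sequence[float]) -> Layer:
    """Weighted sum over layers (an assumed aggregation, not the paper's).

    Every layer must share the same [time][feature] shape.
    """
    if len(layers) != len(scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(scores)
    n_frames, n_feats = len(layers[0]), len(layers[0][0])
    out = [[0.0] * n_feats for _ in range(n_frames)]
    for w, layer in zip(weights, layers):
        for t in range(n_frames):
            for d in range(n_feats):
                out[t][d] += w * layer[t][d]
    return out


# Toy stand-in for hidden states: 3 layers, 2 frames, 2 features.
hidden = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[3.0, 0.0], [0.0, 3.0]],
]
uniform = aggregate_layers(hidden, [0.0, 0.0, 0.0])  # equal scores: plain average
last_only = [frame[:] for frame in hidden[-1]]        # conventional last-layer baseline
```

With equal scores the aggregate reduces to a plain average over layers; in a trained system the scores would presumably be learned or tuned, which the abstract leaves unspecified.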
