Title Wav2vec2.0 Hidden Representations for Voice Conversion
Authors 임재민(Jaemin Lim); 김기연(Kiyeon Kim); 조성현(Sunghyun Cho); 이석복(Suk-Bok Lee)
DOI https://doi.org/10.5573/ieie.2024.61.11.141
Page pp.141-149
ISSN 2287-5026
Keywords Voice conversion; Disentanglement; Self-supervised learning; Wav2vec2.0
Abstract The current practice in machine-learning-based voice conversion is to use the final output representations of speech learning models. In this work, we report that an aggregate of hidden-layer representations is more useful for voice conversion than the conventional approach of relying solely on the last-layer representations. We demonstrate this by applying our method to the wav2vec2.0 model, which yields improvements over state-of-the-art voice conversion (VC) models in terms of both voice similarity and speech intelligibility.
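The abstract describes aggregating hidden-layer representations of wav2vec2.0 rather than taking only the last layer. Below is a minimal sketch of one common way to do this, a learned softmax-weighted sum over all encoder layers using the HuggingFace `transformers` Wav2Vec2Model; the weighting scheme and model checkpoint here are illustrative assumptions, not the paper's exact aggregation method.

```python
# Sketch: aggregate wav2vec2.0 hidden-layer representations instead of using
# only the last layer. The learned softmax weighting is an assumption for
# illustration; the paper's exact aggregation scheme may differ.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class LayerAggregator(nn.Module):
    """Weighted sum over all hidden layers of a frozen wav2vec2.0 encoder."""
    def __init__(self, model_name: str = "facebook/wav2vec2-base-960h"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(model_name)
        self.encoder.eval()  # keep the feature extractor frozen
        num_layers = self.encoder.config.num_hidden_layers + 1  # + CNN output
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))  # learned

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) at 16 kHz
        with torch.no_grad():
            out = self.encoder(waveform, output_hidden_states=True)
        # out.hidden_states: tuple of (batch, frames, dim), one entry per layer
        hidden = torch.stack(out.hidden_states, dim=0)         # (L, B, T, D)
        weights = torch.softmax(self.layer_weights, dim=0)     # (L,)
        return (weights[:, None, None, None] * hidden).sum(0)  # (B, T, D)

if __name__ == "__main__":
    agg = LayerAggregator()
    dummy = torch.randn(1, 16000)   # one second of 16 kHz audio
    features = agg(dummy)
    print(features.shape)           # e.g. torch.Size([1, 49, 768])
```

The aggregated frame-level features produced this way would then feed the downstream VC pipeline in place of the usual last-layer output.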