Title |
Wav2vec2.0 Hidden Representations for Voice Conversion |
Authors |
임재민(Jaemin Lim) ; 김기연(Kiyeon Kim) ; 조성현(Sunghyun Cho) ; 이석복(Suk-Bok Lee) |
DOI |
https://doi.org/10.5573/ieie.2024.61.11.141 |
Keywords |
Voice conversion; Disentanglement; Self-supervised learning; Wav2vec2.0 |
Abstract |
Machine-learning-based voice conversion (VC) currently relies on the final output representations of speech learning models. In this work, we report that an aggregate of hidden-layer representations is more useful for voice conversion than the conventional approach of using only the last-layer representations. We demonstrate this by applying our method to the wav2vec2.0 model, which yields improved performance over state-of-the-art (SOTA) VC models in terms of both voice similarity and speech intelligibility. |
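
The contrast drawn in the abstract, last-layer features versus an aggregate over hidden layers, can be illustrated with a minimal sketch. The abstract does not state how the layers are aggregated, so the weighted average below is an assumption, not the authors' method. The names `last_layer` and `aggregate_layers` are hypothetical, and the toy numbers stand in for real wav2vec2.0 hidden states.

```python
from typing import List, Optional

Frame = List[float]   # one feature vector for one time step
Layer = List[Frame]   # one layer's output, indexed as [time][feature]


def last_layer(hidden_states: List[Layer]) -> Layer:
    """Conventional choice: keep only the final layer's representations."""
    return hidden_states[-1]


def aggregate_layers(hidden_states: List[Layer],
                     weights: Optional[List[float]] = None) -> Layer:
    """Weighted average over layers.

    ASSUMPTION: a (weighted) mean is only one plausible aggregate; the
    abstract does not specify which aggregation the paper uses.
    """
    n_layers = len(hidden_states)
    if n_layers == 0:
        raise ValueError("need at least one layer")
    if weights is None:
        weights = [1.0] * n_layers  # plain mean by default
    if len(weights) != n_layers:
        raise ValueError("need exactly one weight per layer")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    n_frames = len(hidden_states[0])
    dim = len(hidden_states[0][0])
    out = [[0.0] * dim for _ in range(n_frames)]
    for w, layer in zip(weights, hidden_states):
        for t in range(n_frames):
            for d in range(dim):
                out[t][d] += (w / total) * layer[t][d]
    return out


# Toy stand-in for per-layer hidden states: 3 layers, 2 frames, 2 features.
toy = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]

conventional = last_layer(toy)      # features from the last layer only
aggregated = aggregate_layers(toy)  # mean over all three layers
```

In practice the per-layer weights could be fixed or learned; which of these (if either) matches the paper's setup is not stated in the abstract.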