IEIE - Journal of the Institute of Electronics and Information Engineers

Title	Zero-shot Voice Cloning based Emotion-preserving Video Dubbing
Authors	서준혁(Junhyuk Seo) ; 김태민(Taemin Kim) ; 고혜정(Hyejung Ko) ; 오희석(Heeseok Oh)
DOI	https://doi.org/10.5573/ieie.2026.63.3.53
Page	pp.53-62
ISSN	2287-5026
Keywords	Zero-shot voice cloning; Dubbing; Video translation; Emotion preservation
Abstract	This paper presents an intelligent voice translation system that translates foreign-language audiovisual content into another language while preserving the original speaker’s voice identity and emotional expressivity. To achieve this, we employ a deep learning-based zero-shot voice cloning framework that overcomes the limitations of conventional dubbing and subtitling. The proposed system is a multimodal processing pipeline that integrates Demucs-based source separation, Whisper and Pyannote for speech segmentation and transcription, a CLAP-based Emotion Conditioning Module, and a ZONOS TTS model for zero-shot emotional speech synthesis. A major contribution of this work is the Emotion Conditioning Module, which uses a CLAP text-audio encoder to embed the emotional state of the source speech into an 8-dimensional latent vector. This embedding serves as a conditioning signal for the ZONOS TTS model, enabling emotion-preserving synthesis. In a user study conducted with 51 participants using clips from "The Simpsons," 82% of respondents rated the emotion-conditioned outputs as exhibiting "more natural and clearly expressed emotions." Furthermore, in comparative evaluations against leading commercial services, the proposed system achieved a 62.2% user preference rate for emotion expressivity. These findings provide empirical evidence that the proposed system preserves the vocal individuality and emotional nuances of the original speaker across languages. A demonstration video is available at [https://m.site.naver.com/1T3Nv].
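
The abstract does not say how the Emotion Conditioning Module maps CLAP embeddings to its 8-dimensional vector. The following is only an illustrative sketch of one plausible approach, not the authors' method: score the audio embedding against one text-prompt embedding per emotion by cosine similarity, then normalize the scores with a softmax. The emotion labels, the function names, and the temperature value are all assumptions made for this example, and the embeddings are taken as plain lists of floats rather than produced by a real CLAP model.

```python
import math

# Hypothetical label set: the abstract states only that the vector has 8 dimensions.
EMOTIONS = ["happiness", "sadness", "disgust", "fear",
            "surprise", "anger", "other", "neutral"]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def emotion_vector(audio_emb, prompt_embs, temperature=0.1):
    """Turn audio-to-prompt similarities into a vector of weights summing to 1.

    audio_emb:   embedding of the source speech segment (assumed precomputed).
    prompt_embs: one text embedding per emotion label (assumed precomputed).
    temperature: assumed softmax temperature; smaller values sharpen the output.
    """
    sims = [cosine(audio_emb, p) for p in prompt_embs]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]
```

With eight prompt embeddings, `emotion_vector` returns eight non-negative weights that sum to 1, which could then be passed to a TTS model as a conditioning signal in whatever form that model expects.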

Copyright © IEIE All rights reserved

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.