Title |
Automatic Chinese-English Translation Algorithm based on Out-of-vocabulary Words in the Context of Cross-cultural Communication |
Authors |
Jiayan Duan; Hongwei Ma; Junxia Wang |
DOI |
https://doi.org/10.5573/IEIESPC.2023.12.6.466 |
Keywords |
Cross-cultural communication; Out-of-vocabulary; Automatic translation; Neural network |
Abstract |
In the context of cross-cultural communication, translation between languages has become increasingly important. This study examines the processing of out-of-vocabulary (OOV) words in automatic Chinese-English translation. First, the paper briefly introduces two basic translation models: seq2seq and Transformer. Second, it proposes a semantic-based OOV processing method, which replaces each OOV word with its most similar word, determined by the semantic similarity of word vectors, and then trains the translation model on the source-language sentences containing the replaced words. The Transformer model achieved higher Bilingual Evaluation Understudy (BLEU) scores than the seq2seq model (37.26 on the NIST06 dataset and 30.75 on the NIST08 dataset). After OOV processing, retaining low-frequency OOV words helped improve the BLEU scores, which increased by 0.63 on NIST06 and 0.09 on NIST08 for the Transformer model. These results indicate that the OOV processing method is effective and can be applied to automatic Chinese-English translation. |
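
The replacement step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the semantic similarity measure, and the function names (`cosine`, `replace_oov`), the toy vocabulary, and the three-dimensional vectors are all hypothetical. English placeholder tokens stand in for segmented Chinese source words.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def replace_oov(tokens, vocab, vectors):
    """Replace each OOV token with the most similar in-vocabulary word.

    tokens  : list of source-language tokens
    vocab   : set of words the translation model knows
    vectors : dict mapping words to embedding vectors (assumed to be
              pretrained on a corpus large enough to cover the OOV words)
    """
    # Only in-vocabulary words that have a vector can serve as replacements.
    # Sorting makes tie-breaking deterministic.
    candidates = sorted(w for w in vocab if w in vectors)
    result = []
    for tok in tokens:
        # Keep the token if it is known, has no vector, or nothing can replace it.
        if tok in vocab or tok not in vectors or not candidates:
            result.append(tok)
            continue
        best = max(candidates, key=lambda w: cosine(vectors[tok], vectors[w]))
        result.append(best)
    return result

# Toy data (hypothetical): "pear" is OOV and lies closest to "apple".
VOCAB = {"I", "like", "apple", "car"}
VECTORS = {
    "I": [0.0, 0.0, 0.5],
    "like": [0.0, 0.0, 1.0],
    "apple": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "pear": [0.9, 0.1, 0.0],
}

replaced = replace_oov(["I", "like", "pear"], VOCAB, VECTORS)
print(replaced)
```

In this sketch, the rewritten source sentences would then be paired with their original target-side references to train the translation model. Tokens without a vector are left unchanged, which is one possible fallback rather than something the abstract specifies.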