Duan Jiayan1
Ma Hongwei1
Wang Junxia1
(School of Humanities and Law, Yanching Institute of Technology, Langfang, Hebei 065201,
China)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Cross-cultural communication, Out-of-vocabulary, Automatic translation, Neural network
1. Introduction
With the development of society, more and more languages have been involved in communication
[1]. In the context of cross-cultural communication, translation has become increasingly
important to facilitate the exchange of information [2]. With the development of technology, machine translation has gradually matured. Machine
translation can automatically translate languages through computer technology, which
not only plays a positive role in people’s lives in terms of tourism, finance, etc.
[3], but also provides more convenience for cross-cultural communication [4].
Compared with human translation, machine translation is faster and less expensive,
so it has made great contributions to the world’s development and communication. Given
the important role of machine translation, finding ways to achieve more efficient
and accurate automatic translation has attracted the attention of researchers [5]. Nagaraj et al. [6] translated Kannada text into English through neural machine translation (NMT).
Compared to statistical machine translation (SMT), NMT achieved a better Bilingual
Evaluation Understudy (BLEU) score and had an accuracy of 86.32%.
Under the premise of weakening grammar rules, Li et al. [7] proposed a machine translation method based on artificial intelligence and analyzed
English grammar rules. They found that this method had great potential. Based on Hindi-English
translation, Tiwari et al. [8] compared two NMT models, which were realized by Long Short-Term Memory (LSTM) and
Convolutional Neural Network (CNN) methods combined with an attention mechanism. Their
study provided the best model and parameters for the task. Saengthongpattana et al.
[9] studied Thai-English translation and compared Transformer, Recurrent Neural Network
(RNN), and SMT models. They found that the Transformer model had the highest BLEU
score, the SMT model had the most word-order errors, and the RNN model had the most
errors and omissions in word selection.
This paper focuses on the automatic translation of out-of-vocabulary (OOV) words in
the context of cross-cultural communication. We propose a new solution to the shortcomings
of current Chinese-English automatic translation methods in OOV processing and show
the reliability of the method for improving translation quality by comparing it with
another Chinese-English translation model. This paper also presents a new idea for
the processing of OOV words in the field of automatic translation, which could be
studied for the automatic translation of more languages to further improve the usability
of automatic translation.
2. Translation Algorithm for Out-of-vocabulary Words
2.1 Seq2seq Model based on the Attention Mechanism
NMT can directly translate a source language into a target language through an RNN
[10]. Compared with SMT, NMT has higher efficiency and quality [11], so it has been widely applied in automatic translation [12]. NMT is composed of an encoder and a decoder, and its translation process can be represented as:
$p\left(w^{\left(t\right)}|w^{\left(s\right)}\right)=\prod _{m=1}^{M}p\left(w_{m}^{\left(t\right)}|w_{1\colon m-1}^{\left(t\right)},c\right),$
where $w^{\left(s\right)}$ and $w^{\left(t\right)}$ are the aligned sentence pair
of the source language and translation, and $c$ is a context vector generated by the
encoder.
The model is trained by maximizing the conditional log-likelihood over the parallel corpus, $\log p\left(w^{\left(t\right)}|w^{\left(s\right)}\right)=\sum _{m=1}^{M}\log p\left(w_{m}^{\left(t\right)}|w_{1\colon m-1}^{\left(t\right)},c\right)$, where $M$ is the sentence length of the output translation, and $w_{m}^{\left(t\right)}$
is the $m$-th output target word. For the encoder-decoder structure, the seq2seq model
is the simplest one [13] and is usually used to solve problems such as machine translation and speech recognition
[14].
The Google Neural Machine Translation (GNMT) model is based on the seq2seq model [15] and uses two RNNs as the encoder and decoder. A Chinese-English sentence pair is represented as $\left(X,Y\right)$, where $X=\left(x_{1},x_{2},\cdots ,x_{M}\right)$ ($M$ is the length of the source-language word sequence) and $Y=\left(y_{1},y_{2},\cdots ,y_{N}\right)$ ($N$ is the length of the translation word sequence). The encoder RNN of the seq2seq model produces $C=\left(X_{1},X_{2},\cdots ,X_{M}\right)=EncoderRNN\left(x_{1},x_{2},\cdots ,x_{M}\right)$. The conditional probability of the sentence pair is written as $P\left(Y|X\right)=P\left(Y|C\right)=\prod _{i=1}^{N}p\left(y_{i}|y_{0},y_{1},\cdots ,y_{i-1};C\right)$, where $y_{0}$ indicates the start of translation (``<EOS>'').
An attention mechanism [16] is adopted to improve the performance of the seq2seq model in automatic translation. With the attention mechanism, the fixed context vector $c$ is no longer used. The conditional probability of the output at time $i$ is written as $p\left(y_{i}|y_{0},y_{1},\cdots ,y_{i-1};C\right)=g\left(y_{i-1},s_{i},c_{i}\right)$, where $s_{i}$ is the hidden-layer state of the decoder, $s_{i}=f\left(y_{i-1},s_{i-1},c_{i}\right)$, and $c_{i}$ is the context vector of the encoder at time $i$. Furthermore, a score function is defined as $e_{ij}=a\left(s_{i-1},h_{j}\right)$, where $h_{j}$ is the output state of the encoder and $a$ is an arbitrary real-valued function. The scores are normalized as $a_{ij}=\exp \left(e_{ij}\right)/\sum _{k=1}^{M}\exp \left(e_{ik}\right)$, and the $a_{ij}$ over all source positions form the attention vector $a_{i}=\left(a_{i1},a_{i2},\cdots ,a_{iM}\right)$. The context vector at time $i$ is then $c_{i}=\sum _{j=1}^{M}a_{ij}h_{j}$.
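As an illustration of one attention step, the sketch below computes the attention vector $a_{i}$ and the context vector $c_{i}$ with NumPy; the score function $a$ is assumed here to be a simple dot product, whereas any real-valued function could be used.

```python
import numpy as np

def attention_step(s_prev, H):
    """One attention step: s_prev is the decoder state s_{i-1} (shape (d,)),
    H holds the encoder output states h_1..h_M (shape (M, d))."""
    # Scores e_ij = a(s_{i-1}, h_j); a dot product is assumed here for simplicity.
    e = H @ s_prev                     # shape (M,)
    # Softmax normalization gives the attention vector a_i = (a_i1, ..., a_iM).
    a = np.exp(e - e.max())
    a /= a.sum()
    # Context vector c_i is the attention-weighted sum of the encoder states.
    c = a @ H                          # shape (d,)
    return a, c

# Toy usage: 4 source positions, hidden size 8.
rng = np.random.default_rng(0)
a_i, c_i = attention_step(rng.normal(size=8), rng.normal(size=(4, 8)))
```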
2.2 Transformer Model
The Transformer model does not use an RNN and instead uses a self-attention mechanism
to realize fast calculation [17], which significantly improves the translation quality. In essence, it is also a seq2seq
model that can be divided into an encoding layer and a decoding layer [18]. The model represents every word with three vectors: Query (Q), Key (K), and Value (V). A word vector $e_{i}$, $e_{i}\in R^{l\times p}$, is multiplied by three weight matrices with dimensions of $p\times d$, denoted as $W^{Q}$, $W^{K}$, and $W^{V}$, to obtain $q$, $k$, and $v$. Next, a multi-head attention mechanism is used: the $k$ and $v$ vectors of all words are stacked into $n\times d$ matrices $K$ and $V$ ($n$ is the number of words). After splitting $q$, $K$, and $V$ by head, $\left\{q_{i}\right\}_{i=1}^{m}$, $\left\{K_{i}\right\}_{i=1}^{m}$, and $\left\{V_{i}\right\}_{i=1}^{m}$ are obtained. For any $i\in \left\{1,\cdots ,m\right\}$ ($m$ is the number of heads), the self-attention of head $i$ is calculated as:
$head_{i}=Attention\left(q_{i},K_{i},V_{i}\right)=softmax\left(\frac{q_{i}K_{i}^{T}}{\sqrt{d/m}}\right)V_{i}.$
Then, the multi-head attention is calculated by splicing the heads together:
$MultiHead\left(q,K,V\right)=Concat\left(head_{1},head_{2},\cdots ,head_{m}\right)W^{O},$
where $W^{O}$ is the output projection matrix.
In the Transformer model, a position code is used to represent the word order. Its dimension is the same as that of the word vectors, $d_{model}=512$. The formulas are:
$PE_{\left(pos,2i\right)}=\sin \left(pos/10000^{2i/d_{model}}\right)$, $PE_{\left(pos,2i+1\right)}=\cos \left(pos/10000^{2i/d_{model}}\right)$,
where $pos$ is the position of the word in the sentence and $i$ indexes the dimensions.
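A compact NumPy sketch of the multi-head self-attention calculation and the sinusoidal position code is given below for illustration; the dimensions and head count follow Table 1, but the code is a simplified assumption, not the trained model.

```python
import numpy as np

def positional_encoding(n_pos, d_model=512):
    """Sinusoidal position code with the same dimension as the word vectors."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))   # (n_pos, d_model)

def multi_head_self_attention(E, Wq, Wk, Wv, m=8):
    """E: word vectors (n, p); Wq/Wk/Wv: projection matrices (p, d); m: number of heads."""
    Q, K, V = E @ Wq, E @ Wk, E @ Wv          # each (n, d)
    d_head = Q.shape[1] // m
    heads = []
    for h in range(m):
        q = Q[:, h * d_head:(h + 1) * d_head]
        k = K[:, h * d_head:(h + 1) * d_head]
        v = V[:, h * d_head:(h + 1) * d_head]
        # Scaled dot-product self-attention for head h.
        scores = q @ k.T / np.sqrt(d_head)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        heads.append(w @ v)
    # Splice the heads back together along the feature dimension.
    return np.concatenate(heads, axis=-1)     # (n, d)
```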
2.3 Out-of-vocabulary Processing
In NMT, words whose frequency in the corpus is too low for them to be added to the dictionary are called OOV words [19] and are usually represented by <UNK>. The semantics of the original word is then lost in automatic translation, which decreases the quality of the translation. Therefore, handling OOV words is an important task in NMT. We propose a semantics-based approach to replace OOV words in the corpus.
First, the skip-gram model [20] from the Word2vec tool is used to learn word vectors; the structure is shown in Fig. 1. Its principle is to predict the surrounding words from the current word. Word vectors were learned for the Chinese and English corpora with a window size of 5 and a word vector dimension of 300.
Based on the learned word vectors, semantic similarity is calculated. Using cosine similarity, the similarity between a word vector $w$ and a common-word vector $w'$ is:
$sim\left(w,w'\right)=\frac{w\cdot w'}{\left\| w\right\| \left\| w'\right\| },\quad w'\in IV,$
where $IV$ is the list of common words. After obtaining the candidate similar words, an n-gram language model is used to find the most appropriate replacement word to improve the fluency of the sentence:
$w^{*}=\underset{w'}{\arg \max }\; score_{blm}\left(w'\right),$
where $score_{blm}$ is the n-gram language-model score of the sentence with the candidate word $w'$ substituted.
The candidate with the highest score is used to replace the OOV word. In addition, words without semantic vectors in the corpus (low-frequency OOV words) are either ① retained or ② deleted. The automatic translation between Chinese and English after OOV processing is shown in Fig. 2: based on word vector training and similarity calculation, the word most similar to each OOV word is found and substituted for it, and the NMT model is then trained on the replaced corpus and translates the replaced source language.
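The following sketch outlines how the replacement step could look in code; it is an illustrative assumption rather than the exact implementation, and the n-gram language-model score ($score_{blm}$) is left as a stub to be filled by a real language model.

```python
import numpy as np
from gensim.models import KeyedVectors

wv = KeyedVectors.load("word_vectors.kv")
iv_words = list(wv.index_to_key[:20000])   # common-word list IV (size is a placeholder)

def candidate_replacements(oov_word, top_k=5):
    """Return the top-k common words most similar to an OOV word by cosine similarity."""
    if oov_word not in wv:
        return []                          # low-frequency OOV: no vector, keep as <UNK>
    v = wv[oov_word]
    iv_vecs = wv.vectors[:len(iv_words)]
    sims = iv_vecs @ v / (np.linalg.norm(iv_vecs, axis=1) * np.linalg.norm(v) + 1e-9)
    order = np.argsort(-sims)
    return [iv_words[i] for i in order[:top_k] if iv_words[i] != oov_word]

def ngram_lm_score(tokens):
    """Placeholder for the n-gram language-model score (score_blm); a real LM
    such as KenLM or an NLTK n-gram model would be plugged in here."""
    raise NotImplementedError

def replace_oov(tokens, oov_word):
    """Pick the candidate that makes the sentence most fluent under the n-gram LM."""
    best, best_score = "<UNK>", float("-inf")
    for cand in candidate_replacements(oov_word):
        s = ngram_lm_score([cand if t == oov_word else t for t in tokens])
        if s > best_score:
            best, best_score = cand, s
    return [best if t == oov_word else t for t in tokens]
```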
Fig. 2. Automatic Chinese–English translation algorithm for OOV words.
3. Results and Analysis
Tests were conducted using the Windows 10 operating system with 8 GB of memory and
an NVIDIA GeForce GTX 1070 Ti. The model was built and computed using TensorFlow.
The parameters of the seq2seq model and the Transformer model are shown in Table 1.
The LDC dataset was used in the experiments. The model was trained on LDC2004T07,
LDC2004T08, LDC2005T06, and LDC2005T10. NIST05 was used as the development set, and
NIST06 and NIST08 were used as the test sets. These datasets are described in the
following:
LDC2004T07: Multiple-Translation Chinese (MTC) Part 3
LDC2004T08: Hong Kong Parallel Text
LDC2005T06: Chinese News Translation Text Part 1
LDC2005T10: Chinese English News Magazine Parallel Text
NIST05: NIST 2005 Open Machine Translation (OpenMT) Evaluation
NIST06: NIST 2006 Open Machine Translation (OpenMT) Evaluation
NIST08: NIST 2008 Open Machine Translation (OpenMT) Evaluation
The BLEU score was used as the evaluation index of the algorithm [21]. The higher the BLEU score, the closer the translation is to a human translation. Based on n-grams, the BLEU score is calculated as:
$BLEU=BP\cdot \exp \left(\sum _{n=1}^{N}w_{n}\log p_{n}\right),$
where $p_{n}$ is the modified n-gram precision, $BP$ is the penalty factor, $w_{n}$ is the weight factor, $w_{n}=\frac{1}{2^{n}}$, and $N$ is the n-gram size, which is usually 4. The BLEU value reported here is between 0 and 100, and a larger value indicates higher similarity to the reference translation. The results of the seq2seq model and Transformer model on the test sets are shown in Fig. 3.
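As a usage note, corpus-level BLEU can be computed with, for example, the NLTK implementation sketched below; this is not necessarily the evaluation script used in this study, and the uniform 4-gram weights shown are the common default (the weights $w_{n}$ from the formula above could be passed instead).

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# hypotheses: tokenized model outputs; references: one or more reference
# translations per sentence (both are toy examples here).
hypotheses = [["he", "bought", "a", "t-shirt", "yesterday"]]
references = [[["he", "bought", "a", "t-shirt", "yesterday", "as", "a", "gift"]]]

score = corpus_bleu(references, hypotheses,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {100 * score:.2f}")   # reported on a 0-100 scale
```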
Fig. 3 shows that the quality of translation obtained by the Transformer model was higher than that of the seq2seq model. First, on NIST06, the BLEU score of the seq2seq model was 36.45, while that of the Transformer model was 37.26, which is 0.81 higher. Second, on NIST08, the BLEU score of the seq2seq model was 30.16, while that of the Transformer model was 30.75, which is 0.59 higher. These results indicate that the Transformer model performed better than the seq2seq model in automatic Chinese-English translation.
Fig. 3 shows the results without OOV processing. Next, the OOV words were replaced using the method proposed in this paper to obtain a replaced corpus, and the seq2seq model was trained on this corpus. Its performance on the different test sets is shown in Table 2.
Table 2 shows that after OOV processing, if a low-frequency OOV word was retained, the BLEU
score was higher than that of the seq2seq model (37.12 (+0.67) for NIST06 and 30.34
(+0.18) for NIST08). However, when the low-frequency OOV word was deleted, the BLEU
score decreased (36.16 (-0.26) for NIST06 and 30.08 (-0.08) for NIST08). These results
suggest that directly deleting OOV words might damage the sentence structure and result
in ambiguity. Therefore, it is necessary to keep the low-frequency OOV word and replace
it with <UNK> to maintain the integrity of the sentence. The BLEU scores of the Transformer
model combined with OOV processing are shown in Table 3.
Table 3 shows that the BLEU score after Transformer+OOV processing was similar to that of
seq2seq+OOV processing. When the low-frequency OOV word was retained, the BLEU score
was 37.89 (+0.63) for NIST06 and 30.84 (+0.09) for NIST08. When the low-frequency
OOV word was deleted, the BLEU score was 37.17 (-0.09) for NIST06 and 30.33 (-0.42)
for NIST08. Therefore, it was concluded from the results of the models that after
replacing the high-frequency OOV words based on similarity, replacing the low-frequency
OOV words with <UNK> could realize higher translation quality.
In the context of cross-cultural communication, automatic translation between Chinese and English can easily produce translation errors due to cultural differences, but the OOV processing method designed in this study can improve the translation. The following sentence is an example:
他昨天买了一件文化衫,作为朋友的生日礼物。
In this sentence, ``文化衫'' can be regarded as a low-frequency OOV word. When the Transformer
model is used for translation, the result is:
He bought a cultural shirt yesterday as a birthday gift for his friend.
The Chinese word "文化衫" means a round-necked shirt with patterns and text printed
on it, which is used by young people to express their emotions, personality, and values.
A direct translation of "cultural shirt" cannot express its meaning correctly.
If the OOV word is deleted during OOV processing, the result obtained is:
He bought one yesterday as a birthday present for his friend.
The word "culture shirt" is deleted as an important part of the sentence, and the
sentence loses its original meaning. In the proposed OOV processing method, the word
is retained. After using similar word replacement, the result obtained is:
He bought a T-shirt yesterday as a birthday present for his friend.
The analysis of this example further demonstrates the reliability of the OOV processing
method designed in this study for automatic Chinese-English translation in cross-cultural
communication.
Fig. 3. Comparison of the BLEU score between the seq2seq model and Transformer model.
Table 1. Parameter settings.
Seq2seq model combined with attention mechanism
| Number of network layers | 6 |
| Neuron type | LSTM |
| Encoder | 6 layers of LSTM |
| Decoder | 6 layers of LSTM |
| Number of neurons | 256 |
| Word vector dimension | 256 |
| Batch size | 128 |
| Dropout | 0.2 |
| Learning rate | 1.0 |
Transformer model
| Number of network layers | 6 |
| Word vector dimension | 512 |
| Hidden layer state dimension of feedforward neural network | 2048 |
| Head number | 8 |
| Batch size | 6250 |
| Dropout | 0.1 |
Table 2. BLEU scores after seq2seq+OOV processing.
| Model | Low-frequency OOV words | NIST06 | NIST08 |
| Seq2seq | - | 36.45 | 30.16 |
| Seq2seq+OOV processing | Retained | 37.12 | 30.34 |
| Seq2seq+OOV processing | Deleted | 36.16 | 30.08 |
Table 3. BLEU scores after Transformer+OOV processing.
| Model | Low-frequency OOV words | NIST06 | NIST08 |
| Transformer | - | 37.26 | 30.75 |
| Transformer+OOV processing | Retained | 37.89 | 30.84 |
| Transformer+OOV processing | Deleted | 37.17 | 30.33 |
4. Conclusion
This paper presented an automatic Chinese-English translation algorithm for cross-cultural
communication. Assuming that the quality of Chinese-English automatic translation
can be improved by processing OOV words, a method for processing OOV words was designed.
Tests on two models showed that the Transformer model had higher BLEU scores than
the seq2seq model, indicating better performance in automatic Chinese-English translation.
After OOV processing, retaining low-frequency OOV words effectively improved the BLEU
score, indicating that the translation quality was improved. However, this research also has some limitations: the OOV processing relies heavily on dictionaries, and OOV words were studied only for Chinese and English. In future work, more in-depth research on OOV processing is needed to reduce the reliance on dictionaries, and the method will be applied to automatic translation between more languages to expand its applicability and promote its use in practical translation tasks.
REFERENCES
C. Xu, Q. Li, "Machine Translation and Computer Aided English Translation," Journal
of Physics: Conference Series, Vol. 1881, No. 4, pp. 1-8, Jan. 2021.
C. Yang, "A Study of Influences of Big Data on Machine Translation and Enlightenment
for Translation Teaching in Cross-cultural Communication," 2020 International Conference
on Information Science and Education (ICISE-IE), Vol. 2020, pp. 228-232, Dec. 2020.
S. Narzary, M. Brahma, B. Singha, R. Brahma, B. Dibragede, S. Barman, S. Nandi, B.
Som, "Attention based English-Bodo Neural Machine Translation System for Tourism Domain,"
2019 3rd International Conference on Computing Methodologies and Communication (ICCMC),
pp. 335-343, Aug. 2019.
S. Li, "Research on the External Communication of Chinese Excellent Traditional Culture
from the Perspective of Machine Translation," Journal of Physics: Conference Series,
Vol. 1744, No. 3, pp. 1-8, 2021. http://dx.doi.org/10.1088/1742-6596/1744/3/032019
T. Kano, S. Sakti, S. Nakamura, "Transformer-Based Direct Speech-To-Speech Translation
with Transcoder," 2021 IEEE Spoken Language Technology Workshop (SLT), Vol. 2021,
pp. 958-965, Jan. 2021.
P. K. Nagaraj, K. S. Ravikumar, M. S. Kasyap, M. H. S. Murthy, J. Paul, “Kannada to
English Machine Translation Using Deep Neural Network,” Ingénierie des Systèmes D
Information, Vol. 26, No. 1, pp. 123-127, Feb. 2021.
X. Li, X. Hao, "English Machine Translation Model Based on Artificial Intelligence,"
Journal of Physics: Conference Series, Vol. 1982, No. 1, pp. 1-6, May 2021.
G. Tiwari, A. Sharma, A. Sahotra, R. Kapoor, "English-Hindi Neural Machine Translation-LSTM
Seq2Seq and ConvS2S," 2020 International Conference on Communication and Signal Processing
(ICCSP), Vol. 2020, pp. 871-875, Jul. 2020.
K. Saengthongpattana, K. Kriengket, P. Porkaew, T. Supnithi, "Thai-English and English-Thai
Translation Performance of Transformer Machine Translation," 2019 14th International
Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP),
Vol. 2019, pp. 1-5, Oct. 2019.
M. S. Kumar, D. Dipankar, B. Sivaji, "MTIL2017: Machine Translation Using Recurrent
Neural Network on Statistical Machine Translation," Journal of Intelligent Systems,
Vol. 28, No. 3, pp. 447-453, May 2018.
R. Baruah, R. K. Mundotiya, A. K. Singh, "Low Resource Neural Machine Translation:
Assamese to/from Other Indo-Aryan (Indic) Languages," Transactions on Asian and Low-Resource
Language Information Processing, Vol. 21, No. 1, pp. 19.1-19.32, 2022.
Z. Tan, J. Su, B. Wang, Y. Chen, X. Shi, "Lattice-to-sequence attentional Neural Machine
Translation models," Neurocomputing, Vol. 284, No. APR.5, pp. 138-147, April. 2018.
X. Li, V. Krivtsov, K. Arora, "Attention-based deep survival model for time series
data," Reliability Engineering & System Safety, Vol. 217, pp. 293-304, 2022.
J. Cho, S. Watanabe, T. Hori, M. K. Baskar, H. Inaguma, J. Villalba, N. Dehak, "Language
Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition,"
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), Vol. 2019, pp. 6191-6195, Apr. 2019.
D. D. Kalamkar, K. Banerjee, Srinivasan, S. Sridharan, E. Georganas, M. E. Smorkalov,
C. Xu, A. Heinecke, "Training Google Neural Machine Translation on an Intel CPU Cluster,"
2019 IEEE International Conference on Cluster Computing (CLUSTER), Vol. 2019, pp.
1-10, Nov. 2019.
W. Hu, Y. Zhang, Q. Guo, X. Huang, G. Li, W. Wang, Y. Meng, "Research on Short-Term
Load Forecasting Method of Power System Based on Seq2Seq-Attention Model," 2020 IEEE
4th Conference on Energy Internet and Energy System Integration (EI2), pp. 227-232,
Oct. 2020. http://dx.doi.org/10.1109/EI250167.2020.9346583
H. Luo, S. Zhang, M. Lei, L. Xie, "Simplified Self-Attention for Transformer-Based
end-to-end Speech Recognition," 2021 IEEE Spoken Language Technology Workshop (SLT),
Vol. 2021, pp. 75-81, Jan. 2021.
K. Jin, X. Zhang, J. Zhang, "Learning to Generate Diverse and Authentic Reviews via
an Encoder-Decoder Model with Transformer and GRU," 2019 IEEE International Conference
on Big Data (Big Data), pp. 3180-3189, Dec. 2019.
E. Egorova, L. Burget, "Out-of-Vocabulary Word Recovery using FST-Based Subword Unit
Clustering in a Hybrid ASR System," 2018 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), Vol. 2018, pp. 5919-5923, Apr. 2018.
L. Nguyen, H. H. Chung, K. V. Tuliao, T. M. Y. Lin, "Using XGBoost and Skip-Gram Model
to Predict Online Review Popularity," SAGE Open, Vol. 10, No. 4, pp. 215824402098331,
Oct. 2020.
H. K. Vydana, M. Karafiát, K. Zmolikova, L. Burget, H. Černocký, “Jointly Trained
Transformers Models for Spoken Language Translation,” ICASSP 2021 - 2021 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 2021, pp. 7513-7517,
Jun. 2021.
Author
Jiayan Duan was born in Hebei, China, in 1983. From 2003 to 2007, she studied
at Beijing University of Chemical Technology and received a bachelor’s degree in 2007.
From 2013 to 2015, she studied at Beijing International Studies University and received
a master’s degree in 2015. Currently, she works at Yanching Institute of Technology.
She has published eight academic papers and translated two books. Her main research
interests include applied translation studies and translation teaching theory and
practice.
Hongwei Ma was born in Changchun City, Jilin Province, China. He received an M.A.
degree in foreign linguistics and applied linguistics from Changchun University of
Technology in 2011 and an M.A. degree in English translation from Beijing Normal University
in 2016. He has been working at Yanching Institute of Technology since 2021. He is
engaged in research on English language education, translation and cross-cultural
communication, and comparative literature.
Junxia Wang was born in Shan'xi, China, in 1982. From 2000 to 2007, she studied
at China University of Geosciences and obtained a bachelor's degree and master's degree.
From 2007 to present, she has worked at Yanching Institute of Technology. She has
undertaken two projects related to teaching supported by Education Department of Hebei
Province. She has published over 20 academic papers and 7 books. Her main research
interests include applied linguistics and teaching.