
1. Department of International Educational Exchange, Tangshan Vocational and Technical College, Tangshan, Hebei 063000, China

Keywords: Machine translation, Tense recognition, Tense translation, Chinese-English translation, Bilingual evaluation understudy

1. Introduction

Translation is becoming increasingly important as cross-cultural communication becomes more frequent [1], and human translation alone can hardly meet the current huge demand. Machine translation (MT) has therefore been studied widely [2]. MT automatically converts text in one language into text in another, which is highly efficient and makes communication easier [3]. With the development of deep learning, neural machine translation (NMT) has become the mainstream approach, delivering better translation quality than traditional MT [4]. NMT has promising applications for many language pairs [5], and research on it is ongoing [6]. Choi et al. [7] designed a fine-grained attention mechanism, conducted experiments on En–De and En–Fi translation tasks, and reported improved translation quality. Sun et al. [8] examined Tibetan–Chinese NMT with a method combining techniques such as stop lists and back-translation; in their experiments, the bilingual evaluation understudy (BLEU) score of Tibetan–Chinese NMT rose from an initial 5.53 to 19.03. Martinez et al. [9] proposed a method that incorporates character-level information into subword segmentation for NMT, used a custom algorithm to select binary character n-gram features, and showed the method's advantage for resource-constrained languages as well as its higher BLEU scores. Ma [10] proposed a syntax-based approach that incorporates source-side syntactic structures into the attention mechanism and positional encoding; experiments showed BLEU improvements of 2.32, 2.91, and 1.03 on English–Japanese, English–Chinese, and English–German translation tasks, respectively. Chinese–English translation is used extensively; however, tense translation remains a major weakness of MT and NMT systems.
English expresses tense through verb morphology, whereas Chinese verbs do not inflect and carry no tense information. For example, in "我要去吃饭了" ("I am going to eat"), "我正在吃饭" ("I am eating"), and "我吃过饭了" ("I have eaten"), the verb "吃" ("eat") stays the same while the tense changes; the time reference is carried by other words such as 要, 正在, 过, and 了. As a result, MT quality is poorer for Chinese–English translation than for English–Chinese translation. This paper therefore focuses on recognizing and translating different tenses in Chinese–English translation. Building on NMT, it uses a neural network model to recognize the tenses of Chinese verbs so that tenses are translated consistently, and experiments verify that the method improves translation quality. The paper thus provides a new way to improve Chinese-to-English machine translation, and the method can be extended to tense recognition and translation problems in other languages, promoting the further development of MT.

2. Machine Translation

2.1 Neural Machine Translation

NMT uses an encoder–decoder structure [11]. The encoder encodes the source-language sequence into a fixed-length vector representation $\mathrm{C}$, called the context vector, and the decoder decodes this vector to obtain the translated sequence. Before translation, the source words must be converted into numerical vectors that a computer can process; this process is called word embedding. Commonly used models include Word2vec with its CBOW and Skip-Gram architectures [12]. Two decoding strategies are frequently used.

Greedy search [13]: Let the decoder output sequence be $\hat{Y}=\left(\hat{y}_{1},\hat{y}_{2},\cdots ,\hat{y}_{T}\right)$ and the target vocabulary be $V$. Greedy decoding proceeds as follows. ① After the source sequence is encoded, the start symbol <bos> is fed to the decoder to start decoding; ② the decoder computes the probability of every word in $V$; ③ at time step $t$, the word with the highest probability is selected, $\hat{y}_{t}=\arg\max_{y\in V}\log p\left(y\mid \hat{y}_{0\sim t-1},x_{1\sim T'};\theta \right)$; ④ steps ② and ③ repeat until the decoder generates <eos>, at which point decoding ends and the final translation is obtained.
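The four greedy-decoding steps can be sketched in Python as below. This is a minimal illustration rather than the paper's implementation: `step_log_probs` is a hypothetical stand-in for the trained decoder, and the toy lookup table `TABLE` (next-token probabilities that depend only on the previous token) is invented for the example.

```python
import math
from typing import Callable, Dict, List

BOS, EOS = "<bos>", "<eos>"

def greedy_decode(step_log_probs: Callable[[List[str]], Dict[str, float]],
                  max_len: int = 50) -> List[str]:
    """Pick the single most probable next token at every step (greedy search)."""
    output = [BOS]                                # step 1: decoding starts from <bos>
    while len(output) <= max_len:
        scores = step_log_probs(output)           # step 2: log p(y | prefix, x) for every y in V
        best = max(scores, key=scores.get)        # step 3: argmax over the vocabulary
        output.append(best)
        if best == EOS:                           # step 4: stop once <eos> is generated
            break
    return output[1:-1] if output[-1] == EOS else output[1:]

# Hypothetical toy "decoder": the next-token distribution depends only on the previous token.
TABLE = {
    BOS: {"He": 0.7, "She": 0.3},
    "He": {"has": 0.6, "had": 0.4},
    "has": {"a": 0.9, EOS: 0.1},
    "a": {"cat": 0.8, "dog": 0.2},
    "cat": {".": 0.9, EOS: 0.1},
    ".": {EOS: 1.0},
}

def toy_step(prefix: List[str]) -> Dict[str, float]:
    return {w: math.log(p) for w, p in TABLE[prefix[-1]].items()}

print(greedy_decode(toy_step))  # ['He', 'has', 'a', 'cat', '.']
```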

Beam search (cluster search) [14]: greedy search easily falls into a local optimum. Beam search instead keeps the $K$ best partial sequences at each step, where $K$ is the beam width, and finally outputs the sequence with the highest overall score. Because it explores more candidates, it usually gets closer to the global optimum, although it does not guarantee it. Its decoding process is as follows. ① Let the beam width be $K$; at time step $t-1$, the candidate set is $C_{t-1}=\left\{\tilde{y}_{0\sim t-1}^{(1)},\tilde{y}_{0\sim t-1}^{(2)},\cdots ,\tilde{y}_{0\sim t-1}^{(K)}\right\}$. ② At time step $t$, each of the $K$ candidates is extended with every word in $V$, and the $K$ extensions with the highest cumulative log probability are kept: $C_{t}=\left\{\tilde{y}_{0\sim t}^{(1)},\tilde{y}_{0\sim t}^{(2)},\cdots ,\tilde{y}_{0\sim t}^{(K)}\right\}=\operatorname{argsort}^{K}_{y_{t}\in V,\,y_{0\sim t-1}\in C_{t-1}}\sum_{t'=0}^{t}\log p\left(y_{t'}\mid y_{0\sim t'-1},x_{1\sim T'};\theta \right)$. ③ A sequence that outputs <eos> is complete; once $K$ sequences are complete, they are re-ranked with length normalization, $\hat{Y}=\arg\max_{\hat{Y}_{1},\cdots ,\hat{Y}_{K}}\left(\frac{1}{|Y|}\right)^{\alpha }\log p\left(Y\mid X;\theta \right)$, where $|Y|$ is the sequence length and $\alpha$ is usually 0.6, and the highest-scoring sequence is output.
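A compact beam search with the length-normalized final ranking $(1/|Y|)^{\alpha}\log p(Y\mid X)$ and $\alpha=0.6$ is sketched below. It is a hedged illustration: the toy next-token table and `toy_step` are hypothetical, and counting <eos> in $|Y|$ is an assumption (conventions differ). The table is built so that the greedy choice ("A") leads to a worse complete sequence than the runner-up ("B").

```python
import math
from typing import Callable, Dict, List, Tuple

BOS, EOS = "<bos>", "<eos>"

def beam_search(step_log_probs: Callable[[List[str]], Dict[str, float]],
                beam_width: int = 3, max_len: int = 50, alpha: float = 0.6) -> List[str]:
    beams: List[Tuple[List[str], float]] = [([BOS], 0.0)]   # (prefix, cumulative log-prob)
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        # Extend every live hypothesis with every vocabulary word.
        candidates = [(prefix + [w], score + lp)
                      for prefix, score in beams
                      for w, lp in step_log_probs(prefix).items()]
        # Keep the K highest cumulative scores (the argsort^K step).
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            (finished if prefix[-1] == EOS else beams).append((prefix, score))
        if len(finished) >= beam_width or not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses if max_len was reached

    def normalized(item: Tuple[List[str], float]) -> float:
        prefix, score = item
        length = len(prefix) - 1            # generated tokens incl. <eos>, excl. <bos>
        return score / (length ** alpha)    # (1/|Y|)^alpha * log p(Y|X)

    best, _ = max(finished, key=normalized)
    return [w for w in best[1:] if w != EOS]

# Hypothetical toy decoder: next-token probabilities depend only on the previous token.
TABLE = {
    BOS: {"A": 0.55, "B": 0.45},
    "A": {"c": 0.35, "d": 0.35, "e": 0.30},
    "B": {"c": 0.9, "d": 0.1},
    "c": {EOS: 1.0}, "d": {EOS: 1.0}, "e": {EOS: 1.0},
}

def toy_step(prefix: List[str]) -> Dict[str, float]:
    return {w: math.log(p) for w, p in TABLE[prefix[-1]].items()}

print(beam_search(toy_step, beam_width=2))  # ['B', 'c']
print(beam_search(toy_step, beam_width=1))  # ['A', 'c'], i.e. greedy behaviour
```

With a beam width of 1 the procedure reduces to greedy search, which commits to "A" (0.55) and ends with probability 0.55 × 0.35 ≈ 0.19, whereas a width of 2 keeps "B" alive and finds "B c" with probability 0.45 × 0.9 = 0.405.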

2.2 Long Short-term Memory-based Neural Machine Translation

Recurrent neural network (RNN)-based NMT is commonly used [15]; however, RNNs are prone to vanishing gradients on long sequences [16]. LSTM-based NMT emerged to solve this problem [17]. The long short-term memory (LSTM) network is an improved RNN that performs well in pattern recognition and data prediction [18]. An LSTM unit mainly consists of an input gate, a forget gate, an output gate, and a memory cell, computed as follows.

The input gate $I_{t}$, forget gate $F_{t}$, and output gate $O_{t}$ are calculated:

$I_{t}=\sigma \left(X_{t}W_{xi}+H_{t-1}W_{hi}+b_{i}\right)$,
$F_{t}=\sigma \left(X_{t}W_{xf}+H_{t-1}W_{hf}+b_{f}\right)$,
$O_{t}=\sigma \left(X_{t}W_{xo}+H_{t-1}W_{ho}+b_{o}\right)$,

where $X_{t}$ is the input at the current time step, $H_{t-1}$ is the hidden vector at the previous time step, $W$ and $b$ are the weight matrices and bias vectors of each gate, and $\sigma$ is the sigmoid function.

The candidate memory cell $\tilde{C}_{t}$ is calculated:

$\tilde{C}_{t}=\tanh \left(X_{t}W_{xc}+H_{t-1}W_{hc}+b_{c}\right)$.

The memory cell is then updated:

$C_{t}=F_{t}*C_{t-1}+I_{t}*\tilde{C}_{t}$,

where $C_{t}$ and $C_{t-1}$ refer to the memory cells of the current and previous time steps, respectively, and $*$ denotes element-wise multiplication.

The hidden state $H_{t}$ is calculated:

$H_{t}=O_{t}*\tanh \left(C_{t}\right)$.
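One LSTM time step can be written directly from these equations with NumPy. This is a sketch under stated assumptions: it uses the row-vector convention $X_tW_x + H_{t-1}W_h + b$ of the text, the standard memory-cell update $C_t = F_t * C_{t-1} + I_t * \tilde{C}_t$, and randomly initialized, hypothetical weights; it is not the paper's trained model.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[k] = (W_xk, W_hk) and b[k] = b_k for gate k in 'i', 'f', 'o', 'c'."""
    def affine(k: str) -> np.ndarray:
        W_x, W_h = W[k]
        return x_t @ W_x + h_prev @ W_h + b[k]

    i_t = sigmoid(affine("i"))          # input gate I_t
    f_t = sigmoid(affine("f"))          # forget gate F_t
    o_t = sigmoid(affine("o"))          # output gate O_t
    c_tilde = np.tanh(affine("c"))      # candidate memory cell
    c_t = f_t * c_prev + i_t * c_tilde  # memory cell update (element-wise)
    h_t = o_t * np.tanh(c_t)            # hidden state H_t
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W = {k: (rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h))) for k in "ifoc"}
b = {k: np.zeros(d_h) for k in "ifoc"}
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):    # a toy sequence of 5 input vectors
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)  # (3,)
```

Because $H_t$ is a sigmoid output multiplied by a tanh, every entry of the hidden state stays strictly between -1 and 1.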

The Bi-LSTM model [19] addresses the limitation that a unidirectional LSTM cannot use information from later positions in a sequence: it combines a forward LSTM, which reads past context, with a backward LSTM, which reads future context. Let the input vector at time step $n$ be $X_{n}$ and the weight matrix be $W_{n}$. The forward computation of Bi-LSTM is

$\overrightarrow{c}_{n},\overrightarrow{h}_{n}=g^{LSTM}\left(\overrightarrow{c}_{n-1},\overrightarrow{h}_{n-1},X_{n};W_{n}\right)$.

The backward computation runs from the end of the sequence toward the beginning:

$\overleftarrow{c}_{n},\overleftarrow{h}_{n}=g^{LSTM}\left(\overleftarrow{c}_{n+1},\overleftarrow{h}_{n+1},X_{n};W_{n}\right)$.
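The forward and backward passes can be sketched with NumPy as follows. Assumptions, not taken from the source: each direction has its own randomly initialized weights, the four gate matrices are stacked into one matrix per direction (a common implementation shortcut), and the two hidden states at each position are concatenated, which is the usual way Bi-LSTM outputs are combined.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step; W stacks the input, forget, output and candidate weights column-wise."""
    z = np.concatenate([x, h]) @ W + b       # shape (4 * d_h,)
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run_lstm(xs, W, b, d_h):
    h, c = np.zeros(d_h), np.zeros(d_h)
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, b)
        states.append(h)
    return states

def bi_lstm(xs, W_fwd, b_fwd, W_bwd, b_bwd, d_h):
    fwd = run_lstm(xs, W_fwd, b_fwd, d_h)               # forward h_n sees x_1 .. x_n
    bwd = run_lstm(xs[::-1], W_bwd, b_bwd, d_h)[::-1]   # backward h_n sees x_n .. x_N
    return np.stack([np.concatenate([f, bk]) for f, bk in zip(fwd, bwd)])

rng = np.random.default_rng(1)
d_in, d_h, n_steps = 4, 3, 6
W_f = rng.normal(scale=0.5, size=(d_in + d_h, 4 * d_h))
W_b = rng.normal(scale=0.5, size=(d_in + d_h, 4 * d_h))
b_f, b_b = np.zeros(4 * d_h), np.zeros(4 * d_h)
xs = rng.normal(size=(n_steps, d_in))
out = bi_lstm(xs, W_f, b_f, W_b, b_b, d_h)
print(out.shape)  # (6, 6)
```

Changing only the last input leaves the forward half of the first position untouched but alters its backward half, which is exactly the "future context" a unidirectional LSTM lacks.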

3. Methods for Recognizing Different Tenses in Chinese

3.1 Collation of the Corpus Data

Before tense recognition, the bilingual corpus data must first be organized. The tense recognition method in this paper targets verbs. Stanford's open-source part-of-speech (POS) tagger was used to label English verbs, which fall into the following Penn Treebank categories.

(1) VB: verb, base form

(2) VBP: present tense, non-third person singular

(3) VBZ: present tense, third person singular

(4) VBD: past tense

(5) VBG: gerund or present participle

(6) VBN: past participle

(7) MD: modal verb

Chinese verbs are given tense labels by projecting English tenses onto them through the attention alignments of an NMT model. All Chinese verbs are labeled VV, and non-verbs are labeled None. The English tense information and Chinese verb information are converted into vectors and fed into the NMT model. At each decoding step that generates a Chinese word, the English word receiving the highest attention weight is identified, and its tense is transferred to that Chinese word, yielding a tense label sequence aligned with the Chinese word positions. The tense information of the training data is obtained in this way. During training, the NMT model stabilizes after the 10$^{\mathrm{th}}$ epoch, so the labels produced in the 10$^{\mathrm{th}}$ to 12$^{\mathrm{th}}$ epochs are used as the final data. The labels from these three epochs are merged as follows.
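The projection step can be sketched as follows for a single sentence pair. Everything concrete here is hypothetical: the attention matrix is invented (rows are Chinese decoding steps, columns are English source positions), and leaving a Chinese verb as None when its most-attended English word carries no verb tag is an assumption, since the text does not say what happens in that case.

```python
from typing import List, Optional, Sequence

VERB_TAGS = {"VB", "VBP", "VBZ", "VBD", "VBG", "VBN", "MD"}

def project_tense(zh_labels: Sequence[Optional[str]], en_tags: Sequence[str],
                  attention: Sequence[Sequence[float]]) -> List[Optional[str]]:
    """Copy the tag of the most-attended English word onto each Chinese verb (VV)."""
    projected: List[Optional[str]] = []
    for j, label in enumerate(zh_labels):
        if label != "VV":                       # non-verbs keep the label None
            projected.append(None)
            continue
        row = attention[j]
        src = max(range(len(row)), key=row.__getitem__)   # highest attention weight
        tag = en_tags[src]
        projected.append(tag if tag in VERB_TAGS else None)  # assumption for non-verb targets
    return projected

# Hypothetical pair: 我 昨天 打扫 了 房间  <->  I cleaned the room yesterday
ZH = [None, None, "VV", None, None]
EN_TAGS = ["PRP", "VBD", "DT", "NN", "NN"]
ATTN = [
    [0.85, 0.05, 0.03, 0.02, 0.05],   # 我   -> I
    [0.05, 0.10, 0.02, 0.03, 0.80],   # 昨天 -> yesterday
    [0.04, 0.82, 0.04, 0.06, 0.04],   # 打扫 -> cleaned
    [0.10, 0.60, 0.10, 0.10, 0.10],   # 了 (not a verb, so ignored)
    [0.02, 0.05, 0.13, 0.78, 0.02],   # 房间 -> room
]
print(project_tense(ZH, EN_TAGS, ATTN))  # [None, None, 'VBD', None, None]
```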

(1) If at least two of the three epochs label the verb as tense A, the final tense of the verb is A.

(2) If two of the three epochs label the verb as None, the remaining result that is not None is taken as the final tense.

(3) If the results of all three epochs differ, the result from the latest epoch that is not None is taken as the final tense.
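The three merging rules can be expressed as a small function. This is a sketch of one reading of the rules: "latest epoch" is interpreted as the highest-numbered of epochs 10 to 12, and returning None when all three epochs say None is an assumption, because that case is not covered by the text.

```python
from collections import Counter
from typing import Optional, Sequence

def merge_epoch_labels(labels: Sequence[Optional[str]]) -> Optional[str]:
    """Merge tense labels from epochs 10, 11 and 12 (in that order); None means no tense."""
    if len(labels) != 3:
        raise ValueError("expected labels from exactly three epochs")
    tag, count = Counter(labels).most_common(1)[0]
    if count >= 2 and tag is not None:          # rule (1): at least two epochs agree on A
        return tag
    if count >= 2:                              # rule (2): two epochs say None
        others = [t for t in labels if t is not None]
        return others[0] if others else None    # all three None: assumed to stay None
    for t in reversed(labels):                  # rule (3): all differ, latest non-None wins
        if t is not None:
            return t
    return None

print(merge_epoch_labels(["VBD", "VBD", "VBZ"]))  # VBD
print(merge_epoch_labels(["VBD", "VBZ", None]))   # VBZ
```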

3.2 NMT Model Combined with Tense Recognition

An LSTM is used to predict the tense of Chinese words. The source sequence is transformed into word vectors by embedding, and the LSTM network outputs a tense prediction for each word. For every Chinese verb, let the tense predicted by the LSTM be $T_{s}$ and the tense of the English verb produced by translation be $T_{t}$. During decoding, if the source word aligned by attention at the current time step is a verb, the constraint $T_{t}=T_{s}$ is added to beam search, so that the tense of each target-side candidate word agrees with the tense of the Chinese source verb. A simple sentence illustrates how the NMT model combined with tense recognition works.

Take the sentence ``我昨天打扫了房间'' ("I cleaned the room yesterday") as an example. Before translation, the sentence is passed through the LSTM to obtain its tense annotation sequence; ``打扫'' ("clean") is recognized as past tense, and this tense information is saved. The sentence then passes through the encoder–decoder module. At the time step that generates ``cleaned'', the corresponding source position in the attention alignment matrix is ``打扫''. The beam width is set to 10, so the 10 candidate words with the highest probabilities are selected and their English tenses are obtained. The most probable word, ``clean'', is in the present tense, which is inconsistent with the recognized tense, so it is eliminated. The second most probable word, ``cleaned'', therefore moves to first place and becomes the final translation.
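The tense check applied to the 10 candidates can be sketched as a filter followed by an argmax. The candidate probabilities and the word-to-tag lookup `TENSE_OF` are hypothetical values chosen to mirror the example, and past tense is represented by the Penn Treebank tag VBD. The function returns None when no candidate matches, because the text does not specify what the decoder does in that situation.

```python
from typing import Dict, List, Optional, Tuple

def constrained_top(candidates: List[Tuple[str, float]], t_s: str,
                    tense_of: Dict[str, str]) -> Optional[str]:
    """Return the most probable candidate whose tense T_t equals the predicted source tense T_s."""
    consistent = [(w, p) for w, p in candidates if tense_of.get(w) == t_s]
    if not consistent:
        return None
    return max(consistent, key=lambda wp: wp[1])[0]

# Hypothetical top-10 candidates (beam width 10) at the step aligned with 打扫.
CANDIDATES = [("clean", 0.34), ("cleaned", 0.29), ("cleans", 0.11), ("cleaning", 0.08),
              ("cleared", 0.05), ("washed", 0.04), ("tidied", 0.03), ("clears", 0.02),
              ("wash", 0.02), ("tidy", 0.02)]
TENSE_OF = {"clean": "VBP", "cleaned": "VBD", "cleans": "VBZ", "cleaning": "VBG",
            "cleared": "VBD", "washed": "VBD", "tidied": "VBD", "clears": "VBZ",
            "wash": "VBP", "tidy": "VBP"}

print(max(CANDIDATES, key=lambda wp: wp[1])[0])        # clean   (unconstrained choice)
print(constrained_top(CANDIDATES, "VBD", TENSE_OF))    # cleaned (T_t = T_s = past tense)
```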

4. Analysis of Results

The experiments were conducted on a Linux system with the PyTorch deep learning framework, which is flexible, easy to deploy, and well suited to NMT research. The experimental data came from the NIST dataset. (The dataset contains audio files and corresponding text transcriptions in different languages and on different topics. The English part contains audio files and transcribed texts from various scenarios, such as news broadcasts, teleconferences, and interviews; the Mandarin part contains audio files and transcribed texts from scenarios such as teleconferences and narration. Its wide range of speech sources, high speech quality, and coverage of different languages and topics give it high reference value in speech recognition.) Sentences longer than 50 tokens were removed from the original corpus. The BPE tool was used for segmentation, and <bos> and <eos> markers were added, for example, ['<bos>', '他', '有', '一只', '猫', '。', '<eos>'] and ['<bos>', 'He', 'has', 'a', 'cat', '.', '<eos>']. The NIST dataset contains subsets such as NIST 04 (MT04). This paper used MT05 as the validation set and MT04, MT06, and MT08 as the test sets. The details are listed in Table 1.
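The length filter and boundary markers described above amount to a few lines of preprocessing. This sketch assumes that sentences are already segmented into token lists and that a pair is dropped when either side exceeds 50 tokens; the text does not state whether the limit applies to one side or both, and the function name `prepare` is hypothetical.

```python
from typing import List, Tuple

BOS, EOS = "<bos>", "<eos>"
MAX_LEN = 50   # sentences longer than 50 tokens are removed

Pair = Tuple[List[str], List[str]]

def prepare(pairs: List[Pair]) -> List[Pair]:
    """Drop over-long sentence pairs and wrap both sides in <bos> ... <eos>."""
    kept: List[Pair] = []
    for src, tgt in pairs:
        if len(src) > MAX_LEN or len(tgt) > MAX_LEN:
            continue
        kept.append(([BOS, *src, EOS], [BOS, *tgt, EOS]))
    return kept

data = [
    (["他", "有", "一只", "猫", "。"], ["He", "has", "a", "cat", "."]),
    (["词"] * 60, ["word"] * 55),   # too long, filtered out
]
print(prepare(data))
```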

Tense labels for the source corpus were obtained by training the NMT model. The neural network for tense recognition was trained for 20 epochs with an initial learning rate of 0.001. Tense recognition was evaluated by accuracy, and translation performance was evaluated by the BLEU score [20], computed as follows.

$p_{n}=\frac{\sum_{c\in \mathrm{candidates}}\sum_{n\text{-gram}\in c}\mathrm{count}_{\mathrm{clip}}\left(n\text{-gram}\right)}{\sum_{c'\in \mathrm{candidates}}\sum_{n\text{-gram}'\in c'}\mathrm{count}\left(n\text{-gram}'\right)}$,
$BP=\begin{cases}1, & \text{if } c>r\\ e^{1-r/c}, & \text{if } c\leq r\end{cases}$,
$BLEU=BP\cdot \exp\left(\sum_{n=1}^{N}w_{n}\log p_{n}\right)$,

where the candidates are the machine translations; $\mathrm{count}_{\mathrm{clip}}$ counts each candidate n-gram at most as many times as it appears in the reference; $p_{n}$ is the modified n-gram precision; $BP$ is the brevity penalty; $c$ and $r$ are the lengths of the machine translation and the reference translation, respectively; and $w_{n}$ is the weight of each n-gram order, usually uniform ($w_{n}=1/N$ with $N=4$).
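A corpus-level BLEU following these formulas is sketched below. Assumptions: one reference per candidate, uniform weights $w_n = 1/4$, and no smoothing, so the score is 0 whenever some $p_n$ is 0. Published BLEU numbers are normally produced with a standard toolkit and fixed tokenization, so this sketch is only meant to make the formula concrete.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(candidates: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    clipped = [0] * max_n   # numerators of p_n (clipped n-gram matches)
    total = [0] * max_n     # denominators of p_n (all candidate n-grams)
    c_len = r_len = 0
    for cand, ref in zip(candidates, references):
        c_len += len(cand)
        r_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
            clipped[n - 1] += sum(min(k, ref_counts[g]) for g, k in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    if min(clipped) == 0:
        return 0.0          # log p_n undefined; unsmoothed BLEU is 0
    log_avg = sum(math.log(clipped[i] / total[i]) for i in range(max_n)) / max_n
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)   # brevity penalty
    return bp * math.exp(log_avg)

ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [ref]))                               # 1.0
print(round(corpus_bleu([ref[:-1]], [ref]), 4))                # 0.8187, i.e. BP = e^(1-6/5)
```

In the second call every n-gram of the shortened candidate appears in the reference, so all $p_n = 1$ and the score equals the brevity penalty alone.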

First, the effectiveness of two methods, LSTM and Bi-LSTM, for tense recognition was compared. Table 2 lists the accuracy of tense recognition of the two methods for the validation and test sets.

With LSTM as the tense recognition model, accuracy was above 80% (maximum: 83.64%; minimum: 80.67%; average: 82.09%), whereas with Bi-LSTM it was around 90% (maximum: 91.64%; minimum: 88.42%; average: 89.89%) (Table 2). The average accuracy of Bi-LSTM was 7.8 percentage points higher than that of LSTM, indicating that Bi-LSTM recognized the different tenses of Chinese verbs more accurately. By exploiting both past and future context, Bi-LSTM substantially improved Chinese verb tense recognition and is therefore better suited to this task.

Next, the translation performance of NMT models combined with tense recognition was analyzed. Three baselines were used: an RNN-based, an LSTM-based, and a Bi-LSTM-based NMT model. Bi-LSTM-based tense recognition was combined with each baseline, and Table 3 compares the translation performance of all models.

The NMT models combined with tense recognition achieved substantially higher BLEU scores than the baselines (Table 3). Among the baselines, the Bi-LSTM-based model had the highest average BLEU score, 33.43, which was 1.20 higher than the RNN-based model (32.23) and 0.56 higher than the LSTM-based model (32.87). With tense recognition, the RNN-based model reached 36.53 (4.30 above its baseline), the LSTM-based model reached 38.92 (6.05 above its baseline), and the Bi-LSTM-based model reached 40.33 (6.90 above its baseline). These results show that combining tense recognition improved NMT quality.
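The reported gains can be rechecked from the average BLEU values given in the text. The short calculation below also expresses each gain relative to its baseline; the relative figures are derived here and are not reported in the source.

```python
from typing import Dict

baseline = {"RNN": 32.23, "LSTM": 32.87, "Bi-LSTM": 33.43}      # average BLEU, baselines
with_tense = {"RNN": 36.53, "LSTM": 38.92, "Bi-LSTM": 40.33}    # average BLEU, + tense recognition

def gains(base: Dict[str, float], improved: Dict[str, float]) -> Dict[str, float]:
    """Absolute BLEU gain per model, rounded to two decimals."""
    return {m: round(improved[m] - base[m], 2) for m in base}

for model, gain in gains(baseline, with_tense).items():
    print(f"{model}: +{gain:.2f} BLEU ({100 * gain / baseline[model]:.1f}% relative)")
# RNN: +4.30 BLEU (13.3% relative)
# LSTM: +6.05 BLEU (18.4% relative)
# Bi-LSTM: +6.90 BLEU (20.6% relative)
```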

Finally, the translation results of the NMT models combined with tense recognition were analyzed with several sentences as examples.

As shown in Table 4, for the verb ``伤害'' ("harm") in the first source sentence, the reference translation used ``had tarnished'', the Bi-LSTM-based model without Chinese verb tense recognition produced ``harms'', and the model combined with tense recognition produced ``harmed''. For the verb ``免除'' ("dismiss") in the second source sentence, the reference translation used ``removed'', the Bi-LSTM-based model produced ``sacks'', and the Bi-LSTM-based model combined with tense recognition produced ``fired''. In both cases, tense recognition yielded a past-tense verb consistent with the reference, whereas the baseline produced the present tense, suggesting that the Bi-LSTM-based model combined with tense recognition reliably recovers verb tense.

Table 1. Experimental Data Set.

Training set: –
Validation set (MT05): 1082 sentences
Test set (MT04): 1788 sentences
Test set (MT06): 1664 sentences
Test set (MT08): 1357 sentences

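Table 2. Accuracy of Tense Recognition.

LSTM: maximum 83.64%, minimum 80.67%, average 82.09%
Bi-LSTM: maximum 91.64%, minimum 88.42%, average 89.89%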

Table 3. Comparison of Translation Effects Between Different Models.

Model: average BLEU value
RNN: 32.23
LSTM: 32.87
Bi-LSTM: 33.43
RNN+tense recognition: 36.53
LSTM+tense recognition: 38.92
Bi-LSTM+tense recognition: 40.33

Table 4. Example Sentence Analysis.

Example 1
The original Chinese sentence: –
Reference translation: The closing of the embassies had angered the Philippine government, which said that the so-called threats were exaggerated and the closing had tarnished Philippines image.
Bi-LSTM: The closure of the embassies had angered the Philippine government, which said the alleged threat was exaggerated and the closure of the embassies harms the image of the Philippines.
Bi-LSTM + tense recognition: The closure of the embassies had angered the Philippine government, which said that the alleged threats were exaggerated and the closure of embassies harmed the image of Philippines.

Example 2
The original Chinese sentence: –
Reference translation: Manila removed an intelligence officer from the police, for he had released unconfirmed intelligence about the terrorist threat to the Australia and Canadian embassies.
Bi-LSTM: Manila sacks a police intelligence officer after he revealed unsubstantiated intelligence about terrorist threats against the Australian and Canadian embassies.
Bi-LSTM + tense recognition: Manila fired the police intelligence official because he leaked the unverified intelligence about terrorist threats upon the Australian and Canadian embassies.

5. Discussion

Tense recognition and translation are a vital part of Chinese-to-English machine translation and strongly influence its quality. In this paper, an LSTM-based neural network method was designed for tense recognition and translation, and it was evaluated experimentally on the NIST dataset.

The experimental results showed that Bi-LSTM achieved higher tense recognition accuracy than LSTM because its forward and backward LSTMs capture context in both directions. Bi-LSTM reached an average accuracy of 89.89%, 7.8 percentage points higher than LSTM, demonstrating its advantage in Chinese tense recognition. In the translation experiments, the BLEU scores of the RNN-based, LSTM-based, and Bi-LSTM-based NMT models all improved after tense recognition was added (by 4.30, 6.05, and 6.90, respectively), indicating that tense recognition reliably improves translation results. The example sentences in Table 4 further suggest that Bi-LSTM identified and translated Chinese tenses accurately, bringing the translations closer to the meaning of the source sentences.

This study contributes to tense recognition and translation in Chinese-to-English translation. Keeping tenses consistent between Chinese and English through verb tense recognition and translation effectively improves the quality of Chinese-to-English translation, provides a theoretical basis for further improving machine translation, and offers a new idea for tense recognition and translation in other language pairs, such as English-to-Chinese and Chinese-to-French translation.

6. Conclusion

This paper studied Chinese-to-English machine translation, designed a neural network-based method for recognizing the different tenses of Chinese verbs, and combined it with NMT models. In the experiments, the Bi-LSTM-based model recognized tenses more accurately than the LSTM-based model, with an average accuracy of 89.89%. Among the NMT models combined with tense recognition, the Bi-LSTM-based model achieved the highest BLEU score, with an average of 40.33. These results indicate that the Bi-LSTM-based model combined with tense recognition reliably improves the quality of Chinese–English translation and merits further practical application.


References

[1] A. V. Potnis, R. C. Shinde, S. S. Durbha, ``Towards Natural Language Question Answering Over Earth Observation Linked Data Using Attention-Based Neural Machine Translation,'' IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, pp. 577-580, Sep. 2020.
[2] A. G. Dorst, S. Valdez, H. Bouman, ``Machine translation in the multilingual classroom: How, when and why do humanities students at a Dutch university use machine translation?,'' Translation and Translanguaging in Multilingual Contexts, Vol. 8, No. 1, pp. 49-66, Feb. 2022.
[3] C. Lalrempuii, B. Soni, P. Pakray, ``An Improved English-to-Mizo Neural Machine Translation,'' ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 20, No. 4, pp. 1-21, May 2021.
[4] R. Krüger, ``Some Translation Studies informed suggestions for further balancing methodologies for machine translation quality evaluation,'' Translation Spaces, Vol. 11, No. 2, pp. 213-233, Mar. 2022.
[5] A. Ba, B. Bjd, A. Ra, ``Impact of Filtering Generated Pseudo Bilingual Texts in Low-Resource Neural Machine Translation Enhancement: The Case of Persian-Spanish,'' Procedia Computer Science, Vol. 189, pp. 136-141, Jul. 2021.
[6] T. K. Lam, J. Kreutzer, S. Riezler, ``A reinforcement learning approach to interactive-predictive neural machine translation,'' Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pp. 169-178, May 2018.
[7] H. Choi, K. Cho, Y. Bengio, ``Fine-Grained Attention Mechanism for Neural Machine Translation,'' Neurocomputing, Vol. 284, pp. 171-176, Mar. 2018.
[8] Y. Sun, C. Yong, ``Research on Tibetan-Chinese neural network machine translation with few samples,'' Journal of Physics: Conference Series, Vol. 1871, No. 1, pp. 1-8, Apr. 2021.
[9] A. Martinez, K. Sudoh, Y. Matsumoto, ``Sub-Subword N-Gram Features for Subword-Level Neural Machine Translation,'' Journal of Natural Language Processing, Vol. 28, No. 1, pp. 82-103, Jan. 2021.
[10] C. Ma, ``Syntax-based Transformer for Neural Machine Translation,'' Journal of Natural Language Processing, Vol. 28, No. 2, pp. 682-687, Jan. 2021.
[11] J. Su, J. Chen, H. Jiang, C. Zhou, H. Lin, Y. Ge, Q. Wu, Y. Lai, ``Multi-modal neural machine translation with deep semantic interactions,'' Information Sciences, Vol. 554, pp. 47-60, Nov. 2020.
[12] S. Tiun, U. A. Mokhtar, S. H. Bakar, S. Saad, ``Classification of functional and non-functional requirement in software requirement using Word2vec and fast Text,'' Journal of Physics: Conference Series, Vol. 1529, No. 4, pp. 1-6, Apr. 2020.
[13] R. Zarkami, M. Moradi, R. S. Pasvisheh, A. Bani, K. Abbasi, ``Input variable selection with greedy stepwise search algorithm for analysing the probability of fish occurrence: A case study for Alburnoides mossulensis in the Gamasiab River, Iran,'' Ecological Engineering, Vol. 118, pp. 104-110, May 2018.
[14] P. G. Shambharkar, P. Kumari, P. Yadav, R. Kumar, ``Generating Caption for Image using Beam Search and Analyzation with Unsupervised Image Captioning Algorithm,'' 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 857-864, May 2021.
[15] Y. Liu, D. Zhang, L. Du, Z. Gu, J. Qiu, Q. Tan, ``A Simple but Effective Way to Improve the Performance of RNN-Based Encoder in Neural Machine Translation Task,'' 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), pp. 416-421, Jun. 2019.
[16] Z. Liu, F. Qi, ``Research on advertising content recognition based on convolutional neural network and recurrent neural network,'' International Journal of Computational Science and Engineering, Vol. 24, No. 4, pp. 398-404, Jan. 2021.
[17] K. Shuang, R. Li, M. Gu, J. Loo, S. Su, ``Major-minor long short-term memory for word-level language model,'' IEEE Transactions on Neural Networks and Learning Systems, Vol. 31, No. 10, pp. 3932-3946, Dec. 2020.
[18] S. Xu, R. Niu, ``Displacement prediction of Baijiabao landslide based on empirical mode decomposition and long short-term memory neural network in Three Gorges area, China,'' Computers & Geosciences, Vol. 111, pp. 87-96, Feb. 2018.
[19] M. Banna, T. Ghosh, M. Nahian, K. A. Taher, M. S. Kaiser, M. Mahmud, M. S. Hossain, K. Andersson, ``Attention-based Bi-directional Long-Short Term Memory Network for Earthquake Prediction,'' IEEE Access, Vol. 9, pp. 56589-56603, Apr. 2021.
[20] H. I. Liu, W. L. Chen, ``Re-Transformer: A Self-Attention Based Model for Machine Translation,'' Procedia Computer Science, Vol. 189, No. 8, pp. 3-10, Jul. 2021.


Xuran Ni

Xuran Ni was born in Hebei, China in 1983. From 2002 to 2006, she studied at Hebei University and received her bachelor's degree in 2006. From 2011 to 2015, she studied at Capital Normal University and received her Master's degree in 2015. She has published 17 papers. Her research interests include English teaching and reform.