1. Introduction
People’s interest in music has grown as society has developed. The piano is a musical instrument with a rich timbre. It appears in many musical works, is widely used in concerts and music festivals, and is deeply loved by people. With the development of computer technology, music is becoming increasingly digitalized, and automatic music transcription (AMT) [1] has been studied in greater depth. AMT refers to using computers to convert the notes in musical signals into scores [2]. Through AMT, raw audio signals can be converted into symbols that are easier for humans to understand [3], which forms the basis for analyzing many kinds of music [4]. This conversion helps people appreciate music and reduces the burden of manual notation. Furthermore, more accurate transcription and AMT-based audio annotation can make music search more effective [5], so AMT plays an important role.

As technology has developed, many methods have been applied to AMT [6]. Cheuk et al. [7] used two U-nets: the first transcribed spectrograms into posteriorgrams, and the second converted the posteriorgrams back into spectrograms. Experiments on three datasets showed that this method was more accurate in note-level transcription. Skoki et al. [8] studied AMT for the sopela and determined a pitch prediction model by combining two machine learning algorithms with frequency features, achieving promising transcription accuracy. Kawashima [9] improved transcription accuracy by using convolutional neural networks (CNNs) together with low-rank non-negative matrix factorization and assessed the effectiveness of the method by simulation. Beltran [10] examined the influence of timbre on monophonic transcription using deep saliency models. The experimental results showed that the models were effective for the polyphonic transcription of non-piano instruments; e.g., the F1 value for low-pitched instruments reached 0.9516. Steiner et al. [11] designed a method based on an echo state network and reported improvements in note detection of 1.8% and 1.4% over a bidirectional long short-term memory (LSTM) network and a CNN, respectively. Wei et al. [12] proposed a semi-automatic method for automatic drum transcription that uses audio-to-MIDI (musical instrument digital interface) alignment and demonstrated its effectiveness experimentally. Nakamura et al. [13] designed several Bayesian Markovian score models for rhythm transcription and found experimentally that the method achieved good transcription accuracy and computational efficiency.

In AMT, transcribing a single note is relatively simple. The piano, however, is a polyphonic instrument on which multiple notes sound simultaneously, and this makes automatic transcription difficult. The automatic transcription of polyphonic audio remains challenging, yet relatively few studies have focused on piano audio. Therefore, this paper investigates an automatic transcription algorithm that accounts for the polyphonic characteristics of piano audio in order to improve transcription performance. Three features were extracted from the piano audio: the short-time Fourier transform (STFT), the constant-Q transform (CQT), and the variable-Q transform (VQT). A CNN was combined with a bidirectional gated recurrent unit (BiGRU) to detect note start points and the fundamental tones of the polyphony, improving the reliability of automatic transcription. The effectiveness of the method was demonstrated through experiments on the MAPS dataset, providing a new method for AMT. The method could also be applied to the AMT of other instruments and types of music, providing reliable support for music information retrieval and analysis.
               
             
            
                  2. Automatic Transcription Algorithm for Piano
               
                     2.1 The Polyphonic Characteristics of Piano Audio
In music, the most fundamental unit is the musical note, a symbol used to represent a pitch. Each note can be labeled with an English letter (C, D, E, F, G, A, or B); this naming system is called the "musical alphabet".
                  
The distance between two adjacent notes of the same name is called an octave. In twelve-tone equal temperament, an octave is divided into twelve equal parts, each called a semitone. Table 1 lists, for one octave, the correspondence between the musical alphabet, syllable names, and numbered musical notation.
                  
When a single piano key is struck, the lowest-frequency sinusoidal component of the note's signal is called the fundamental tone, and its frequency is called the fundamental frequency. Because different pitches have different fundamental tones, identifying the fundamental tone identifies the pitch.
                  
On the piano, each semitone corresponds to one key. The fundamental frequency of the pitch produced by key $n$ of the 88 keys is

$$f\left(n\right)=f_{0}\times 2^{\frac{n-49}{12}},\quad n=1,2,\ldots,88$$

where $f_{0}$ stands for the standard pitch (A4, the 49th key, 440 Hz). The fundamental frequencies of a standard piano therefore range from 27.5 Hz (A0) to 4186.01 Hz (C8).
                  
In digital music, MIDI numbers are used to represent pitches. For a standard piano, the relationship between the MIDI number $m$ and the fundamental frequency $f$ is

$$m=69+12\log_{2}\left(\frac{f}{f_{0}}\right)$$

where $f_{0}=440$ Hz. The MIDI numbers corresponding to the 88 piano keys (A0-C8) are 21-108.
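The two relationships can be checked numerically. The sketch below assumes the standard 88-key layout with key index $n$ = 1 (A0) to 88 (C8), $f(n) = f_0 \cdot 2^{(n-49)/12}$ with $f_0$ = 440 Hz, and $m = 69 + 12\log_2(f/f_0)$; the function names are hypothetical.

```python
import math

A4_HZ = 440.0  # standard pitch f0 (concert A4, key 49, MIDI 69)

def key_to_frequency(n: int, f0: float = A4_HZ) -> float:
    """Fundamental frequency of piano key n (1 = A0, 49 = A4, 88 = C8)."""
    if not 1 <= n <= 88:
        raise ValueError("piano key index must be in 1..88")
    return f0 * 2.0 ** ((n - 49) / 12.0)

def frequency_to_midi(f: float, f0: float = A4_HZ) -> float:
    """MIDI number for fundamental frequency f (69 corresponds to f0)."""
    return 69.0 + 12.0 * math.log2(f / f0)

for key in (1, 40, 49, 88):
    f = key_to_frequency(key)
    print(f"key {key:2d}: {f:8.2f} Hz -> MIDI {round(frequency_to_midi(f))}")
```

Under these assumptions key $n$ always maps to MIDI number $n + 20$, so keys 1 to 88 cover MIDI 21 to 108, matching the range given in the text.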
The polyphonic characteristic of piano audio refers to the sound produced when multiple notes are played on the piano simultaneously. Because the notes sound at the same time, their partials overlap, interfere, and resonate with one another, producing a distinctive timbre. The signal therefore carries the fundamental frequencies of several notes at once, which makes automatic piano transcription challenging.
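One concrete source of this difficulty is that the partials of one note can coincide with the fundamentals of other notes. The sketch below assumes ideal harmonic partials at integer multiples of each fundamental (real piano strings are slightly inharmonic, which it ignores) and a hypothetical 20-cent tolerance; the helper names are illustrative.

```python
import math

def midi_to_hz(m: int) -> float:
    # Equal temperament with A4 (MIDI 69) = 440 Hz
    return 440.0 * 2.0 ** ((m - 69) / 12.0)

def coinciding_partials(chord, n_partials=8, tol_cents=20.0):
    """Return (note, harmonic_number, other_note) triples where that harmonic
    of `note` lies within tol_cents of the fundamental of `other_note`."""
    hits = []
    for low in chord:
        for h in range(2, n_partials + 1):
            partial = h * midi_to_hz(low)
            for other in chord:
                if other == low:
                    continue
                cents = 1200.0 * math.log2(partial / midi_to_hz(other))
                if abs(cents) <= tol_cents:
                    hits.append((low, h, other))
    return hits

# C3 (MIDI 48), C4 (60) and G4 (67) played together
print(coinciding_partials([48, 60, 67]))
```

Here the 2nd harmonic of C3 lands exactly on the C4 fundamental and its 3rd harmonic lies about 2 cents from G4, so simple spectral peak picking cannot tell whether C4 and G4 were actually played.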
                  
                  
Table 1. Piano Notes in an Octave.

| Musical alphabet | C | D | E | F | G | A | B |
|---|---|---|---|---|---|---|---|
| Syllable name | do | re | mi | fa | so | la | ti |
| Numbered musical notation | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
                
               
                     2.2 Analysis of Piano Audio Characteristics
Making piano audio computable requires an analysis of its characteristics. For pitch analysis, frequency-domain representations are more informative than the raw time-domain waveform, and the following three time-frequency methods are commonly used.
                  
                  (1) STFT
STFT [14] analyzes the frequency content of short, local segments of a signal in order to track how the spectrum changes over time. It is calculated as

$$X\left(m,\omega\right)=\sum_{n=-\infty}^{\infty}f\left(n\right)w\left(n-m\right)e^{-j\omega n}$$

where $f\left(n\right)$ represents the time-domain signal, $w\left(n\right)$ stands for the window function, $w\left(n-m\right)$ is the window shifted to position $m$ (the sliding window), and $\omega$ is the angular frequency.
                  
Because windowing restricts each analysis to a short local segment, STFT captures local information and can extract how the spectrum evolves over time.
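The sliding-window definition can be sketched with NumPy as a sequence of windowed FFTs. The Hann window, n_fft = 2048, hop = 512 and 16 kHz sampling rate are illustrative assumptions, not settings reported in this paper, and `stft_magnitude` is a hypothetical name. Each FFT uses frame-local time indices, which changes only the phase, not the magnitude, relative to the formula with $e^{-j\omega n}$.

```python
import numpy as np

def stft_magnitude(signal, n_fft=2048, hop=512):
    """Slide a Hann window over the signal and FFT each windowed frame.

    Returns magnitudes of shape (n_frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 16000                          # assumed sampling rate (Hz)
t = np.arange(fs) / fs              # one second of audio
x = np.sin(2 * np.pi * 440.0 * t)   # A4 test tone
spec = stft_magnitude(x)
peak_bin = int(np.argmax(spec.mean(axis=0)))
print(spec.shape, peak_bin * fs / 2048)  # bin spacing is fs / n_fft
```

With these settings the bins are fs / n_fft = 7.8125 Hz apart at every frequency, while A0 (27.5 Hz) and A#0 (about 29.14 Hz) are only about 1.6 Hz apart, so the lowest piano notes cannot be separated on this linear grid.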
                  
                  (2) CQT
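STFT uses a fixed window length, so its frequency points are linearly spaced, whereas the fundamental frequencies of musical pitches are spaced geometrically (each semitone multiplies the frequency by $2^{1/12}$). At low frequencies, adjacent semitones can therefore fall into the same STFT bin, which leads to errors in fundamental frequency recognition. CQT makes the frequency points exponentially distributed [15]; the corresponding formulas are

$$X^{CQT}\left(k\right)=\frac{1}{N_{k}}\sum_{n=0}^{N_{k}-1}f\left(n\right)w_{N_{k}}\left(n\right)e^{-j\frac{2\pi Qn}{N_{k}}}$$

$$f_{k}=f_{min}\times 2^{\frac{k}{b}}$$

$$Q=\frac{1}{2^{\frac{1}{b}}-1}$$

$$N_{k}=\left\lceil Q\frac{f_{s}}{f_{k}}\right\rceil$$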
                  where
                  $f\left(n\right)$: signal sequence,
                  $k$: frequency point index of the CQT spectrum,
                  $w_{{N_{k}}}\left(n\right)$: a window function with a length of $N_{k}$,
                  $Q$: the constant factor of CQT,
                  $f_{s}$: sampling frequency,
                  $f_{k}$: the central frequency of the $k$-th spectral line in the CQT spectrum,
                  $f_{min}$: the lowest frequency,
                  $b$: number of frequency points within each octave.
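A direct (slow) evaluation of the CQT definition is sketched below, assuming a Hann window for $w_{N_k}$, $N_k = \lceil Q f_s / f_k \rceil$, analysis starting at the first sample, $f_{min}$ = 27.5 Hz, $b$ = 12 and 88 bins. These values are illustrative, not the paper's settings, and `naive_cqt` is a hypothetical name; the point is only to show how $f_k$, $Q$ and $N_k$ interact, since practical systems use fast CQT algorithms.

```python
import numpy as np

def naive_cqt(signal, fs, f_min=27.5, bins_per_octave=12, n_bins=88):
    """Direct constant-Q transform of the start of `signal`.

    Bin k has center frequency f_k = f_min * 2**(k / b), quality factor
    Q = 1 / (2**(1 / b) - 1) and window length N_k = ceil(Q * fs / f_k),
    so every bin analyzes about Q periods of its own center frequency.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    out = np.zeros(n_bins, dtype=complex)
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = int(np.ceil(q * fs / f_k))
        if n_k > len(signal):
            raise ValueError("signal too short for this CQT bin")
        n = np.arange(n_k)
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * q * n / n_k)
        out[k] = np.sum(signal[:n_k] * kernel) / n_k
    return out

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)   # A4, 48 semitones above A0
mag = np.abs(naive_cqt(x, fs))
print(int(np.argmax(mag)))           # index of the strongest bin
```

With $f_{min}$ = 27.5 Hz and $b$ = 12, bin $k$ is centered on piano key $k + 1$ (MIDI $k + 21$), so the 88 bins line up one-to-one with the 88 keys; note how the lowest bin needs almost 10,000 samples (about 0.6 s) at 16 kHz.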
                  (3) VQT
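VQT [16] introduces a parameter $\gamma$ to improve the time resolution of the time-frequency representation. Its center frequencies are distributed in the same way as those of CQT, and the bandwidth $B_{k}$ of frequency band $k$ is related to its center frequency by

$$B_{k}=\alpha f_{k}+\gamma,\quad \alpha=\frac{1}{Q}$$

When $\gamma=0$, $B_{k}=f_{k}/Q$ and VQT reduces to CQT. When $\gamma>0$, the bandwidths of the low-frequency bands are widened, so their analysis windows become shorter: time resolution at low frequencies improves markedly at the cost of some low-frequency resolution, while high-frequency bands, where $\alpha f_{k}\gg\gamma$, keep nearly the same frequency resolution as in CQT. VQT therefore provides a better balance between time and frequency resolution for piano audio.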
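The effect of $\gamma$ on window length can be illustrated as follows. This sketch assumes $B_k = \alpha f_k + \gamma$ with $\alpha = 1/Q = 2^{1/b} - 1$ and a window length $N_k = f_s / B_k$, so that $\gamma = 0$ reproduces the CQT length $Q f_s / f_k$; $\gamma$ = 10 Hz and the function name are illustrative choices, not values from the paper.

```python
import numpy as np

def filter_bank_window_lengths(fs, f_min=27.5, bins_per_octave=12,
                               n_bins=88, gamma=0.0):
    """Center frequencies f_k and window lengths N_k of a VQT filter bank.

    Assumes B_k = alpha * f_k + gamma with alpha = 2**(1/b) - 1 and
    N_k = fs / B_k; gamma = 0 gives the CQT case N_k = Q * fs / f_k.
    """
    k = np.arange(n_bins)
    f_k = f_min * 2.0 ** (k / bins_per_octave)
    alpha = 2.0 ** (1.0 / bins_per_octave) - 1.0
    bandwidth = alpha * f_k + gamma
    return f_k, fs / bandwidth

fs = 16000
f_k, n_cqt = filter_bank_window_lengths(fs, gamma=0.0)
_, n_vqt = filter_bank_window_lengths(fs, gamma=10.0)  # illustrative gamma (Hz)
for k in (0, 48, 87):  # A0, A4, C8
    print(f"{f_k[k]:8.2f} Hz: CQT {n_cqt[k]:7.0f} samples, VQT {n_vqt[k]:7.0f} samples")
```

With $\gamma$ = 10 Hz the A0 window shrinks from about 0.6 s to under 0.1 s, while the C8 window changes by only a few percent, which is the kind of low-frequency time-resolution gain VQT is introduced for.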
                  
                
               
                     2.3 CNN-based Automatic Transcription Algorithm
CNNs are an important class of deep learning models that perform well in image recognition and other fields [17]. Automatic piano transcription mainly involves three tasks:
                  
                  (1) detection of the start point of musical notes,
                  (2) detection of the endpoint of musical notes,
                  (3) detection of the fundamental tone of the polyphone.
Note start and end points in piano audio appear as distinct amplitude changes in the spectrogram, so they form local patterns that can be recognized in a time-frequency image.
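The amplitude changes at note start points can be seen even without a neural network. As an illustration only, the sketch below computes half-wave-rectified spectral flux, a classic hand-crafted onset function that is not the method of this paper, on a synthetic signal with two decaying tones; the frame size, hop, 0.3 threshold and helper names are all assumptions.

```python
import numpy as np

N_FFT, HOP = 1024, 256  # illustrative frame size and hop (samples)

def spectral_flux(signal):
    """Summed increases in STFT magnitude; flux[i] compares frames i and i + 1."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(signal) - N_FFT) // HOP
    frames = np.stack([signal[i * HOP:i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)

def pick_onsets(flux, fs, rel_threshold=0.3):
    """For each run of frames with flux > rel_threshold * max(flux), return the
    center time (s) of frame i + 1 for the run's largest flux value i."""
    above = flux > rel_threshold * flux.max()
    times, i = [], 0
    while i < len(flux):
        if not above[i]:
            i += 1
            continue
        j = i
        while j < len(flux) and above[j]:
            j += 1
        best = i + int(np.argmax(flux[i:j]))
        times.append(((best + 1) * HOP + N_FFT / 2) / fs)
        i = j
    return times

fs = 16000
t = np.arange(2 * fs) / fs
x = np.zeros_like(t)
for onset, freq in ((0.5, 261.63), (1.25, 392.0)):  # decaying C4, then G4
    tau = t[t >= onset] - onset
    x[t >= onset] += np.exp(-3.0 * tau) * np.sin(2 * np.pi * freq * tau)
onsets = pick_onsets(spectral_flux(x), fs)
print([round(o, 3) for o in onsets])
```

Each note start produces one clear burst of positive spectral change; real piano recordings add soft notes, pedaling and overlapping decays, which is where a learned detector such as a CNN is expected to help.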
                     
                  
                  CNN is suitable for extracting spatial structural features and has good generalization
                     ability. Therefore, this study analyzed the automatic transcription of piano audio
                     based on CNN.
                  
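A CNN is built around convolutional layers, which extract features through convolution operations; using multiple convolutional kernels yields rich features. Pooling layers downsample the feature maps, discarding less important information and reducing computational complexity. Activation layers apply nonlinear functions, which give the network its nonlinear modeling capacity; ReLU in particular alleviates the vanishing-gradient problem. Commonly used activation functions include

$$\mathrm{sigmoid}\left(x\right)=\frac{1}{1+e^{-x}}$$

$$\tanh\left(x\right)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$

$$\mathrm{relu}\left(x\right)=\max\left(0,x\right)$$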
                  $sigmoid$ and $tanh$ are often used for fully connected layers, while $relu$ is often
                     used for convolutional layers.
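A short numerical check of the vanishing-gradient argument: the derivative of sigmoid never exceeds 0.25, so a gradient passing through many sigmoid activations shrinks geometrically, whereas ReLU passes it through unchanged for positive inputs. The 10-layer depth is an arbitrary illustration, and the weights are ignored.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)           # peaks at 0.25 when x = 0

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2   # peaks at 1 when x = 0, vanishes for large |x|

def d_relu(x):
    return (x > 0).astype(float)   # 1 for x > 0, 0 otherwise (0 chosen at x = 0)

x = np.linspace(-6.0, 6.0, 1201)
print(d_sigmoid(x).max(), d_tanh(x).max(), d_relu(x).max())

# Best-case factor contributed by 10 stacked activations alone (weights ignored)
layers = 10
print(0.25 ** layers, d_relu(np.ones(layers)).prod())
```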
                  
This paper combines a gated recurrent unit (GRU) with the CNN to capture dependencies between preceding and following frames of the sequence [18]. GRU is a simplified variant of the long short-term memory (LSTM) network; it has a simpler structure and is more efficient than LSTM. GRU controls the flow of information through two gates, the reset gate and the update gate. The update gate is computed as

$$z_{t}=\sigma\left(W_{z}x_{t}+U_{z}h_{t-1}\right)$$

where
$x_{t}$: the current input vector,
$W_{z}$ and $U_{z}$: weight matrices,
$h_{t-1}$: the hidden state from the previous moment,
$\sigma$: the sigmoid function.
The reset gate is computed as

$$r_{t}=\sigma\left(W_{r}x_{t}+U_{r}h_{t-1}\right)$$

where $W_{r}$ and $U_{r}$ are weight matrices. The current memory information is $\tilde{h}_{t}=\tanh\left[W_{h}x_{t}+U_{h}\left(r_{t}\odot h_{t-1}\right)\right]$, where $\tilde{h}_{t}$ is the candidate hidden state and $\odot$ denotes element-wise multiplication. The hidden state at the current moment is $h_{t}=z_{t}\odot h_{t-1}+\left(1-z_{t}\right)\odot\tilde{h}_{t}$.
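A minimal NumPy sketch of one GRU time step, assuming the gate equations $z_t = \sigma(W_z x_t + U_z h_{t-1})$ and $r_t = \sigma(W_r x_t + U_r h_{t-1})$, the candidate $\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}))$ and the update $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$, with biases omitted. The names $W_h$, $U_h$, the random weights and the sizes are illustrative; some libraries swap the roles of $z_t$ and $1 - z_t$, which is equivalent up to relabeling.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU time step without biases.

    p holds W_z, U_z, W_r, U_r, W_h, U_h with shapes (hidden, input)
    for the W_* matrices and (hidden, hidden) for the U_* matrices.
    """
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)              # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)              # reset gate
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev))  # candidate state
    return z_t * h_prev + (1.0 - z_t) * h_tilde                    # new hidden state

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3
params = {name: rng.normal(scale=0.5, size=(n_hidden, n_in if name[0] == "W" else n_hidden))
          for name in ("W_z", "U_z", "W_r", "U_r", "W_h", "U_h")}
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):  # a 5-frame input sequence
    h = gru_step(x_t, h, params)
print(h)
```

Because $h_t$ is a convex combination of $h_{t-1}$ and a tanh output, the hidden state stays inside $(-1, 1)$ when it starts at zero.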
                  
A bidirectional GRU (BiGRU) was used so that the features extracted for each frame of piano audio take both preceding and following context into account. BiGRU consists of a forward GRU and a backward GRU, which process the input sequence in the forward and backward directions, respectively.
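The bidirectional arrangement can be sketched by running one GRU over the frames in order, a second GRU over the reversed frames, re-reversing the second output and concatenating the two per frame. This assumes the standard GRU equations without biases; the random weights, sizes and function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_gru(rng, n_in, n_hidden):
    """Random weights for one GRU direction (illustrative initialization)."""
    shapes = {"W": (n_hidden, n_in), "U": (n_hidden, n_hidden)}
    return {f"{m}_{g}": rng.normal(scale=0.5, size=shapes[m])
            for m in "WU" for g in "zrh"}

def run_gru(xs, p):
    """Run a GRU over xs of shape (T, n_in); returns states of shape (T, n_hidden)."""
    h = np.zeros(p["U_z"].shape[0])
    outputs = []
    for x_t in xs:
        z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h)
        r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h)
        h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h))
        h = z * h + (1.0 - z) * h_tilde
        outputs.append(h)
    return np.array(outputs)

def bigru(xs, p_fwd, p_bwd):
    """Concatenate forward states with backward states realigned to frame order."""
    forward = run_gru(xs, p_fwd)
    backward = run_gru(xs[::-1], p_bwd)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(1)
T, n_in, n_hidden = 6, 4, 3
p_f, p_b = make_gru(rng, n_in, n_hidden), make_gru(rng, n_in, n_hidden)
xs = rng.normal(size=(T, n_in))
out = bigru(xs, p_f, p_b)
print(out.shape)  # (T, 2 * n_hidden)
```

The backward half of the output at the first frame already depends on the last frame, which is how a BiGRU lets each frame's decision use future context.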
                  
                  The CNN-BiGRU algorithm was obtained by combining CNN and BiGRU and applied to the
                     automatic transcription of piano audio, as shown in Fig. 1.
                  
According to Fig. 1, STFT, CQT, or VQT features are first extracted from the piano audio signal and then used as input to the CNN-BiGRU algorithm to detect note start and end points as well as the fundamental tones of the polyphony. When STFT was used as input, only the first 512 dimensions of each spectrum were kept, because the STFT spectrum is relatively long.
                  
The three CNN-BiGRU models have the same structure, with four convolutional layers. The pooling layers use mean pooling with a stride of 2. All layers except the output layer use the ReLU function, and the output layer uses the sigmoid function. The models differ only in their output layers. For note start and end point detection, the output layer has a single node, representing the probability that the input representation contains a note start (or end) point. For polyphonic fundamental tone detection, the output layer has 88 nodes, representing the independent probability that each note is being played in the input representation.
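The two kinds of output can be turned into discrete events by thresholding. The sketch assumes a decision threshold of 0.5 (not stated in the paper) and that output node $i$ corresponds to piano key $i + 1$, i.e., MIDI number $21 + i$; the probabilities below are made up for illustration and are not model outputs.

```python
import numpy as np

LOWEST_MIDI = 21  # A0, the lowest of the 88 piano keys

def active_notes(pitch_probs, threshold=0.5):
    """MIDI numbers whose independent sigmoid probability exceeds threshold."""
    pitch_probs = np.asarray(pitch_probs)
    if pitch_probs.shape != (88,):
        raise ValueError("expected 88 per-key probabilities")
    return [LOWEST_MIDI + int(i) for i in np.flatnonzero(pitch_probs > threshold)]

def onset_frames(onset_probs, threshold=0.5):
    """Frame indices flagged by the single-node start-point output."""
    return np.flatnonzero(np.asarray(onset_probs) > threshold).tolist()

# Made-up outputs for one frame: a C-major triad C4, E4, G4
probs = np.full(88, 0.05)
probs[[60 - 21, 64 - 21, 67 - 21]] = [0.93, 0.81, 0.77]
print(active_notes(probs))
print(onset_frames([0.1, 0.7, 0.2, 0.05, 0.9]))
```

Because each of the 88 outputs is an independent sigmoid rather than part of a softmax, any number of keys can be active in the same frame, which is what polyphonic transcription requires.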
                  
                  
                        Fig. 1. Piano automatic transcription algorithm based on CNN-BiGRU.
 
                
             
            
                  3. Results and Analysis
               
                     3.1 Experimental Dataset
The experimental data come from the MAPS dataset [19]. Each audio file is accompanied by an annotation file that gives the start time, end time, and MIDI number of every note. An excerpt is shown below:
                  
                  [0.336977, 0.510340): 69
                  [0.518616, 0.622360): 72
                  [0.635011, 0.738756): 76
                  [0.751406, 0.855150): 77
                  [0.867801, 1.098953): 69
                  [1.105229, 1.379819): 72
                  [1.392470, 1.666060): 74
                  [1.678711, 1.952301): 76
                  [1.964952, 2.346750): 64
                  ...
The values in brackets are the start and end times of the note in seconds, and the number at the end of the line is the MIDI number of the pitch. For example, "69" at the end of the first line means that the MIDI number is 69, which is the note A4 (440 Hz).
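A hypothetical parser for the bracketed listing shown above is sketched below; it also converts MIDI numbers to scientific pitch names. The raw MAPS text files may use a different column layout, in which case the regular expression would need adapting, and all names here are illustrative.

```python
import re
from typing import List, NamedTuple

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
LINE_RE = re.compile(r"\[\s*([\d.]+)\s*,\s*([\d.]+)\s*\)\s*:\s*(\d+)")

class Note(NamedTuple):
    onset: float   # seconds
    offset: float  # seconds
    midi: int

def parse_annotation(text: str) -> List[Note]:
    """Parse lines like '[0.336977, 0.510340): 69'; other lines are skipped."""
    notes = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            notes.append(Note(float(m.group(1)), float(m.group(2)), int(m.group(3))))
    return notes

def midi_to_name(midi: int) -> str:
    """Scientific pitch notation, e.g. 60 -> 'C4', 69 -> 'A4'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

example = """[0.336977, 0.510340): 69
[0.518616, 0.622360): 72
[0.635011, 0.738756): 76
..."""
for note in parse_annotation(example):
    print(f"{note.onset:.3f}-{note.offset:.3f} s  MIDI {note.midi} = {midi_to_name(note.midi)}")
```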
                  
The MAPS dataset includes nine directories, each containing 30 pieces of piano music. Seven directories contain synthesized audio, while ENSTDkCl (Cl) and ENSTDkAm (Am) contain recordings of real piano performances. In this article, Cl and Am served as the test set. Two combinations were used for the training set:
                  
                  ① The synthesized audio of the first seven directories + the first 30 seconds of Cl
                     and Am;
                  
                  ② Only the synthesized audio of the first seven directories. 
                
               
                     3.2 Evaluation Indicators
                  The performance evaluation of the algorithm was based on the confusion matrix (Table 2).
                  
                  (1) Precision: the proportion of correctly detected notes to all detected notes, $P=TP/\left(TP+FP\right)$;
                  (2) Recall rate: the proportion of correctly detected notes to the total number of
                     notes, $R=TP/\left(TP+FN\right)$;
                  
(3) F-measure: the harmonic mean of precision (P) and recall rate (R), $F1=\left(2\times P\times R\right)/\left(P+R\right)$.
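The paper does not state how a detected note is matched to a reference note. The sketch below assumes a common convention: same MIDI number, onset within ±50 ms, and one-to-one matching (done greedily here; the mir_eval library uses the same 50 ms default but optimal bipartite matching). The function name, tolerance and the detections in the example are illustrative; the reference onsets are rounded from the annotation excerpt above.

```python
from typing import List, Tuple

Note = Tuple[float, int]  # (onset time in seconds, MIDI number)

def note_prf(reference: List[Note], detected: List[Note], onset_tol: float = 0.05):
    """Precision, recall and F1 with greedy one-to-one onset matching."""
    unmatched = list(reference)
    tp = 0
    for onset, midi in sorted(detected):
        candidates = [r for r in unmatched
                      if r[1] == midi and abs(r[0] - onset) <= onset_tol]
        if candidates:
            best = min(candidates, key=lambda r: abs(r[0] - onset))
            unmatched.remove(best)
            tp += 1
    fp = len(detected) - tp   # detections with no matching reference note
    fn = len(reference) - tp  # reference notes that were missed
    p = tp / (tp + fp) if detected else 0.0
    r = tp / (tp + fn) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

ref = [(0.337, 69), (0.519, 72), (0.635, 76), (0.751, 77)]
det = [(0.350, 69), (0.520, 72), (0.640, 74), (0.752, 77), (0.900, 69)]
print(note_prf(ref, det))
```

True negatives do not enter P, R or F1, which suits note events that are sparse compared with all possible (time, pitch) combinations.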
                  
                  
Table 2. Confusion Matrix.

| Confusion matrix | | Real value: Positive | Real value: Negative |
|---|---|---|---|
| Detection value | Positive | True Positive (TP) | False Positive (FP) |
| | Negative | False Negative (FN) | True Negative (TN) |
               
                     3.3 Result Analysis
First, the synthesized audio of the first seven directories and the first 30 seconds of Cl and Am were used as the training set to compare how STFT, CQT, and VQT inputs affect note start point detection. In addition, the CNN-BiGRU algorithm was compared with the CNN and CNN-GRU algorithms. Table 3 lists the comparison results.
                  
According to Table 3, when STFT was used as input, the P values of these algorithms were approximately 75%, the R values were around 60%, and the F1-measures were below 70%. When CQT was used as input, the performance of all algorithms improved to some extent; for example, the F1-measure of the CNN-BiGRU algorithm was 9.97 percentage points higher than with STFT input. Across the algorithms compared, CNN-BiGRU achieved the highest F1-measure. Finally, with VQT as input, the CNN-BiGRU algorithm achieved a P value of 81.26%, an R value of 87.64%, and an F1-measure of 84.33%, all the highest values, demonstrating the effectiveness of VQT and CNN-BiGRU in detecting note start points.
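As a consistency check, the reported F1-measure can be recomputed from the reported P and R values; the snippet uses only numbers quoted in the text.

```python
def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (both in %)."""
    return 2 * p * r / (p + r)

# VQT + CNN-BiGRU, note start point detection (values quoted in the text)
p, r = 81.26, 87.64
print(f"F1 = {f1_score(p, r):.2f}%")
```

The result rounds to the reported 84.33%. Because F1 is a harmonic mean, it sits closer to the lower of P and R, so the comparatively low precision limits the start point F1-measure.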
                  
                  The performance of different features and algorithms was compared in terms of polyphonic
                     fundamental tone detection, and the results are displayed in Table 4.
                  
According to Table 4, the F1-measures of all algorithms were below 90% when STFT was used as input. The CNN algorithm performed worst in polyphonic fundamental tone detection, with low P and R values and the lowest F1-measure, only 80.89%. The F1-measure of the CNN-BiGRU algorithm was 89.25%, 8.36 percentage points higher than that of the CNN algorithm. With CQT as input, all algorithms performed better in polyphonic fundamental tone detection than with STFT, and the F1-measure of the CNN-BiGRU algorithm reached 95.88%, 6.63 percentage points higher than with STFT. Finally, with VQT as input, the P values, R values, and F1-measures of all algorithms exceeded 90%, and the F1-measure of the CNN-BiGRU algorithm was 97.25%, 1.37 percentage points higher than with CQT. The results in Table 4 show that, among STFT, CQT, and VQT, VQT is the most effective input feature for polyphonic fundamental tone detection, and that the CNN-BiGRU algorithm outperforms the CNN and CNN-GRU algorithms in this task.
                  
The results for note end point detection were similar to those for start points: VQT combined with CNN-BiGRU showed the best performance. Finally, the impact of different training sets on automatic piano transcription was compared. Fig. 2 presents the automatic transcription results obtained with the two training sets, using VQT as the input and CNN-BiGRU as the algorithm.
                  
According to Fig. 2, the algorithm performed worse in detecting note start and end points than in detecting polyphonic fundamental tones: all indicators of the CNN-BiGRU algorithm exceeded 90% for polyphonic fundamental tone detection, whereas the indicators for note start and end point detection were below 90%. In piano audio, some notes may be played with low intensity, which can cause missed detections and lower the performance of note start point detection.
                  
Comparing the two training sets, the CNN-BiGRU algorithm trained on set ② performed worse than when trained on set ① in note start and end point detection as well as in polyphonic fundamental tone detection. Taking polyphonic fundamental tone detection as an example, compared with training set ①, training set ② lowered the P value by 1.73 percentage points (to 95.43%), the R value by 3.18 percentage points (to 94.16%), and the F1-measure by 2.46 percentage points (to 94.79%). Training set ② contained only synthesized audio and no real piano recordings, so the model was never trained on the characteristics of real recordings, which reduced its performance on the real-piano test set.
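The quoted differences can be checked arithmetically. The snippet below adds the stated decreases back to the training set ② values to obtain the training set ① values they imply (these are derived from the text, not separately reported) and recomputes F1 for both sets.

```python
def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (both in %)."""
    return 2 * p * r / (p + r)

# Polyphonic fundamental tone detection, VQT + CNN-BiGRU (values quoted in the text)
set2 = {"P": 95.43, "R": 94.16, "F1": 94.79}
drop = {"P": 1.73, "R": 3.18, "F1": 2.46}   # decreases relative to training set 1

set1 = {k: round(set2[k] + drop[k], 2) for k in set2}  # implied training set 1 values
print(set1)
print(round(f1_score(set2["P"], set2["R"]), 2), round(f1_score(set1["P"], set1["R"]), 2))
```

The implied training set ① F1-measure of 97.25% equals the value reported for VQT with CNN-BiGRU in the polyphonic fundamental tone comparison, which indicates that the quoted decreases are absolute (percentage-point) differences rather than relative ones.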
                  
                  
                        Fig. 2. Comparison of the automatic transcription results for piano audio.
 
                  
                        Table 3. Influence of Different Input Features on the Detection of Note Start Point.
                     
                           
                              
| Input feature | Algorithm | P value (%) | R value (%) | F1-measure (%) |
|---|---|---|---|---|
| STFT | CNN | 75.12 | 57.64 | 65.23 |
| STFT | CNN-GRU | 76.77 | 59.89 | 67.29 |
| STFT | CNN-BiGRU | 77.79 | 60.12 | 67.82 |
| CQT | CNN | 75.61 | 71.27 | 73.38 |
| CQT | CNN-GRU | 77.49 | 74.15 | 75.78 |
| CQT | CNN-BiGRU | 79.12 | 76.51 | 77.79 |
| VQT | CNN | 76.25 | 83.55 | 79.73 |
| VQT | CNN-GRU | 79.33 | 85.16 | 82.14 |
| VQT | CNN-BiGRU | 81.26 | 87.64 | 84.33 |
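Each F1-measure in Table 3 is the harmonic mean of P and R, F1 = 2PR/(P + R). The sketch below recomputes it from the tabulated P and R values, which are the only data it uses. Every row agrees with the reported F1 to two decimal places.

Printing the arithmetic mean alongside shows why the harmonic mean is used. For STFT with CNN, the low recall (57.64%) pulls F1 down to 65.23%, below the arithmetic mean of 66.38%.

```python
from typing import List, Tuple

# Rows of Table 3 (note start point detection): (input feature, algorithm, P %, R %, F1 %)
TABLE3: List[Tuple[str, str, float, float, float]] = [
    ("STFT", "CNN", 75.12, 57.64, 65.23),
    ("STFT", "CNN-GRU", 76.77, 59.89, 67.29),
    ("STFT", "CNN-BiGRU", 77.79, 60.12, 67.82),
    ("CQT", "CNN", 75.61, 71.27, 73.38),
    ("CQT", "CNN-GRU", 77.49, 74.15, 75.78),
    ("CQT", "CNN-BiGRU", 79.12, 76.51, 77.79),
    ("VQT", "CNN", 76.25, 83.55, 79.73),
    ("VQT", "CNN-GRU", 79.33, 85.16, 82.14),
    ("VQT", "CNN-BiGRU", 81.26, 87.64, 84.33),
]


def f1_measure(p: float, r: float) -> float:
    """F1-measure as the harmonic mean of precision and recall (both in %)."""
    return 2 * p * r / (p + r)


for feature, algo, p, r, f1_reported in TABLE3:
    f1_calc = f1_measure(p, r)
    arith_mean = (p + r) / 2  # shown for comparison; always >= the harmonic mean
    print(f"{feature:<5}{algo:<10} F1 reported={f1_reported:.2f} "
          f"computed={f1_calc:.2f} arithmetic mean={arith_mean:.2f}")
```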
                           
                        
                     
                   
                  
                        Table 4. Impact of Different Features as Input on the Effectiveness of Polyphonic Fundamental Tone Detection.
                     
                           
                              
| Input feature | Algorithm | P value (%) | R value (%) | F1-measure (%) |
|---|---|---|---|---|
| STFT | CNN | 81.67 | 80.12 | 80.89 |
| STFT | CNN-GRU | 85.32 | 82.07 | 83.66 |
| STFT | CNN-BiGRU | 89.86 | 88.64 | 89.25 |
| CQT | CNN | 88.77 | 87.64 | 88.20 |
| CQT | CNN-GRU | 91.67 | 92.13 | 91.90 |
| CQT | CNN-BiGRU | 95.64 | 96.12 | 95.88 |
| VQT | CNN | 90.07 | 91.22 | 90.64 |
| VQT | CNN-GRU | 93.21 | 91.36 | 92.28 |
| VQT | CNN-BiGRU | 97.16 | 97.34 | 97.25 |
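Table 4 reports P, R and F1 for polyphonic fundamental tone (multi-pitch) detection. This section does not restate the exact evaluation protocol. A common choice, assumed here, is frame-level evaluation, in which every active pitch in every frame counts as one detection.

The sketch below implements that assumption on hypothetical frames, each represented as a set of MIDI note numbers. `multipitch_prf` is an illustrative helper, not part of the authors' system.

```python
from typing import List, Set, Tuple


def multipitch_prf(ref_frames: List[Set[int]], est_frames: List[Set[int]]) -> Tuple[float, float, float]:
    """Frame-level precision, recall and F1 for polyphonic pitch detection.

    Each frame is a set of active MIDI pitch numbers. A true positive is a
    pitch that is active in the same frame in both reference and estimate.
    Counts are summed over all frames before computing the ratios.
    """
    if len(ref_frames) != len(est_frames):
        raise ValueError("reference and estimate must have the same number of frames")
    tp = fp = fn = 0
    for ref, est in zip(ref_frames, est_frames):
        tp += len(ref & est)   # pitches correctly detected
        fp += len(est - ref)   # spurious pitches
        fn += len(ref - est)   # missed pitches
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Hypothetical three-frame excerpt (MIDI: 60=C4, 62=D4, 64=E4, 65=F4, 67=G4, 74=D5)
    reference = [{60, 64, 67}, {60, 64, 67}, {62, 65}]
    # E4 missed in frame 2; spurious D5 (one octave above D4) in frame 3
    estimated = [{60, 64, 67}, {60, 67}, {62, 65, 74}]
    p, r, f1 = multipitch_prf(reference, estimated)
    print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.875 R=0.875 F1=0.875
```

Summing counts over all frames before dividing (micro-averaging) weights dense chords more heavily than single notes. Averaging per-frame scores would give a different figure, so the averaging choice matters when comparing such results.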