
  1. Jingdezhen University, Jingdezhen 333000, China (26019@Jdzu.edu.cn)



Keywords: Big data mining, Machine learning, Piano music, Hidden Markov model

1. Introduction

Since the birth of the piano, piano music has developed through the Baroque period, the Viennese classical period, the Romantic period, and the modern period since the 20th century [1]. With the progress of information and multimedia technology, music digitization has been widely adopted in various media, such as radio broadcasting and digital storage. Efficiently retrieving and managing the music that users are interested in within a large music collection has become a focus of research and development in recent years [2]. People who want to obtain music resources from the Internet generally need to search for music first. The traditional music search method is mainly text-based: the system queries and retrieves songs according to the keywords entered by the user using an appropriate text-search technique [3].

With the progress of information technology, content-based music retrieval has gradually entered people's lives. This retrieval method mainly targets the rhythm, style, emotion, and other characteristics of the music [4]. A music style reflects regional and cultural differences, which manifest in musical elements such as rhythm and melody [5]. To realize the accurate classification of piano music styles, this study used a deep belief network (DBN) and an improved hidden Markov model (HMM). A model for the classification of piano music styles was constructed based on big data mining and machine learning. The model has positive significance for the development of piano music.

A fusion feature that conforms to the nature of music was extracted: feature vectors that fully represent timbre, rhythm, and emotion were extracted from the music signals. The designed network model was trained on these feature vectors, mapping them to a new feature space and calculating the partition probability of each type of music. Finally, test music was assigned to the category with the highest probability, and the testing accuracy was calculated. The mel-frequency coefficient (MPC) was used for feature extraction and then fused with music emotion features as the input feature vector of the classification system so as to fully represent the music information. A new network classification model was then obtained by combining the HMM with a DBN, which provides higher speed and classification accuracy. Finally, the model was trained with the feature data to obtain the final music-style classification model.

2. Related Works

Music classification is an important subject and a key research direction in music signal retrieval. Combining one-dimensional convolution and a bidirectional recurrent neural network (RNN), Zhang designed a new method based on music feature extraction and a deep convolutional neural network (DCNN) to solve the low precision of traditional methods [6]. Li combined a backpropagation neural network (BPNN) with the global optimization of the particle swarm optimization algorithm to construct a rating-prediction model over user and item feature attributes and generated personalized portraits based on a user's music preferences [7].

Solanki et al. applied a DCNN to fixed-length music with labeled main instruments and performed recognition and classification based on features such as instrument timbre [8]. In order to achieve an accurate assessment of musical difficulty, Ghatas et al. converted MIDI music files for piano into a piano-roll representation; multiple convolutional neural network models were then trained on parts with corresponding difficulty labels, and the models were applied to music classification [9]. Ziemer et al. used the third-octave-band volume unit (VU), root mean square (RMS) and crest factor meters, phase range, and channel correlation coefficient of the music as feature vectors and then classified electronic dance music to obtain a list of recommended music for users [10].

Big data mining is one of the application modes of big data. By analyzing source data and databases, information is screened for use in applications. Chamikara et al. developed an effective and scalable irreversible perturbation algorithm called PABIDOT to protect potentially sensitive data on the Internet from being disclosed. By changing the optimal set, this method can better protect the privacy of big data [11]. Jayasri et al. used an innovative hierarchical decision-making attention network, association rules, and a mixture of multi-class outlier classification under the MapReduce framework to assess the medical data of diabetic patients and constructed a new medical data evaluation model for diabetes [12]. Liu et al. analyzed tourist-generated big data using a mixed approach combining generalized additive models and segmented regression models. Climate change's influence on hiking in 100 cities in China was quantitatively measured through the obtained data, and suggestions were given for the development of urban tourism, providing new development ideas for the tourism industry [13].

Ageed et al. discussed how big data can be adopted for a more innovative society, in view of the current situation where different technologies are used in smart cities to make residents' lives more convenient. They mainly compared and contrasted different smart-city and big-data concepts [14]. Ji used a clustering algorithm to find the center point of malicious intrusion data and then used the Markov distance to determine the similarity of malicious intrusion data. A big data mining algorithm was proposed for malicious intrusions in wireless communication technology based on legitimate big data [15].

Machine learning has been broadly used in different fields in recent years. Keith et al. showed how computational chemistry and machine learning can be applied together in molecular and material modeling, reverse transcription synthesis, catalysis, and drug design to provide new insights for researchers [16]. Nabipour et al. converted indicators into binary data and reduced the risk of trend prediction in the stock market through machine learning and deep learning algorithms [17].

To solve the problem of experiments and calculations consuming much time and resources in the design of traditional crystal growth systems, Yu et al. combined machine learning and genetic algorithms to accelerate the geometric optimization process [18]. To help medical experts use biomedical images of patients captured by intelligent systems, a classification workflow relying on optimal algorithms, support vector machines, and deep learning was proposed, along with classification methods suitable for medical imaging [19]. To help practitioners determine the best ensemble technique according to their own situation, González et al. modified the bagging and boosting algorithms and reviewed software tools available online according to the version and function of the implementation [20]. To establish the connection between the data envelopment analysis (DEA) method and machine learning algorithms, Zhu et al. proposed an alternative method combining DEA with machine learning algorithms to predict the DEA efficiency of decision-making units [21].

3. Construction of Piano Music Style Classification Model based on Improved DBN-HMM

3.1 Music Signal Preprocessing and Feature Parameter Extraction

A music signal is short-term stable: its various performance indicators remain approximately unchanged over short intervals. Therefore, when studying the overall characteristics of the signal, it is necessary to focus on the characteristics of each segment, so the music signal first needs to be divided into frames. In order to make a smooth transition between two frames, adjacent frames should overlap by 1/3 to 1/2 of the frame length during the framing process. The framing process is shown in Fig. 1.

The number of frames that a music signal can be divided into is shown in Eq. (1).

(1)
$ N_{frames}=\left\lfloor \frac{N_{x}-N_{0}}{N_{f}-N_{0}}\right\rfloor $

In Eq. (1), $N_{x}$ is the total length of the music signal, $N_{0}$ is the overlapping length between frames, and $N_{f}$ is the length of one frame. The high-frequency part of the music signal, which has low energy, is enhanced by filtering. Eq. (2) shows the implementation method.

(2)
$ y\left(n\right)=x\left(n\right)-\mu x\left(n-1\right) $

In Eq. (2), $y\left(n\right)$ is the output signal after signal enhancement processing, $x\left(n\right)$ is the input signal, and $\mu $ is an enhancement factor with a value close to 1.

After framing, the signal was windowed to increase the continuity between frames, reduce the edge effect, and reduce spectrum leakage. The process is shown in Eq. (3).

(3)
$ s_{w}\left(n\right)=y\left(n\right)\times w\left(n\right),\quad 0\leq n\leq N_{f}-1 $

In Eq. (3), $s_{w}\left(n\right)$ is the signal after windowing, and $w\left(n\right)$ is the window function. This research used the Hamming window as the windowing function, as shown in Eq. (4).

(4)
$ w\left(n\right)=0.54-0.46\cos \left(\frac{2\pi n}{N_{f}-1}\right),\quad 0\leq n\leq N_{f}-1 $
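As a rough illustration of Eqs. (1)-(4), the following Python sketch frames a signal with overlap, applies the enhancement filter, and windows each frame. The frame length, overlap, and enhancement factor are illustrative assumptions; the paper only requires an overlap of 1/3 to 1/2 of the frame length and $\mu$ close to 1.

```python
import numpy as np

def preprocess(x, frame_len=1024, overlap=512, mu=0.97):
    """Pre-emphasize, frame, and window a music signal.

    frame_len (N_f), overlap (N_0), and mu are assumed values.
    """
    # Eq. (2): signal enhancement, y(n) = x(n) - mu * x(n-1)
    y = np.append(x[0], x[1:] - mu * x[:-1])

    # Eq. (1): number of frames, floor((N_x - N_0) / (N_f - N_0))
    step = frame_len - overlap
    n_frames = (len(y) - overlap) // step

    # Eq. (4): Hamming window of length N_f
    n = np.arange(frame_len)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1))

    # Eq. (3): window each frame, s_w(n) = y(n) * w(n)
    frames = np.stack([y[i * step : i * step + frame_len] * w
                       for i in range(n_frames)])
    return frames

# Example: 3 seconds of a synthetic signal at an assumed 22,050 Hz rate
frames = preprocess(np.random.randn(3 * 22050))
print(frames.shape)  # (n_frames, 1024)
```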

The selection of music features determines the performance of the recognition system to some extent; good features can improve the accuracy and speed of music signal classification. People's perception of music sound quality is mainly measured by pitch, timbre, and rhythm, all of which can be abstracted into feature vectors. Most abstract features of timbre are short-term features, while the abstract features of pitch and rhythm are mostly long-term features. From the perspective of the transform domain, short-term features are divided into time-domain (TD) features, frequency-domain (FD) features, and cepstrum-domain features.

The frequency-domain features (FDF) of music are characteristic parameters obtained by first applying the Fourier transform to the music signal and then processing the resulting spectrum. Common FD features include the spectral centroid, spectral energy, spectral bandwidth, spectral sub-band energy, spectral flux, and spectral sub-band flux. The spectral centroid measures the center of the spectrum; the larger the value, the more high-frequency components the signal contains. It is calculated with Eq. (5).

(5)
$ SC=\frac{{\sum }_{\omega =l_{0}}^{h_{0}}\omega \left| F\left(\omega \right)\right| ^{2}}{{\sum }_{\omega =l_{0}}^{h_{0}}\left| F\left(\omega \right)\right| ^{2}} $

In Eq. (5), $F\left(\omega \right)$ is the Fourier transform of a frame signal, and $l_{0}$ and $h_{0}$ represent the minimum and maximum frequencies of the sub-band, respectively. The spectral energy is found with Eq. (6).

(6)
$ SE=\sqrt{\frac{1}{h_{0}-l_{0}}{\sum }_{\omega =l_{0}}^{h_{0}}\left| F\left(\omega \right)\right| ^{2}} $

The spectral bandwidth is the energy-weighted distance between the spectrum and the spectral centroid and mainly measures the FD range of the music, as shown in Eq. (7).

(7)
$ SB=\frac{{\sum }_{\omega =l_{0}}^{h_{0}}\left(\omega -SC\right)^{2}\left| F\left(\omega \right)\right| ^{2}}{{\sum }_{\omega =l_{0}}^{h_{0}}\left| F\left(\omega \right)\right| ^{2}} $

The spectral flux reflects the dynamic characteristics of the spectrum. It accumulates the differences between corresponding points in the FD of two adjacent frames and thus reflects the overall change of the spectrum. The calculation is done with Eq. (8).

(8)
$ SF=\frac{1}{h_{0}-l_{0}}{\sum }_{\omega =l_{0}}^{h_{0}}\left| F_{t+1}\left(\omega \right)-F_{t}\left(\omega \right)\right| $
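A minimal sketch of the FD features in Eqs. (5)-(8), computed from the magnitude spectra of two adjacent frames; the sub-band limits $l_{0}$ and $h_{0}$ here are illustrative assumptions.

```python
import numpy as np

def spectral_features(frame, next_frame, l0=1, h0=512):
    """Spectral centroid, energy, bandwidth, and flux of one frame.

    l0/h0 delimit the analyzed sub-band (assumed values);
    F is the magnitude of the Fourier transform of the frame.
    """
    F = np.abs(np.fft.rfft(frame))[l0:h0]
    F_next = np.abs(np.fft.rfft(next_frame))[l0:h0]
    w = np.arange(l0, h0)                       # frequency bins omega
    power = F ** 2

    sc = np.sum(w * power) / np.sum(power)      # Eq. (5): spectral centroid
    se = np.sqrt(np.mean(power))                # Eq. (6): spectral energy
    sb = np.sum((w - sc) ** 2 * power) / np.sum(power)  # Eq. (7): bandwidth
    sf = np.mean(np.abs(F_next - F))            # Eq. (8): spectral flux
    return sc, se, sb, sf
```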

The mel-frequency cepstral coefficients (MFCCs) are cepstrum parameters extracted in the mel-scale FD. Eq. (9) shows the relationship between the mel frequency and the linear frequency.

(9)
$f_{mel}=1127\times \ln \left(1+\frac{f}{700}\right)$

In Eq. (9), $f$ is the linear frequency. After framing, signal enhancement, and windowing, the mel-frequency coefficients (MPC) of the music signal were extracted. Unlike an MFCC, an MPC does not require a discrete cosine transform; the result is output directly after the logarithm is taken. The extraction of the MPC is shown in Fig. 2.
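The MPC described above is, in effect, a log mel filterbank energy (an MFCC pipeline that stops before the discrete cosine transform). A hedged sketch follows; the sample rate and filter count are assumptions, not values given in the paper.

```python
import numpy as np

def mel(f):
    """Eq. (9): f_mel = 1127 * ln(1 + f / 700)."""
    return 1127.0 * np.log(1.0 + f / 700.0)

def mel_inv(m):
    """Inverse of Eq. (9), mapping mel back to linear frequency."""
    return 700.0 * (np.exp(m / 1127.0) - 1.0)

def mpc(frame, sr=22050, n_filters=26):
    """Log mel filterbank energies (MPC): like the MFCC, but no DCT.

    sr and n_filters are illustrative assumptions.
    """
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    n_bins = len(spectrum)

    # Filter edges spaced evenly on the mel scale, mapped to FFT bins
    mel_points = np.linspace(0.0, mel(sr / 2), n_filters + 2)
    bins = np.floor((n_bins - 1) * mel_inv(mel_points) / (sr / 2)).astype(int)

    feats = np.empty(n_filters)
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        filt = np.zeros(n_bins)                      # triangular mel filter
        filt[left:center + 1] = np.linspace(0.0, 1.0, center - left + 1)
        filt[center:right + 1] = np.linspace(1.0, 0.0, right - center + 1)
        feats[i] = np.log(spectrum @ filt + 1e-10)   # log energy, no DCT
    return feats
```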

The frequency spectrum was decomposed into several sub-bands using a mel bandpass filter to obtain the sub-band energies, and the natural logarithm of each energy was then calculated to obtain the MPC parameters. For better unsupervised learning, a restricted Boltzmann machine (RBM) was introduced on top of the MPC parameters, and the maximum-likelihood function was used to optimize the selected feature parameters. The MPC feature vector forms the input layer of the RBM, and the network parameters are continuously updated during training. The update method is shown in Eq. (10).

(10)
$\begin{align*} \begin{cases} \Delta w_{ij}=\Delta w_{ij}+\left[p\left(h_{i}=1\left| s^{\left(0\right)}\right.\right){x_{j}}^{\left(0\right)}-p\left(h_{i}=1\left| s^{\left(k\right)}\right.\right){x_{j}}^{\left(k\right)}\right]\\ \Delta a_{j}=\Delta a_{j}+\left[{x_{j}}^{\left(0\right)}-{x_{j}}^{\left(k\right)}\right]\\ \Delta b_{i}=\Delta b_{i}+p\left(h_{i}=1\left| s^{\left(0\right)}\right.\right)-p\left(h_{i}=1\left| s^{\left(k\right)}\right.\right) \end{cases} \end{align*}$

In Eq. (10), $\Delta w$ is the weight matrix between the visible layer and the hidden layer, $\Delta a$ and $\Delta b$ are the bias vectors of the visible and hidden layers, respectively, and $p\left(h\left| s\right.\right)$ is the hidden-layer probability distribution when the visible units are set to a specific training sample $s$.
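A compact sketch of the contrastive-divergence (CD-1) update corresponding to Eq. (10) for a binary RBM, assuming sigmoid units; the learning rate is an assumption, since the paper does not give hyperparameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, a, b, x0, lr=0.01, rng=np.random.default_rng()):
    """One CD-1 step of Eq. (10) for a binary RBM.

    W: visible-hidden weights, a: visible bias, b: hidden bias,
    x0: one training sample s^(0). lr is an assumed learning rate.
    """
    # Positive phase: p(h_i = 1 | s^(0))
    p_h0 = sigmoid(x0 @ W + b)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # One Gibbs step: reconstruct the visible units, then re-infer h
    p_xk = sigmoid(h0 @ W.T + a)
    xk = (rng.random(p_xk.shape) < p_xk).astype(float)
    p_hk = sigmoid(xk @ W + b)

    # Eq. (10): parameter increments
    W += lr * (np.outer(x0, p_h0) - np.outer(xk, p_hk))
    a += lr * (x0 - xk)
    b += lr * (p_h0 - p_hk)
    return W, a, b
```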

Emotion is an attribute of music, and emotional characteristics are important features of music signals. Emotion-related vectors can be divided into time-domain and frequency-domain feature vectors. The TD and FD feature vectors include the mean, variance, intensity, maximum, center of gravity, bandwidth, roll-off, and flux. To judge the impact of features on music emotion, this study adopted the sequential floating forward selection method to obtain the features that affect the emotion model. Due to the subjectivity and singularity of emotional features, the classification effect is not ideal when they are used alone in classifiers. For this reason, mixed feature vectors were used in this study. The feature vector of frame $t$ can be expressed as Eq. (11).

(11)
$\begin{align*} \begin{cases} {X_{t}}^{MPC}=\left[MPC_{t}\left(0\right),MPC_{t}\left(1\right),\ldots ,MPC_{t}\left(M-1\right)\right]\\ {X_{t}}^{MOOD}=\left[TA,TV,TC,TB,TR,TF,FA,FV,FC,FB,FR,FF\right] \end{cases} \end{align*}$

In Eq. (11), $M$ is the order of the MPC feature vector.

The emotion feature vector comprises the 12 TD and FD features. The two types of features are fused to obtain the final combined music-signal feature; the fusion process is shown in Fig. 3. Through these operations, the input music signal is preprocessed by framing, signal enhancement, and windowing, and the combination of MPC features and emotional features is extracted, which provides the basis for constructing the classification model.
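The fusion of Eq. (11) then amounts to concatenating the two per-frame vectors; a minimal sketch, with an assumed MPC order:

```python
import numpy as np

def fuse(mpc_t, mood_t):
    """Eq. (11): concatenate the M-order MPC vector of frame t with the
    12 TD/FD emotion features (TA, TV, ..., FF)."""
    assert mood_t.shape == (12,)             # 12 emotion features
    return np.concatenate([mpc_t, mood_t])   # combined feature of frame t

# Example with an assumed MPC order of M = 26
x_t = fuse(np.zeros(26), np.zeros(12))
print(x_t.shape)  # (38,)
```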

Fig. 1. Diagram of overlapping frame division processing.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig1.png
Fig. 2. MPC feature vector extraction process.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig2.png
Fig. 3. MPC features and emotional features are fused into a combined feature flow.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig3.png

3.2 Music Classification based on DBN-HMM

Music classification plays a very important role in music signal retrieval. Users who like listening to piano music are not interested in all styles of music. To allow retrieval according to their interests and hobbies, pieces were classified according to the fused features of piano music, which is convenient for efficient management and fast retrieval. An HMM contains two stochastic processes, the observed state and the hidden state, and uses parameters to represent the statistical properties of the stochastic process. The HMM essentially reflects the properties of sound and is widely used in speech signal processing. It can be expressed as the quintuple in Eq. (12).

(12)
$ \left(\Omega _{x},\Omega _{O},A,B,\pi \right) $

In Eq. (12), $\Omega _{x}$ is the state set, $\Omega _{O}$ is the observation value set, $A$ is the transition probability matrix, indicating the probability of transitioning from state $q_{i}$ at the current moment to state $q_{j}$ at the next moment, $B$ is the output probability matrix, indicating the probability of observing $o_{M}$ when the state is $q_{i}$, and $\pi $ is the initial state distribution.
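To make the quintuple of Eq. (12) concrete, the sketch below stores $\left(\Omega _{x},\Omega _{O},A,B,\pi \right)$ and evaluates an observation sequence with the standard forward algorithm; this is a textbook routine under assumed toy parameters, not the authors' exact implementation.

```python
import numpy as np

class HMM:
    """Discrete HMM (Omega_x, Omega_O, A, B, pi) as in Eq. (12)."""
    def __init__(self, A, B, pi):
        self.A = np.asarray(A)    # A[i, j]: transition P(q_i -> q_j)
        self.B = np.asarray(B)    # B[i, o]: output P(o | q_i)
        self.pi = np.asarray(pi)  # initial state distribution

    def likelihood(self, obs):
        """Forward algorithm: P(observation sequence | model)."""
        alpha = self.pi * self.B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ self.A) * self.B[:, o]
        return alpha.sum()

# Toy example: two hidden states, two observation symbols
hmm = HMM(A=[[0.7, 0.3], [0.4, 0.6]],
          B=[[0.9, 0.1], [0.2, 0.8]],
          pi=[0.5, 0.5])
print(hmm.likelihood([0, 1, 0]))
```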

Fig. 4. Diagram of DBN network structure.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig4.png

The model obtains the relevant parameters of the HMM by calculating the probability of the observation sequence and the probability of each state, thus classifying the sample data. However, the HMM depends on the state labels of the training data, and in actual training much of the raw data has missing labels, which affects the recognition effect. To avoid this defect, the HMM and a DBN were combined to classify music signals. Fig. 4 shows the structure of the DBN.

The network structure of the DBN is a highly complex directed acyclic graph composed of stacked RBMs, forming a hierarchical unsupervised learning model. A DBN can effectively utilize data with missing labels, its deep structure enhances the ability to model signal features, and it can provide more accurate observation probabilities. The joint probability distribution represents the relationship between the visible layer and the hidden layers. Eq. (13) displays the calculation method.

(13)
$ P\left(v,h^{1},h^{2},\ldots ,h^{l}\right)=P\left(v| h^{1}\right)P\left(h^{1}| h^{2}\right)\cdots P\left(h^{l-2}| h^{l-1}\right)P\left(h^{l-1}| h^{l}\right) $

In Eq. (13), $l$ is the number of hidden layers of the DBN. The RBMs provide good initial parameter values through layer-by-layer training, and the network is then refined with traditional learning algorithms. The batch gradient descent method was used for network tuning, and the overall loss function over the samples is shown in Eq. (14).

(14)
$ J\left(W,b\right)=\frac{1}{n}\sum _{i=1}^{n}J\left(W,b;x^{\left(i\right)},y^{\left(i\right)}\right)+\frac{\lambda }{2}\sum _{l=1}^{m_{l}-1}\sum _{i=1}^{s_{l}}\sum _{j=1}^{s_{l+1}}\left({W_{ij}}^{\left(l\right)}\right)^{2}=\frac{1}{n}\sum _{i=1}^{n}\frac{1}{2}\left\| h_{W,b}\left(x^{\left(i\right)}\right)-x^{\left(i\right)}\right\| ^{2}+\frac{\lambda }{2}\sum _{l=1}^{m_{l}-1}\sum _{i=1}^{s_{l}}\sum _{j=1}^{s_{l+1}}\left({W_{ij}}^{\left(l\right)}\right)^{2} $

In Eq. (14), ${W_{ij}}^{\left(l\right)}$ is the connection weight coefficient, $l$ is the number of hidden layers, $i$ and $j$ represent the node indices in the current and subsequent hidden layers, respectively, ${b_{i}}^{\left(l\right)}$ is the bias of the node, and $h_{W,b}\left(x^{\left(i\right)}\right)$ is the result after reconstruction. The partial derivatives with respect to the weight coefficients and biases are shown in Eq. (15).

(15)
$\begin{align} \begin{cases} \frac{\partial J\left(W,b\right)}{\partial {W_{ij}}^{\left(l\right)}}=\left[\frac{1}{n}{\sum }_{i=1}^{n}\frac{\partial J\left(W,b;x^{\left(i\right)},y^{\left(i\right)}\right)}{\partial {W_{ij}}^{\left(l\right)}}\right]+\lambda {W_{ij}}^{\left(l\right)}\\ \frac{\partial J\left(W,b\right)}{\partial {b_{i}}^{\left(l\right)}}=\frac{1}{n}{\sum }_{i=1}^{n}\frac{\partial J\left(W,b;x^{\left(i\right)},y^{\left(i\right)}\right)}{\partial {b_{i}}^{\left(l\right)}} \end{cases} \end{align} $
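A minimal sketch of the tuning step implied by Eqs. (14) and (15) for one layer: the averaged per-sample gradients are combined with the $\lambda W$ weight-decay term before the descent step. The learning rate and $\lambda$ are assumed values.

```python
import numpy as np

def batch_gd_step(W, b, grads_W, grads_b, lr=0.1, lam=1e-4):
    """One batch-gradient-descent step following Eq. (15).

    grads_W / grads_b: per-sample gradients of J(W, b; x_i, y_i),
    stacked along axis 0. lr and lam (lambda) are assumed values.
    """
    n = grads_W.shape[0]
    # Eq. (15): average the per-sample gradients; the weights also
    # receive the lambda * W term from Eq. (14), the biases do not.
    dW = grads_W.sum(axis=0) / n + lam * W
    db = grads_b.sum(axis=0) / n
    return W - lr * dW, b - lr * db
```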
Fig. 5. Basic structure of DBN+HMM model.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig5.png

The DBN-HMM model was trained and classified by estimating the posterior probabilities of the HMM states; its basic structure is shown in Fig. 5. To improve the loss function, the gradient descent method was used to minimize the reconstruction mean-square error. The objective function is the cross-entropy between the reference state label and the predicted state distribution, as in Eq. (16).

(16)
$ F_{CE}=-{\sum }_{u=1}^{U}{\sum }_{n=1}^{N}\log y\left(s\right) $

In Eq. (16), $s$ is the current state, and $y\left(s\right)$ is the predicted state distribution. The outputs of the DBN output-layer nodes are the input of the HMM, and the posterior probability of each HMM state is calculated using the softmax regression model. The output state distribution is expressed as Eq. (17).

(17)
$ y_{n}\left(s\right)=P\left(s| O_{n}\right)=\frac{\exp \left(a_{n}\left(s\right)\right)}{\sum _{s'}\exp \left(a_{n}\left(s'\right)\right)} $

In Eq. (17), $O_{n}$ is the observation at frame $n$, and $P\left(s\right)$ is the prior probability of state $s$ in the training data. The gradient of the objective function with respect to the activation probability is shown in Eq. (18).

(18)
$ \frac{\partial F_{CE}}{\partial a_{n}\left(s\right)}=-\frac{\partial \log y_{n}\left(s\right)}{\partial a_{n}\left(s\right)}=y_{n}\left(s\right)-\delta _{s;{s_{n}}} $

In Eq. (18), $F_{CE}$ is the objective function, $a_{n}\left(s\right)$ is the activation probability, and $\delta _{s;{s_{n}}}$ is the Kronecker delta function, which satisfies Eq. (19).

(19)
$ \delta _{s;{s_{n}}}=\begin{cases} 1, & s=s_{n}\\ 0, & s\neq s_{n} \end{cases} $
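The output-layer computation of Eqs. (17)-(19) is a standard softmax with a cross-entropy gradient; a sketch under assumed toy activations:

```python
import numpy as np

def softmax_posterior(a_n):
    """Eq. (17): y_n(s) = exp(a_n(s)) / sum_s' exp(a_n(s'))."""
    e = np.exp(a_n - a_n.max())   # shift for numerical stability
    return e / e.sum()

def ce_gradient(a_n, s_n):
    """Eq. (18): dF_CE/da_n(s) = y_n(s) - delta(s, s_n)."""
    y = softmax_posterior(a_n)
    delta = np.zeros_like(y)
    delta[s_n] = 1.0              # Kronecker delta of Eq. (19)
    return y - delta

print(ce_gradient(np.array([1.0, 2.0, 0.5]), s_n=1))
```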

It was found that the recognition effect of the model still needed further improvement. During HMM training, the randomly selected initial matrix parameters produce large differences in results, trapping the training at a local optimum and affecting the accuracy of classification and recognition. To optimize the parameters, the global-search advantage of a genetic algorithm (GA) was used to optimize the initial matrix parameters of the HMM, addressing the sensitivity of the Baum-Welch training algorithm to randomly selected initial parameters.

However, the traditional GA easily produces ``super individuals'' during evolution, which affect subsequent evolution. For this reason, the mutation operator was improved: a chaos operator carries out the mutation operation to improve the quality of the evolution result. Mutation contributes to the generation of new individuals, and the mapping model obtained through the chaotic mapping operator is shown in Eq. (20).

(20)
$ \begin{cases} x_{n+1}=kx_{n}-g\left(x_{n}\right),\quad 0\leq k\leq 1\\ g\left(x_{n}\right)=2\tanh \left(ax_{n}\right)\exp \left(-3{x_{n}}^{2}\right) \end{cases} $

In Eq. (20), $x$ is the neuron's internal state, $k$ is a damping factor of the nerve membrane, and $g\left(x\right)$ is the nonlinear self-feedback. Based on Gaussian mutation, the Gaussian normal distribution function was replaced with the chaotic mapping function. The improved mutation operator is shown in Eq. (21).

(21)
$ {C_{i}}^{'}=C_{i}+sg\left(x_{i}\right) $

In Eq. (21), $s$ is the mutation scale, and $g$ is the chaotic mapping function cited above. Based on these operations, the GA performs the parameter initialization of the HMM, and the DBN is then used to classify the style of a piano music signal. Fig. 6 displays the classification flow.
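A sketch of the chaotic mutation of Eqs. (20) and (21): each chromosome entry is perturbed by the chaotic self-feedback map rather than Gaussian noise. The constants $k$, $a$, and the mutation scale $s$ are assumed values.

```python
import numpy as np

def chaotic_map(x_n, k=0.9, a=5.0):
    """Eq. (20): x_{n+1} = k*x_n - g(x_n),
    with g(x) = 2*tanh(a*x) * exp(-3*x^2)."""
    g = 2.0 * np.tanh(a * x_n) * np.exp(-3.0 * x_n ** 2)
    return k * x_n - g, g

def chaotic_mutation(C, x, s=0.1):
    """Eq. (21): C' = C + s * g(x), applied element-wise."""
    x_next, g = chaotic_map(x)
    return C + s * g, x_next   # mutated chromosome, updated chaos state

C = np.array([0.2, 0.5, 0.8])           # chromosome (HMM initial parameters)
x = np.random.uniform(-1, 1, C.shape)   # chaotic internal states
C_mut, x = chaotic_mutation(C, x)
```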

Fig. 6. Classification process of piano-music style-classification model based on DBN-GA-HMM.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig6.png

4. Performance Analysis of Music-style Classification Model based on Improved DBN-HMM

To test the rationality of the selected music-signal feature vectors, the one-dimensional MPCs of 10 different music signals were compared, as shown in Fig. 7. The MPC curves are not smooth and contain many mutations, which fully reflect the music attributes. Moreover, the magnitudes and change directions of the MPC features of the 10 types of music signals are completely different. It can be seen that the MPC features can characterize and distinguish the 10 types well.

To further prove the rationality of merging music signal features into combined features in the model, the precision (P), recall (R), and comprehensive evaluation indicator (F0.5) of single feature types and combined feature types were compared. Table 1 displays the comparison results. The P value obtained by the MPC+MOOD combined features was 95.40%, the F0.5 value was 94.45%, and the R value was 92.23%. The P value of MFCC+MOOD was 87.68%, which is 7.72% lower than that of MPC+MOOD; the F0.5 value was 85.35%, which is 9.10% lower than that of MPC+MOOD; and the R value was 82.32%, which is 9.91% lower than that of MPC+MOOD. The precision, recall, and F0.5 values obtained by the other three individual features do not exceed 80%. The combined feature types obtain better classification results in model classification, and the classification effect of the MPC+MOOD features is better than that of the MFCC+MOOD combined features.
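Assuming the standard $F_{\beta }$ definition with $\beta =0.5$ (the paper does not state its exact formula), the comprehensive indicator weights precision more heavily than recall:

$ F_{0.5}=\frac{\left(1+0.5^{2}\right)PR}{0.5^{2}P+R} $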

To verify the improvement effect of the classification method, three types of models before and after optimization (HMM, DBN+HMM, and DBN+GA+HMM) were iteratively trained; the training situation is shown in Fig. 8. The average classification accuracy gradually increases with the number of iterations, and the loss value of the model gradually decreases. The DBN+GA+HMM model takes 208 iterations to reach the target loss value, and the DBN+HMM model takes 350 iterations, which is 142 more than DBN+GA+HMM. The HMM model reaches the target value in 451 iterations, which is 243 more than DBN+GA+HMM. During the iteration process, the classification accuracy curve of the DBN+GA+HMM model was always above the curves of the other two models, and the accuracy reached 80% after 40 iterations. The improved model had better convergence than the other two models, and the quality of the classification results was higher.

Table 1. Comparison of classification effects between fused features and single features.

Feature vector | Dataset 1 P (%) | Dataset 1 F0.5 (%) | Dataset 1 R (%) | Dataset 2 P (%) | Dataset 2 F0.5 (%) | Dataset 2 R (%)
MPC | 75.01 | 71.58 | 69.32 | 74.98 | 71.22 | 69.02
MOOD | 76.48 | 73.58 | 71.36 | 76.36 | 73.17 | 71.28
MFCC | 75.48 | 72.39 | 70.26 | 75.69 | 72.45 | 70.49
MFCC+MOOD | 87.47 | 85.46 | 82.17 | 87.88 | 85.24 | 82.46
MPC+MOOD | 96.47 | 94.56 | 92.32 | 96.78 | 94.33 | 92.14

Fig. 7. Comparison of one-dimensional MPCs of 10 different types of music signals.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig7.png
Fig. 8. Iterative training of model before and after optimization.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig8.png

To further verify the classification performance of the model, the proposed model (model 1) and four popular classification models were compared: a classification model based on an intelligent swarm optimization neural network (model 2), a classification model based on StarGAN (model 3), a classification model based on AdaBoost and ELM (model 4), and a classification model based on a BPNN (model 5). Eight types of music were classified separately and compared with the actual results, as shown in Fig. 9.

The output result curves of each music genre are different. The change curve of model 1 is basically consistent with the actual result curve, and the fitting degree reached 0.96. The curve of model 2 was basically consistent with the actual curve, but there were some discrepancies, and the fitting degree was 0.83, which is 0.13 lower than that of model 1. The fitting degree of model 3 was 0.79, which is 0.17 lower than model 1. The fitting degree of model 4 was 0.74, which is 0.22 lower than model 1. The fitting degree of model 5 was 0.70, which is 0.26 lower than model 1. In Fig. 9, model 1 has the highest classification and recognition accuracy.

To prove the stability of the model, the scale of the sample data was continuously increased during the experiment, and the five models were used to classify the data. The changes in classification accuracy are shown in Fig. 10. As samples were added, the classification accuracy of each model decreased. The curve of model 1 was the most stable: with 50 samples, the classification accuracy was 98.78%, and with 600 samples, it was still 96.89%, a drop of only 1.89%. For model 2, the accuracy with 50 samples was 95.43%; when the number of samples increased to 600, the accuracy fell to 79.32%, a decrease of 16.10%.

The accuracy of model 3 decreased from 91.23% to 76.54%, the accuracy of model 4 decreased from 96.47% to 71.59%, and the accuracy of model 5 dropped from 96.81% to 65.75%. A comprehensive analysis of Fig. 10 shows that model 1 had higher stability: its classification accuracy was the least affected by the increase in the number of samples.

PRD was introduced as an evaluation index of the models' classification prediction. In the experiment, the five models were used to classify the music signals of the two datasets; the relevant indicators are shown in Table 2. The average PRD value of model 1 was 2.402, the running time was 2.117 s, and the accuracy was 97.074%. The average PRD value of model 2 was 1.696, which is 0.706 lower than model 1; the running time was 3.457 s, which is 1.340 s longer than model 1; and the accuracy was 90.535%, which is 6.539% lower than model 1.

The average PRD value of model 3 was 1.613, which is 0.789 lower than model 1; the running time was 3.843 s, which is 1.725 s longer than model 1; and the accuracy was 85.360%, which is 11.714% lower than model 1. The average PRD value of model 4 was 1.407, which is 0.995 lower than model 1; the running time was 4.030 s, which is 1.913 s longer than model 1; and the accuracy was 84.173%, which is 12.901% lower than model 1. The average PRD value of model 5 was 1.383, which is 1.019 lower than model 1; the running time was 4.331 s, which is 2.214 s longer than model 1; and the accuracy was 80.219%, which is 16.855% lower than model 1.

Fig. 9. Classification result curve of each model for four music types.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig9.png
Fig. 10. Change curve of model accuracy with the increase of sample number.
../../Resources/ieie/IEIESPC.2024.13.2.129/fig10.png
Table 2. Comparison of evaluation index data of classification model.

Project | Dataset 1 Run time (s) | Dataset 1 Accuracy (%) | Dataset 1 PRD | Dataset 2 Run time (s) | Dataset 2 Accuracy (%) | Dataset 2 PRD
Model 1 | 2.269 | 97.323 | 2.378 | 1.965 | 96.824 | 2.426
Model 2 | 3.468 | 90.487 | 1.697 | 3.445 | 90.582 | 1.695
Model 3 | 3.846 | 85.484 | 1.623 | 3.839 | 85.236 | 1.602
Model 4 | 4.021 | 84.222 | 1.401 | 4.039 | 84.123 | 1.412
Model 5 | 4.365 | 80.176 | 1.398 | 4.298 | 80.261 | 1.367

A comprehensive analysis of the contents of Table 2 shows that model 1 has the shortest running time, the highest PRD value, and the highest classification accuracy; it can efficiently and accurately classify piano music styles.

5. Conclusion

A music style reflects regional and cultural differences, which manifest in musical elements such as rhythm and melody. To realize the accurate classification of piano music styles, this study used a DBN and an HMM. During HMM training, the randomly selected initial matrix parameters produced large differences in results, trapping training at a local optimum and affecting the accuracy of model classification and recognition. To optimize the parameters, a GA was used to optimize the classification model.

The experimental analysis showed that the P value obtained with the MPC+MOOD combined features was 95.40%, the F0.5 value was 94.45%, and the R value was 92.23%; the combined feature type obtains better classification results. The change curve of model 1 basically coincided with the actual result curve, and the fitting degree reached 0.96. The average PRD value of model 1 was 2.402, the running time was 2.117 s, and the accuracy was 97.074%.

In future research, more features that characterize music attributes can be studied. The HPSS algorithm could be used to separate the original music signal spectrum into time-feature harmonic components and frequency-feature impulse components, which could be combined with the original spectrum as input to the model. The MFCCs of different types of music signals could also be used as feature values to further improve model performance. Research data could also be selected from a wider range of music genres to expand the application scope of the model.

REFERENCES

1 
Shi N, Wang Y. Symmetry in computer-aided music composition system with social network analysis and artificial neural network methods. Journal of Ambient Intelligence and Humanized Computing, 2020: 1-16.DOI
2 
Cífka O, Şimşekli U, Richard G. Groove2Groove: one-shot music style transfer with supervision from synthetic data. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 2638-2650.DOI
3 
Kempfert K C, Wong S W K. Where does Haydn end and Mozart begin? Composer classification of string quartets. Journal of New Music Research, 2020, 49(5): 457-476.DOI
4 
Ramírez J, Flores M J. Machine learning for music genre: multifaceted review and experimentation with audioset. Journal of Intelligent Information Systems, 2020, 55(3): 469-499.DOI
5 
Scott C D. Policing Black sound: performing UK Grime and Rap music under routinised surveillance. Soundings, 2020, 75(75): 55-65.DOI
6 
Zhang K. Music style classification algorithm based on music feature extraction and deep neural network. Wireless Communications and Mobile Computing, 2021, 2021: 1-7.DOI
7 
Li T. Visual classification of music style transfer based on PSO-BP rating prediction model. Complexity, 2021, 2021: 1-9.DOI
8 
Solanki A, Pandey S. Music instrument recognition using deep convolutional neural networks. International Journal of Information Technology, 2022, 14(3): 1659-1668.DOI
9 
Ghatas Y, Fayek M, Hadhoud M. A hybrid deep learning approach for musical difficulty estimation of piano symbolic music. Alexandria Engineering Journal, 2022, 61(12): 10183-10196.DOI
10 
Ziemer T, Kiattipadungkul P, Karuchit T. Music recommendation based on acoustic features from the recording studio. The Journal of the Acoustical Society of America, 2020, 148(4): 2701-2701.DOI
11 
Chamikara M A P, Bertok P, Liu D, Camtepe S, Khalil I. Efficient privacy preservation of big data for accurate data mining. Information Sciences, 2020, 527: 420-443.DOI
12 
Jayasri N P, Aruna R. Big data analytics in health care by data mining and classification techniques. ICT Express, 2022, 8(2): 250-257.DOI
13 
Liu J, Yang L, Zhou H, Wang S. Impact of climate change on hiking: quantitative evidence through big data mining. Current issues in tourism, 2021, 24(21): 3040-3056.DOI
14 
Ageed Z S, Zeebaree S R M, Sadeeq M M. A survey of data mining implementation in smart city applications. Qubahan Academic Journal, 2021, 1(2): 91-99.DOI
15 
Ji K. Malicious Intrusion Data Mining Algorithm of Wireless Personal Communication Network Supported by Legal Big Data. Wireless Communications and Mobile Computing, 2021, 2021: 1-7.DOI
16 
Keith J A, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller K R, Tkatchenko A. Combining machine learning and computational chemistry for predictive insights into chemical systems. Chemical reviews, 2021, 121(16): 9816-9872.DOI
17 
Nabipour M, Nayyeri P, Jabani H, Shahab S, Mosavi A. Predicting stock market trends using machine learning and deep learning algorithms via continuous and binary data; a comparative analysis. IEEE Access, 2020, 8: 150199-150212.DOI
18 
Yu W, Zhu C, Tsunooka Y, Huang W, Dang Y, Kutsukake K, Harada S, Tagawa M, Ujihara T. Geometrical design of a crystal growth system guided by a machine learning algorithm. CrystEngComm, 2021, 23(14): 2695-2702.DOI
19 
Tchito Tchapga C, Mih T A, Tchagna Kouanou A, Fonzin T, Fogang P K, Mezatio B, Tchiotsop D. Biomedical image classification in a big data architecture using machine learning algorithms. Journal of Healthcare Engineering, 2021, 2021: 1-11.DOI
20 
González S, García S, Del Ser J, et al. A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities. Information Fusion, 2020, 64: 205-237.DOI
21 
Zhu N, Zhu C, Emrouznejad A. A combined machine learning algorithms and DEA method for measuring and predicting the efficiency of Chinese manufacturing listed companies. Journal of Management Science and Engineering, 2021, 6(4): 435-448.DOI

Author

Wei You
../../Resources/ieie/IEIESPC.2024.13.2.129/au1.png

Wei You graduated from Jiangxi University of Science and Technology with a master’s degree in Music Education and holds the position of Lecturer. Throughout a teaching career, Wei You has been responsible for courses such as “Piano” and “Orff Music Teaching Method”; main research areas include piano and the Orff method. Wei You has published more than 10 research papers, taken part in 8 research projects (such as “Research on the Inheritance of Traditional Culture in Preschool Education—Taking the Introduction of Local Instruments into Orff Curriculum Teaching as an Example” and “Music Communication of Haydn’s Piano Sonatas in Jiangxi Vocational Colleges”), and authored or contributed to the editing of 7 textbooks (including “Piano” and “Ding Dang Small Glockenspiel Innovative Music Enlightenment Series Textbooks”).