3.1 Music Signal Preprocessing and Feature Parameter Extraction
With the progress of information and multimedia technology, music digitization has been widely adopted across media such as radio broadcasting and digital storage. A music signal is short-term stationary: its characteristics remain approximately unchanged over short intervals. Therefore, when studying the overall characteristics of the signal, it is necessary to focus on the characteristics of each short segment. First, the music signal is divided into frames. To ensure a smooth transition between consecutive frames, adjacent frames overlap by 1/3 to 1/2 of the frame length during framing. The framing process is shown in Fig. 1. The number of frames that a music signal can be divided into is given by Eq. (1).
In Eq. (1), $N_{x}$ is the total length of the music signal, $N_{0}$ is the overlapping length between frames, and ${b_{i}}^{l}$ is the length of one frame. The high-frequency part of the music signal carries relatively little energy, so it is enhanced by filtering; Eq. (2) shows the implementation. In Eq. (2), $y\left(n\right)$ is the output signal after the enhancement processing, $x\left(n\right)$ is the input signal, and $\mu$ is an enhancement factor with a value close to 1.
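Under the usual overlapped-framing convention, Eq. (1) and Eq. (2) can be sketched as follows, writing $N$ for the frame length:
$$f_{n}=\left\lfloor \frac{N_{x}-N_{0}}{N-N_{0}}\right\rfloor ,\qquad y\left(n\right)=x\left(n\right)-\mu \,x\left(n-1\right).$$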
After framing, the signal was windowed to increase the continuity between frames, reduce the edge effect, and reduce spectral leakage. The process is shown in Eq. (3). In Eq. (3), $s_{w}\left(n\right)$ is the signal after windowing, and $w\left(n\right)$ is the window function. This research used the Hamming window as the window function, as shown in Eq. (4).
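As a concrete illustration of this preprocessing chain, the following is a minimal NumPy sketch; the frame length of 1024 samples, the 1/2 overlap, and $\mu =0.97$ are assumed example values (the text only constrains the overlap to 1/3-1/2 of the frame length and $\mu$ to be close to 1).

```python
import numpy as np

def preprocess(signal, frame_len=1024, overlap=0.5, mu=0.97):
    """Pre-emphasize, frame with overlap, and Hamming-window a music signal.

    frame_len, overlap, and mu are illustrative values; the text only
    requires an overlap of 1/3 to 1/2 of the frame length and mu close to 1.
    """
    signal = np.asarray(signal, dtype=float)

    # Signal enhancement (Eq. (2)): y(n) = x(n) - mu * x(n-1).
    emphasized = np.append(signal[0], signal[1:] - mu * signal[:-1])

    # Overlapped framing; the frame count follows the convention of Eq. (1).
    overlap_len = int(frame_len * overlap)
    hop = frame_len - overlap_len
    n_frames = (len(emphasized) - overlap_len) // hop
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])

    # Hamming windowing (Eqs. (3)-(4)): s_w(n) = s(n) * w(n), with
    # w(n) = 0.54 - 0.46 cos(2*pi*n / (frame_len - 1)).
    return frames * np.hamming(frame_len)
```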
The selection of music features determines the performance of the recognition system to some extent: good features improve both the accuracy and the speed of music signal classification. People's perception of music quality is mainly characterized by pitch, timbre, and rhythm, each of which can be abstracted into a feature vector. Most abstract timbre features are short-term features, while abstract pitch and rhythm features are mostly long-term features. From the perspective of the transform domain, short-term features are divided into time-domain (TD) features, frequency-domain (FD) features, and cepstrum-domain features.
The frequency-domain features (FDF) of music are characteristic parameters obtained by first applying the Fourier transform to the music signal and then processing it in the FD. Common FD features include the spectral centroid, spectral energy, spectral bandwidth, spectral sub-band energy, spectral flux, and spectral sub-band flux. The spectral centroid measures the center of the spectrum: the larger its value, the more high-frequency components the signal contains. It is calculated with Eq. (5).
In Eq. (5), $F\left(\omega \right)$ is the Fourier transform of a frame signal, and $l$ and $h$ are the minimum and maximum frequencies of the sub-band, respectively. The spectral energy is found with Eq. (6). The spectral bandwidth is the spectral-energy-weighted distance from the spectral centroid and mainly measures the FD range of the music signal, as shown in Eq. (7).
The spectral flux reflects the dynamic characteristics of the spectrum: it is the sum of the squared differences between corresponding FD points of two adjacent frames and thus measures the total spectral change. It is calculated with Eq. (8).
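In standard form (a sketch consistent with the verbal definitions above, with $F_{t}\left(\omega \right)$ the spectrum of frame $t$ and the sums taken over the sub-band $\left[l,h\right]$), Eqs. (5)-(8) can be written as
$$C=\frac{\sum _{\omega =l}^{h}\omega \left| F\left(\omega \right)\right| }{\sum _{\omega =l}^{h}\left| F\left(\omega \right)\right| },\qquad E=\sum _{\omega =l}^{h}\left| F\left(\omega \right)\right| ^{2},$$
$$B=\sqrt{\frac{\sum _{\omega =l}^{h}\left(\omega -C\right)^{2}\left| F\left(\omega \right)\right| ^{2}}{\sum _{\omega =l}^{h}\left| F\left(\omega \right)\right| ^{2}}},\qquad F_{flux}=\sum _{\omega }\left(\left| F_{t}\left(\omega \right)\right| -\left| F_{t-1}\left(\omega \right)\right| \right)^{2}.$$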
The mel-frequency cepstral coefficients (MFCCs) are cepstrum parameters extracted in the mel-scale FD. Eq. (9) shows the relationship between the mel frequency and the linear frequency. In Eq. (9), $f$ is the linear frequency. After framing, signal enhancement, and windowing, the mel-frequency coefficients (MPCs) of the music signal were extracted. Compared with the MFCC, the MPC does not require a discrete cosine transform; the result is output directly after the logarithm is computed. The extraction of the MPC is shown in Fig. 2.
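The mel scale that Eq. (9) refers to is conventionally related to the linear frequency $f$ (in Hz) by
$$f_{\mathrm{mel}}=2595\log _{10}\left(1+\frac{f}{700}\right),$$
which spaces the filter bands to match human pitch perception.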
The spectrum was decomposed into several sub-bands by a mel band-pass filter bank, and the natural logarithm of each sub-band output was then computed to obtain the MPC parameters. For better unsupervised learning, the restricted Boltzmann machine (RBM) was introduced on top of the MPC parameters, and the maximum-likelihood function was used to optimize the selected feature parameters. The MPC feature vector is the input layer of the RBM, and the network parameters are updated continuously during training. The update method is shown in Eq. (10). In Eq. (10), $\Delta w$ is the update of the weight matrix between the visible layer and the hidden layer, $\Delta a$ and $\Delta b$ are the updates of the bias vectors of the visible layer and the hidden layer, respectively, and $p\left(h\left| s\right.\right)$ is the hidden-layer probability distribution when the visible units are set to a specific training sample $s$.
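As an illustration of the update quantities in Eq. (10), the following is a minimal contrastive-divergence (CD-1) sketch in NumPy; the CD-1 rule, the binary units, and the learning rate are assumptions for illustration, since only $\Delta w$, $\Delta a$, and $\Delta b$ are specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.01):
    """One CD-1 step for a binary RBM on a batch of visible vectors v0.

    W couples the visible and hidden layers; a and b are the visible and
    hidden bias vectors, matching the quantities updated in Eq. (10).
    """
    # Positive phase: p(h|v0) for the training sample (p(h|s) in the text).
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one Gibbs step down to the visible layer and back up.
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)

    # Gradient estimates: data statistics minus reconstruction statistics.
    dW = (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    da = (v0 - pv1).mean(axis=0)
    db = (ph0 - ph1).mean(axis=0)
    return W + lr * dW, a + lr * da, b + lr * db
```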
Emotion is an attribute of music, and emotional characteristics are an important feature of music signals. The vectors related to emotion can be divided into TD feature vectors and FD feature vectors, which include the mean, variance, intensity, maximum, center of gravity, bandwidth, roll-off, and flux. To judge the impact of the features on the music emotions, this study adopted the sequential floating forward selection method to obtain the features that affect the emotional model. Because emotional features are subjective and one-sided, the classification effect is not ideal when they are used alone in classifiers. For this reason, mixed feature vectors were used in this study. The feature vector of frame $t$ can be expressed as Eq. (11). In Eq. (11), $M$ is the order of the MPC feature vector.
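Assuming Eq. (11) concatenates the two feature groups, the frame-$t$ vector can be sketched as
$$o_{t}=\left[c_{1},\ldots ,c_{M},e_{1},\ldots ,e_{12}\right]^{T},$$
where $c_{1},\ldots ,c_{M}$ are the MPC coefficients and $e_{1},\ldots ,e_{12}$ are the 12 TD and FD emotion features described below.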
The emotion feature vector adopts 12 TD and FD features. The two types of features are fused to obtain the final combined music-signal feature; the fusion process is shown in Fig. 3. In summary, the input music signal was preprocessed by framing, signal enhancement, and windowing, and a feature combination fusing the MPC features with the emotional features was extracted, which provided the basis for constructing the classification model.
Fig. 1. Diagram of overlapping frame division processing.
Fig. 2. MPC feature vector extraction process.
Fig. 3. Flow of fusing the MPC features and emotional features into a combined feature.
3.2 Music Classification based on DBN-HMM
Music classification plays a very important role in music signal retrieval: users who like listening to piano music are not interested in every style of music. To enable retrieval according to users' interests, the pieces were classified according to the fused features of piano music, which facilitates efficient management and fast retrieval. An HMM contains two stochastic processes, the observed states and the hidden states, and uses parameters to represent the statistical properties of the stochastic process. Because the HMM reflects the essential properties of sound, it is widely used in speech signal processing. It can be expressed as the quintuple in Eq. (12).
In Eq. (12), $\Omega _{x}$ is the state set, $\Omega _{O}$ is the observation value set, $A$ is the transition probability matrix indicating the probability of transitioning from the state $q_{i}$ at the current moment to the state $q_{j}$ at the next moment, $B$ is the output probability matrix indicating the probability of the observed value $o_{M}$ when the state is $q_{i}$, and $\pi$ is the initial state distribution.
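Collecting these five elements, the quintuple of Eq. (12) is
$$\lambda =\left(\Omega _{x},\Omega _{O},A,B,\pi \right).$$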
Fig. 4. Diagram of DBN network structure.
The model obtains the relevant parameters of the HMM by calculating the probability of the observation sequence and the probability of each state, and it thus classifies the sample data. However, the HMM depends on the state labels of the training data, and in actual training much of the raw data lacks labels, which degrades the recognition performance. To avoid this defect, the HMM and the DBN were combined to classify music signals. Fig. 4 shows the structure of the DBN.
The network of a DBN is a highly complex directed acyclic graph composed of stacked RBMs and is a hierarchical unsupervised learning model. A DBN can effectively utilize data with missing labels, its deep network structure enhances the ability to model signal features, and it provides more accurate observation probabilities. The joint probability distribution represents the relationship between the visible layer and the hidden layers; Eq. (13) displays the calculation method.
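In the standard DBN formulation, this joint distribution over the visible vector $v$ and the hidden layers $h^{1},\ldots ,h^{l}$ can be sketched as
$$P\left(v,h^{1},\ldots ,h^{l}\right)=P\left(v\left| h^{1}\right.\right)\left(\prod _{k=1}^{l-2}P\left(h^{k}\left| h^{k+1}\right.\right)\right)P\left(h^{l-1},h^{l}\right),$$
where the top two layers form an undirected RBM and each lower layer is generated top-down.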
In Eq. (13), $l$ is the number of hidden layers of the DBN. The RBMs provide good initial parameter values through layer-by-layer pretraining, and the network was then fine-tuned with traditional learning algorithms. The batch gradient descent method was used for network tuning, and the overall loss function over the samples is shown in Eq. (14). In Eq. (14), ${W_{ij}}^{\left(l\right)}$ is the connection weight coefficient; $l$ is the number of hidden layers; $i$ and $j$ are the node indices in the current and next hidden layers, respectively; ${b_{i}}^{l}$ is the offset (bias) of the node; and $h_{W,b}\left(x^{\left(i\right)}\right)$ is the reconstructed network output for sample $x^{\left(i\right)}$. The partial derivatives with respect to the weight coefficients and biases are shown in Eq. (15).
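Under the usual batch-gradient formulation with weight decay, a loss consistent with these symbols, and the corresponding updates, can be sketched as
$$J\left(W,b\right)=\frac{1}{m}\sum _{i=1}^{m}\frac{1}{2}\left\| h_{W,b}\left(x^{\left(i\right)}\right)-y^{\left(i\right)}\right\| ^{2}+\frac{\lambda }{2}\sum _{l}\sum _{i,j}\left({W_{ij}}^{\left(l\right)}\right)^{2},$$
$${W_{ij}}^{\left(l\right)}\leftarrow {W_{ij}}^{\left(l\right)}-\alpha \frac{\partial J}{\partial {W_{ij}}^{\left(l\right)}},\qquad {b_{i}}^{\left(l\right)}\leftarrow {b_{i}}^{\left(l\right)}-\alpha \frac{\partial J}{\partial {b_{i}}^{\left(l\right)}},$$
where $m$ is the number of samples, $y^{\left(i\right)}$ the target for sample $x^{\left(i\right)}$, $\lambda$ a weight-decay coefficient, and $\alpha$ the learning rate; these four symbols are introduced here for illustration.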
Fig. 5. Basic structure of DBN+HMM model.
The DBN-HMM model was trained and classified by estimating the posterior probabilities of the HMM states; its basic structure is shown in Fig. 5. To improve the loss function, the gradient descent method was used to minimize the reconstruction mean square error, and the objective function was taken as the cross-entropy between the reference state label and the predicted state distribution, as in Eq. (16).
In Eq. (16), $s$ is the current state, and $y\left(s\right)$ is the predicted state distribution. The outputs of the DBN output-layer nodes are the inputs of the HMM, and the posterior probabilities of the HMM states are calculated with the Softmax regression model. The output state distribution is expressed as Eq. (17). In Eq. (17), $P\left(s\right)$ is the prior probability of state $s$ appearing in the training data. The gradient of the objective function with respect to the activation probability is shown in Eq. (18). In Eq. (18), $F_{CE}$ is the objective function, $a_{n}\left(s\right)$ is the activation probability, and $\delta _{s,s_{n}}$ is the Kronecker delta function, which satisfies Eq. (19).
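These definitions match the standard hybrid formulation, under which Eqs. (16)-(19) can be sketched as follows, with $n$ indexing frames and $s_{n}$ the reference state of frame $n$:
$$F_{CE}=-\sum _{n}\log y_{n}\left(s_{n}\right),\qquad p\left(x_{n}\left| s\right.\right)\propto \frac{y_{n}\left(s\right)}{P\left(s\right)},$$
$$\frac{\partial F_{CE}}{\partial a_{n}\left(s\right)}=y_{n}\left(s\right)-\delta _{s,s_{n}},\qquad \delta _{s,s_{n}}=\begin{cases}1, & s=s_{n}\\ 0, & s\neq s_{n}\end{cases}.$$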
It was found that the recognition performance of the model still needed further improvement. During HMM training, the Baum-Welch algorithm is sensitive to the randomly selected initial matrix parameters: different initializations produce large differences in the result, which can become trapped in a local optimum and affect the accuracy of the model's classification and recognition. To address this sensitivity to the random initialization, the global search capability of a genetic algorithm (GA) was used to optimize the initial matrix parameters of the HMM.
However, the traditional GA easily produces ``super individuals'' during evolution, which hamper the subsequent evolution. For this reason, the mutation operator was improved: a chaos operator was used to carry out the mutation operation and improve the quality of the evolution result. Mutation contributes to the generation of new individuals, and the mapping model obtained through the chaotic mapping operator is shown in Eq. (20). In Eq. (20), $x$ is the neuron's internal state, $k$ is a damping factor of the nerve membrane, and $g\left(x\right)$ is the nonlinear self-feedback. Starting from Gaussian mutation, the Gaussian normal distribution function was replaced with the chaotic mapping function; the improved mutation operator is shown in Eq. (21). In Eq. (21), $s$ is the mutation scale, and $g$ is the annealing factor. Based on these operations, the GA was used to initialize the parameters of the HMM, and the DBN was then used to classify the style of a piano music signal. Fig. 6 displays the classification flow.
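To make the overall scheme concrete, the following is a minimal sketch of a GA that searches for an initial HMM transition matrix with a chaos-based mutation; the logistic map, the uniform crossover, and the placeholder fitness function are illustrative assumptions, not the exact operators of Eqs. (20) and (21).

```python
import numpy as np

rng = np.random.default_rng(1)

def chaotic_sequence(x, steps=10):
    """Logistic map x <- 4x(1-x): an illustrative stand-in chaos operator."""
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

def normalize_rows(m):
    """Project a matrix back onto valid (row-stochastic) HMM parameters."""
    m = np.abs(m)
    return m / m.sum(axis=1, keepdims=True)

def ga_init_hmm(fitness, n_states, pop_size=20, generations=50, p_mut=0.1):
    """Search for a good initial HMM transition matrix A with a GA whose
    mutation perturbs genes through the chaotic map instead of Gaussian noise."""
    pop = [normalize_rows(rng.random((n_states, n_states)))
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([fitness(A) for A in pop])
        # Keep the fitter half of the population as parents.
        order = np.argsort(-scores)
        parents = [pop[i] for i in order[:pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.choice(len(parents), 2, replace=False)
            mask = rng.random((n_states, n_states)) < 0.5  # uniform crossover
            child = np.where(mask, parents[p1], parents[p2])
            if rng.random() < p_mut:  # chaotic mutation of one random gene
                i, j = rng.integers(n_states, size=2)
                child[i, j] = chaotic_sequence(rng.random())
            children.append(normalize_rows(child))
        pop = parents + children
    return max(pop, key=fitness)

# Placeholder fitness (illustrative only): prefer diagonal-dominant matrices.
best_A = ga_init_hmm(lambda A: float(np.trace(A)), n_states=4)
```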
Fig. 6. Classification process of piano-music style-classification model based on DBN-GA-HMM.