IEIESPC(IEIE Transactions on Smart Processing and Computing)

IEIESPC Vol. 14, No. 02, p.178-190

ISSN (online) :

2287-5255

Received: 8 January 2024; Revised: 7 April 2024; Accepted: 30 April 2024

DOI :

https://doi.org/10.5573/IEIESPC.2025.14.2.178

Regular Paper

Design and Implementation of a Multi-factor Intelligent Mining System for Stocks Based on GA-TGCN

Bo Zhang¹ Meichen Tao¹

(Taizhou Vocational College of Science & Technology, Taizhou 318020, China)

^* Corresponding Author: Bo Zhang, Bo_Zhang23@163.com

License :

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.(www.theieie.org).

Abstract

A multi-factor mining system for stocks can support the analysis of financial markets. Because the number of influencing factors is large, such systems generally suffer from significant deviations in their prediction results. Advances in intelligent algorithms provide more diverse methods for analyzing stock multi-factor combinations, as well as more technical support for risk assessment and decision-making in financial markets. To achieve a more efficient stock multi-factor analysis model, this paper constructs a stock multi-factor intelligent mining system for financial markets based on the GA-TGCN algorithm. The system builds on TGCN and integrates a genetic algorithm (GA) to support the prediction of multi-factor combinations. By predicting and analyzing factors and strategy returns, and by combining the predictions with simulated trading, well-performing multi-factor strategies can be obtained. A comparison of different models applied to multi-factor combination mining shows that the GA-TGCN system achieves higher accuracy, lower loss values, and smaller prediction errors, laying a foundation for improving the efficiency of stock multi-factor combination analysis.

Keywords

Temporal graph convolutional network, Stock multi-factor, Genetic algorithm, Intelligent mining

1. Introduction

With the development of China's financial market, more and more people are entering the field of financial investment. Financial time series analysis plays an important role in hedging market risks and optimizing investment decisions. In this context, the rapid development of machine learning has provided new ways to predict financial data, and the accuracy and practicality of predictions have improved rapidly ^[1]. Researchers have used common machine learning methods, including the LR, KNN, RF, and RBM algorithms, to predict the rise and fall of closing prices. However, the prices of financial assets are influenced by many factors and reflect complex market conditions at different diffusion rates. Traditional machine learning methods do not perform well enough to meet these needs, so deep learning models have begun to receive more attention ^[2,3]. Deep learning has achieved great success, outperforming traditional linear models and machine learning models in tasks that include stock market prediction.

To avoid the redundant work of modeling multiple stocks separately, many financial institutions and stock exchanges compile stock indices that aggregate multiple representative stocks. Modeling a stock index allows several stocks to be analyzed simultaneously and also reveals the overall situation of a given stock market ^[4,5]. However, the original stock index data often consist of a single time series containing only historical information, which limits the information available for modeling. In addition, the original stock index sequence is nonlinear and non-stationary, and it often contains a large amount of noise and useless information. As a result, traditional time series prediction methods cannot achieve high prediction accuracy ^[6].

More and more scholars are using deep neural networks to analyze time series data such as stock prices. However, in traditional neural networks there are no connections between the nodes within a layer, and the samples at each time step are treated as independent of one another, so the ability to represent temporal relationships is insufficient. The RNN algorithm can alleviate these problems to some extent. An RNN feeds the input features and the hidden state into the same structure at each time step, thereby capturing the long-term dependence of the input features ^[7,8]. The best-known RNN variants are the LSTM and GRU networks. Although these models capture temporal correlation, they typically treat each stock as independent of the others and overlook many interpretable factors in the financial market. As the financial market develops, enterprises become extensively connected, so the stock price of a target company may be affected by related companies ^[9,10].

The simulated backtesting functions of current quantitative platforms typically involve constructing a prediction model and adjusting its parameters mechanically. However, users' stock selection strategies often require manual factor combination, which lacks innovation and sustainability. Adapting a stock selection plan to the many market factors relies heavily on personal expertise, which leads to uneven accuracy and insufficient prediction precision in financial market analysis. To address these challenges, this article uses the stochastic optimization features of the genetic algorithm (GA) to explore the factor combination problem. Combined with the graph convolutional neural network (GCN) model, the study aims to quickly predict and optimize randomly combined multi-factor strategies. By analyzing strategy stock selection from various angles, it paves the way for mining and forecasting multi-factor stocks. The research establishes a GA-TGCN model based on deep learning principles. It encodes factors using GA, constructs multi-factor strategies randomly, and validates them using quantitative backtesting models. The approach incorporates numerical integration into the model to account for factor complexity and global feature relationships. Additionally, the factor screening model predicts optimal factor values within strategies for backtesting and verification. Comparative experiments show that, within the same operational framework, the model improves prediction accuracy and the quality of multi-factor strategies.

2. Theoretical Analysis and Current Status of Intelligent Mining Algorithms for Stocks

2.1 Quantitative Trading and Multi-factor Stock Selection Theory

Quantitative trading theory is an investment decision-making technique based on mathematical analysis and computer technology. It replaces subjective human judgment and uses mathematical and computational methods to learn optimal investment strategies. Investors seeking an edge in returns have attempted to adopt quantitative trading methods ^[13,14,15]. Quantitative trading takes the characteristics of financial markets into account; markets can be divided into three types based on how effectively prices reflect financial information: weak-form, semi-strong-form, and strong-form efficient markets.

Most financial markets have gradually developed from inefficient markets toward efficient markets, and quantitative trading has developed accordingly. Most economies have now reached a semi-strong efficient market state, in which quantitative analysis technology is effective. The field of quantitative trading covers a wide range of techniques, which can usually be classified from three perspectives: data models, machine learning, and strategic decision-making ^[16].

In addition, the multi-factor stock selection model evolved from the arbitrage pricing model ^[17]. Scholars observed the relationship between the expected return of an asset and the systematic risk in the market and found a linear relationship between the two, which is described by the Capital Asset Pricing Model (CAPM). In formula (1), $E(r_i)$ is the expected return of asset $i$, $r_f$ is the risk-free rate, $\beta_i$ is the systematic risk coefficient of asset $i$, and $E(r_m)$ is the expected return of the market:

(1)

$ E(r_i)=r_f+\beta_i [E(r_m)-r_f ]. $
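As a quick numerical reading of formula (1), the short Python sketch below evaluates the CAPM expected return. The function name and the input values (a 2% risk-free rate, an 8% expected market return, and a beta of 1.5) are hypothetical and chosen only for illustration.

```python
def capm_expected_return(risk_free: float, beta: float, market_return: float) -> float:
    """Expected return E(r_i) = r_f + beta_i * (E(r_m) - r_f), as in formula (1)."""
    return risk_free + beta * (market_return - risk_free)


# Hypothetical inputs, not taken from the paper.
expected = capm_expected_return(risk_free=0.02, beta=1.5, market_return=0.08)
print(f"{expected:.3f}")  # 0.02 + 1.5 * 0.06 = 0.110
```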

Researchers have continued to combine and refine proven investment theories in order to study the linear relationship between stock investment returns and various factors. The arbitrage model is analyzed with statistical and financial methods, and a multi-factor model is established to filter stock data. The factors are based on daily K-line data of stocks, such as the closing price, highest price, and lowest price, and are processed and combined with other types of factors. Multiple screening methods are then used to select suitable strategies ^[18,19]. The multi-factor model covers investment factors of wide range, depth, and variety, fits market patterns more closely, and is suited to the current era of massive data. The multi-factor model is shown in formula (2):

(2)

$ G=a\times h+b\times i+c\times j+...+k. $

Among them, $G$ represents the rate of return; $h$, $i$, $j$ are different stock factors; $a$, $b$, $c$ are the coefficients of the respective factors in the linear formula; and $k$ represents the deviation (intercept) term.
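To make formula (2) concrete, the following minimal Python sketch evaluates the linear multi-factor model for one stock. The factor values, coefficients, and intercept are hypothetical numbers chosen for illustration; in practice the coefficients would be estimated from data, which this sketch does not attempt.

```python
def multi_factor_return(factors, coefficients, intercept):
    """Evaluate G = a*h + b*i + c*j + ... + k for one stock, as in formula (2)."""
    if len(factors) != len(coefficients):
        raise ValueError("each factor needs exactly one coefficient")
    return sum(c * f for c, f in zip(coefficients, factors)) + intercept


# Hypothetical standardized factor exposures (h, i, j) and coefficients (a, b, c).
factor_values = [0.5, -1.0, 2.0]
factor_coefficients = [0.02, 0.01, 0.005]
G = multi_factor_return(factor_values, factor_coefficients, intercept=0.001)
print(f"{G:.4f}")  # 0.010 - 0.010 + 0.010 + 0.001 = 0.0110
```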

After market research and relevant analysis, referring to the summary of multiple securities firms' research reports, the commonly used factors in the market mainly include the following types, as shown in Table 1.

Table 1. Classification of common factor indicators.

Indicator type	Indicator name
Profit factor	Earnings per share, return on equity, fluctuations, etc.
Valuation factor	P/E ratio, P/S ratio, P/B ratio, PEG, etc.
Debt repayment factor	Asset-liability ratio, current ratio, equity ratio, etc.
Growth factor	Net profit year-on-year growth rate, revenue year-on-year growth rate, etc.

With the continuous development of research theory and technology, researchers believe that there is a certain linear relationship between the number of factors included in a multi-factor model and its prediction accuracy. However, as the trading market and society continue to develop, the effectiveness of constructed factors also declines, so factor models carry a risk of failure ^[20].

2.2 Genetic Algorithm and Temporal Graph Convolutional Neural Network

2.2.1 Genetic Algorithm

A genetic algorithm generally starts by initializing the population, after which the initial population enters the evolutionary process. A fitness evaluation measures how well each individual survives in the current environment and thus represents the quality of the corresponding solution ^[21,22,23]. During selection, the probability that an individual is chosen as a parent increases with its fitness. The sampling probability of individual $I_i$ is shown in formula (3):

(3)

$ P(I_i)=\frac{f(I_i)}{\sum_{j=1}^{M} f(I_j)}. $

Among them, $f(I_i)$ is the fitness of individual $I_i$, and $M$ is the number of candidate individuals.
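Fitness-proportional (roulette-wheel) selection of this kind can be sketched in a few lines of Python. The sketch assumes that the denominator of formula (3) is the sum of the fitness values of all $M$ candidates; the individual labels and fitness values are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """Return P(I_i) = f(I_i) / sum_j f(I_j) for every individual."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    return [f / total for f in fitness]


def roulette_select(population, fitness, rng):
    """Draw one parent with probability proportional to its fitness."""
    probs = selection_probabilities(fitness)
    return rng.choices(population, weights=probs, k=1)[0]


# Hypothetical population of four individuals and their fitness values.
population = ["I1", "I2", "I3", "I4"]
fitness = [1.0, 3.0, 4.0, 2.0]
probs = selection_probabilities(fitness)  # probabilities 0.1, 0.3, 0.4, 0.2
parent = roulette_select(population, fitness, random.Random(0))
```

Individuals with higher fitness are drawn more often, but low-fitness individuals keep a nonzero chance, which preserves diversity in the population.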

In addition, in typical genetic algorithms, formula (4) holds:

(4)

$ \begin{align} P(H)^{t+1} \ge P(H)^t \frac{\bar{f}(H)}{\bar{f}}[1-P_m O(H)] \times \left[1-P_c \frac{L(H)}{L-1}\right]. \end{align} $

Among them, $H$ represents the pattern (schema), that is, a set of similar strings in the encoding space, $L(H)$ represents the defining length of the pattern, and $L$ is the length of the encoding. $O(H)$ is the order of the pattern, which is the number of fixed gene loci. $\bar{f}(H)$ is the average fitness of the pattern, and $\bar{f}$ is the average fitness of the population. $P_m$ and $P_c$ are the mutation and crossover probabilities, respectively. $P(H)^{t+1}$ represents the probability that the pattern occurs in the population of generation $t+1$ ^[24,25]. The main function of the selection operator is to retain individuals with high fitness, eliminate individuals that cannot adapt to the environment, and improve the overall adaptability of the population ^[26].
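The lower bound in formula (4) can be evaluated directly for a concrete pattern. The Python sketch below assumes the standard form of the schema theorem, in which the first bracket uses the mutation probability and the second bracket uses the crossover probability (called `p_c` here). The pattern string, with `*` marking free positions, and all numerical values are hypothetical.

```python
def schema_order(schema: str) -> int:
    """O(H): number of fixed (non-'*') positions in the pattern."""
    return sum(1 for symbol in schema if symbol != "*")


def defining_length(schema: str) -> int:
    """L(H): distance between the first and last fixed positions."""
    fixed = [idx for idx, symbol in enumerate(schema) if symbol != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def schema_lower_bound(p_h, fitness_ratio, schema, p_m, p_c):
    """Lower bound on P(H) in the next generation.

    fitness_ratio is the pattern's average fitness divided by the
    population's average fitness.
    """
    length = len(schema)
    survives_mutation = 1 - p_m * schema_order(schema)
    survives_crossover = 1 - p_c * defining_length(schema) / (length - 1)
    return p_h * fitness_ratio * survives_mutation * survives_crossover


# Hypothetical pattern over 5 loci: order 2, defining length 3.
bound = schema_lower_bound(p_h=0.2, fitness_ratio=1.2, schema="1**0*", p_m=0.01, p_c=0.6)
print(f"{bound:.5f}")  # 0.2 * 1.2 * 0.98 * 0.55 = 0.12936
```

Short, low-order patterns with above-average fitness have a bound close to $P(H)^t \bar{f}(H)/\bar{f}$, which is why they tend to spread through the population.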

2.2.2 Temporal Graph Convolutional Network

As an important branch of GNN, GCN is a node embedding and label prediction algorithm. GCN can handle data with irregular topology: it takes the graph structure and node features as inputs, and aggregates and transforms the neighbor information of each node to create a new representation for that node. This representation is valuable for predicting target variables such as node labels or categories. Graph convolution is similar to traditional convolution, so GCN can also be stacked in multiple layers. The input of each layer is still the node features; the features output by the current layer, together with the graph structure, serve as the input of the next layer, so the operation can be applied repeatedly. Fig. 1 shows the processing flow and graph structure data of the GCN model ^[27].

Fig. 1. Processing flow and graph structure data of GCN model.

In addition to the adjacency matrix and the degree matrix, graph convolution requires the Laplacian matrix. It is obtained by processing the adjacency matrix and the degree matrix according to formula (5), and the resulting matrix represents the degree of change in the graph data.

(5)

$ \left\{ \begin{aligned} & L=D-A,\\ & L_{sym} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}}. \end{aligned} \right. $

Among them, $L$ represents the Laplacian matrix, $A$ represents the adjacency matrix, and $D$ represents the degree matrix. The Laplacian matrix obtained from the adjacency matrix and the degree matrix is also known as the admittance matrix. $L_{sym}$ is the symmetric normalized form of $L$.
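As a small numerical illustration of formula (5), the following Python sketch builds $L$ and $L_{sym}$ with NumPy for a hypothetical three-node path graph. The function name and the example graph are assumptions made for illustration, and isolated nodes are given a zero scaling factor, a choice the text does not specify.

```python
import numpy as np


def laplacians(adjacency):
    """Return (L, L_sym) with L = D - A and L_sym = D^{-1/2} L D^{-1/2}."""
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    L = np.diag(degree) - A
    # Isolated nodes (degree 0) get a zero scaling factor instead of infinity.
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(degree > 0, degree ** -0.5, 0.0)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return L, L_sym


# Hypothetical path graph: node 0 - node 1 - node 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L, L_sym = laplacians(A)
print(L)
print(np.round(L_sym, 4))
```

Every row of $L$ sums to zero, and the diagonal of $L_{sym}$ is 1 for each node that has at least one neighbor.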

At the same time, this method can also ensure that the obtained result is a stable feature matrix containing feature information between nodes. The transfer method between layers in the model is shown in formula (6):

(6)

$ H^{l+1}=\sigma \left(L_{sym} H^l W\right) . $

In the formula, $L_{sym}$ represents the normalized Laplacian matrix, $H^l$ represents the feature matrix at layer $l$, $W$ represents the weight matrix of the layer, and $\sigma$ represents a nonlinear activation function.
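The layer-to-layer rule in formula (6) amounts to two matrix products followed by an activation. The NumPy sketch below applies it once to a hypothetical two-node graph; using ReLU for $\sigma$ and the specific matrices are assumptions made only for illustration.

```python
import numpy as np


def gcn_layer(propagation, features, weights):
    """One propagation step H_next = sigma(P @ H @ W), with ReLU as sigma."""
    return np.maximum(propagation @ features @ weights, 0.0)


# Hypothetical two-node graph with one edge: here L_sym equals D - A.
L_sym = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
H0 = np.array([[1.0, 2.0],
               [3.0, 4.0]])   # one row of features per node
W = np.array([[1.0, 0.0],
              [0.0, -1.0]])   # layer weights (learned in practice)

H1 = gcn_layer(L_sym, H0, W)
print(H1)  # [[0. 2.] [2. 0.]]
```

Stacking several such calls, each with its own weight matrix, gives the multi-layer processing described above.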

Temporal graph convolutional network (T-GCN) is a spatiotemporal data modeling method based on GCN and GRU, used to process graph structured data on time series. The core idea of T-GCN is to treat time series data as graph structured data on time steps and introduce time dimensions into the graph. T-GCN introduces a temporal graph convolution algorithm to aggregate and transform neighboring nodes while preserving node feature information, resulting in a more comprehensive and accurate node representation ^[28]. At the same time, T-GCN also utilizes the GRU module to capture temporal relationships in time series data, thereby further improving the prediction accuracy of the model. The model structure is shown in Fig. 2.

Unlike a traditional GNN, GCN requires a degree matrix as model input, which describes the association information between each node and all other nodes ^[27]. Fig. 3 provides a schematic diagram of the stock multi-factor recognition network framework based on the T-GCN transformation.

Fig. 2. Schematic diagram of temporal graph convolutional network structure.

Fig. 3. Schematic diagram of stock multi-factor recognition network framework based on T-GCN transformation.

3. A Multi Factor Intelligent Mining System for Stocks Based on GA-TGCN Algorithm

3.1 Construction of Multifactor Intelligent Mining System for Stocks

The focus of this model is on how to effectively represent, as a graph, the complex relationship between the multiple factors within a strategy and the strategy's return over time, and how to integrate this graph into the factor screening model ^[29]. Because the factors and the strategy returns together provide exactly the inputs needed to analyze the relationship between predictive factors and strategy returns, this paper constructs an adjacency matrix from the factor values and the returns of the strategies they represent, and feeds it into the model to predict the performance of each factor in the next time period of the strategy. For a given factor $F$, the characteristics of its past $T$ time periods can be described using formula (7).

(7)

$ X_{F,t}=(X_{F,t-T},...,X_{F,t-1}). $

The factor screening model takes two types of input: the relationship graph $A$ of the factors and the historical feature sequence of factor strategy returns. The problem is therefore transformed into predicting $Y_{F,t}$, as expressed in formula (8).

(8)

$ Y_{F,t}=G[f(A,X_{F,t})], $

where the function $G$ represents the GRU model, the function $f$ represents the GCN model, $X$ represents the factor strategy return feature matrix input into the GCN model, and $A$ represents the adjacency matrix of factor relationships.

The original data of the model include the factors within the multi-factor strategy and the annualized compound return of the corresponding combination strategies. The data are first passed into the module input layer, combined with the similarity matrix constructed by the model, and then subjected to feature decomposition ^[30]. Subsequently, the graph convolution layer performs the graph convolution operation, aggregating features with high similarity and ultimately generating a matrix that relates the fundamental factors within the strategy to the strategy returns. This screening model considers not only the relationships between different factors but also the changes in factor values and strategy returns over time. Ultimately, it can predict an appropriate value for each factor within a multi-factor strategy.

The GCN module is used to capture the spatial relationships between the returns of the various factor strategies. It uses the Chebyshev approximation formula to obtain correlation feature vectors of the factors in different dimensions, $G=\sigma(A^{'} X \Delta)$, where $G$ represents the output of the GCN, $\sigma$ represents the activation function, $\Delta$ represents the weights learned by the model, and $A'$ is obtained from the adjacency matrix $A$. In addition, the internal structure of the graph convolution factor screening model constructed by the system is shown in Fig. 4.

Fig. 4. Internal structure of graph convolution factor screening model.

For the internal structure of the graph convolution factor screening model, $GC$ represents the graph convolution process and $h_{t-1}$ represents the state at time $t-1$. The graph convolution is calculated as shown in formula (9):

(9)

$ f(F,A)=\sigma\left[W\times \tanh \left(I+D^{-\frac{1}{2}} LD^{-\frac{1}{2}} \right)\right]. $

Among them, $\tanh$ is the activation function, $I$ is the feature matrix, $I+D^{-\frac{1}{2}} LD^{-\frac{1}{2}}$ is the normalized adjacency matrix, $W$ is the weight matrix, and $D$ is the degree matrix of $L$.

By incorporating the integral of feature $F$ over time into the graph convolution process, a numerical integral of $F$ can be obtained. The calculation is shown in formula (10):

(10)

$ \int_{t_{n-1}}^{t_n} F(t)dt\approx\frac{t_n-t_{n-1}}{2} [F(t_{n-1})+F(t_n)]. $

Among them, $\int_{t_{n-1}}^{t_n} F(t) dt$ represents the integral of feature $F$, $t_{n-1}$ and $t_n$ are two adjacent time points, and the right side is the approximation given by the trapezoidal rule. That is, the values of $F(t)$ at the two ends of the interval $[t_{n-1}, t_n]$ are taken as the parallel sides of a trapezoid, the area of the trapezoid is taken as the approximate value of the integral, and this value replaces the feature value of the current time slice.
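For reference, the standard trapezoidal rule averages the two endpoint values and multiplies by the interval width. The Python sketch below applies it to a hypothetical feature series sampled at four time points; the function names and the sample values are assumptions for illustration.

```python
def trapezoid_step(t_prev, t_next, f_prev, f_next):
    """Area of the trapezoid between two adjacent time points."""
    return (t_next - t_prev) * (f_prev + f_next) / 2.0


def trapezoid_integral(times, values):
    """Sum the trapezoid areas over all adjacent pairs of samples."""
    return sum(
        trapezoid_step(times[k - 1], times[k], values[k - 1], values[k])
        for k in range(1, len(times))
    )


# Hypothetical feature F(t) = 2t sampled at t = 0, 1, 2, 3.
times = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 2.0, 4.0, 6.0]
area = trapezoid_integral(times, values)
print(area)  # 9.0, exact here because F is linear
```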

Based on the above algorithms, the output $G$ of the GCN, $G=\sigma(A^{'} X \Delta)$, is used as the input of the GRU module to further capture and organize the spatiotemporal characteristics of the stock multi-factor strategy. The specific calculation steps for the combination of GCN and GRU are shown in formula (11):

(11)

$ Z_t=\sigma\{W_u [f(A,X_t),~h_{t-1}]+b_u \}. $

In the equation, $f(A,~X_t)$ represents the graph convolution process, and $h_{t-1}$ represents the state at time $t-1$.

$R_t$ represents the reset gate, which combines the state $h_{t-1}$ at time $t-1$ with the graph convolution output at time $t$. The calculation is shown in formula (12):

(12)

$ R_t=\sigma\{W_r [f(A,X_t),~h_{t-1} ]+b_r \}. $

Based on formulas (11) and (12), the candidate hidden state $c_t$ can be obtained. The update gate $Z_t$ then determines how much of the previous state $h_{t-1}$ is retained and how much new information from $c_t$ is added. Finally, all outputs $h_t$ are concatenated to obtain feature vectors that express the relationship between factors and strategy returns. The specific process is shown in formulas (13) and (14):

(13)

$ h_t=Z_t\times h_{t-1}+(1-Z_t)\times c_t, $

(14)

$ c_t=\tanh\{W_c [f(A,X_t),~(R_t\times h_{t-1})]+b_c \}. $
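The gating steps in formulas (11) to (14) can be traced in a minimal NumPy sketch of one T-GCN time step. This is an illustration under stated assumptions rather than the authors' implementation: the graph convolution $f(A, X_t)$ is taken to be a single layer with tanh activation over a self-loop-normalized adjacency matrix, the gate written with a subtraction from 1 is the update gate $Z_t$, the reset gate in the candidate state is $R_t$, and all sizes, weights, and inputs are random placeholders.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def normalized_adjacency(A):
    """D^{-1/2} (A + I) D^{-1/2}; adding self-loops is an assumption of this sketch."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = A_tilde.sum(axis=1) ** -0.5
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def tgcn_step(A_hat, X_t, h_prev, p):
    """One time step combining graph convolution with GRU gating."""
    gc = np.tanh(A_hat @ X_t @ p["W_g"])                 # f(A, X_t)
    gate_in = np.concatenate([gc, h_prev], axis=1)       # [f(A, X_t), h_{t-1}]
    Z = sigmoid(gate_in @ p["W_u"] + p["b_u"])           # update gate, formula (11)
    R = sigmoid(gate_in @ p["W_r"] + p["b_r"])           # reset gate, formula (12)
    cand_in = np.concatenate([gc, R * h_prev], axis=1)   # [f(A, X_t), R_t * h_{t-1}]
    c = np.tanh(cand_in @ p["W_c"] + p["b_c"])           # candidate state, formula (14)
    return Z * h_prev + (1.0 - Z) * c                    # new state, formula (13)


rng = np.random.default_rng(0)
n_factors, n_features, gc_dim, hidden = 4, 3, 2, 2       # hypothetical sizes
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)                # hypothetical factor graph
A_hat = normalized_adjacency(A)
params = {
    "W_g": rng.normal(size=(n_features, gc_dim)),
    "W_u": rng.normal(size=(gc_dim + hidden, hidden)), "b_u": np.zeros(hidden),
    "W_r": rng.normal(size=(gc_dim + hidden, hidden)), "b_r": np.zeros(hidden),
    "W_c": rng.normal(size=(gc_dim + hidden, hidden)), "b_c": np.zeros(hidden),
}

h = np.zeros((n_factors, hidden))
for X_t in rng.normal(size=(15, n_factors, n_features)):  # 15 time steps
    h = tgcn_step(A_hat, X_t, h, params)
print(h.shape)  # (4, 2)
```

Because each new state is a gate-weighted average of the previous state and a tanh output, every entry of `h` stays within [-1, 1].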

3.2 Training of A Stock Multi Factor Intelligent Mining System

3.2.1 Architecture of Multi Factor Intelligent Mining System for Stocks Based on GA-TGCN

In order to obtain as much experimental data as possible while keeping the timeline setting reasonable, experiments with different sliding window sizes and step sizes were carried out, and the optimal parameters were determined to be a sliding window size of 15 and a step size of 1 for feature value extraction and prediction ^[31]. Then, based on the output value matrix, a new multi-factor strategy is constructed and backtested, and the return and risk indicators obtained from backtesting are compared with those of the original strategy to verify whether the model achieves the expected goal of optimizing the factor strategy. The architecture of the GA-TGCN based stock multi-factor intelligent mining system is shown in Fig. 5.

Fig. 5. Architecture of GA-TGCN based stock multi-factor intelligent mining system.

3.2.2 Combination Strategy of Multi Factor Intelligent Mining System for Stocks Based on GA-TGCN

Take as an example a strategy $P$ constructed by the GA random stock selection algorithm in this system, which includes four factors: A, B, C, and D. Based on the factor classification relationship and the model construction method described above, the four factors can be arranged into an adjacency matrix whose weights are 0 or 1. Subsequently, the weekly strategy yields of the factor combinations AB, AC, BC, and so on are assembled into a feature matrix, where the values on the diagonal are the weekly strategy yields of the single factors. Sliding the window along the timeline in this way ultimately yields a number of samples equal to the total number of time steps minus the window length.
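The sample count stated above follows directly from how a sliding window walks along the timeline. The Python sketch below uses the window size of 15 and step size of 1 reported in Section 3.2.1; the function name and the placeholder series of 100 time steps are hypothetical.

```python
def sliding_windows(series, window=15, step=1):
    """Pair each window of past values with the value that follows it."""
    samples = []
    for start in range(0, len(series) - window, step):
        history = series[start:start + window]   # inputs: `window` past steps
        target = series[start + window]          # label: the next step
        samples.append((history, target))
    return samples


# Placeholder series standing in for 100 weekly strategy yields.
series = list(range(100))
samples = sliding_windows(series, window=15, step=1)
print(len(samples))  # 100 - 15 = 85
```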

In order to narrow the gap between the predicted value and the true value, this article uses the loss function shown in formula (15) for training, where $y_i$ represents the true value of the $i$-th sample, $\hat{y}_i$ represents the corresponding predicted value, and $N$ represents the total number of samples. Minimizing this loss function effectively reduces the error between the two.

(15)

$ \text{Loss}=\frac{\sum_{i=1}^N(y_i-\hat{y}_i )^2}{N}. $

4. Experimental Results and Analysis

4.1 Model Evaluation Indicators

To analyze the performance of the GA-TGCN based multi-factor intelligent mining system for stocks more clearly, evaluation indicators are needed as reference standards. Three commonly used indicators are introduced: root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Let $n$ be the length of the test set, and let $FT^{'}(i)$ and $FT(i)$ denote the predicted and actual values of the $i$-th test sample, respectively. RMSE is given by formula (16):

(16)

$ \text{RMSE}=\sqrt{\frac{1}{n} \sum_{i=1}^n [FT^{'} (i)-FT(i)]^2}. $

The calculation process of mean absolute error (MAE) is given by formula (17):

(17)

$ \text{MAE}=\frac{1}{n}\sum_{i=1}^n |FT^{'} (i)-FT(i)|. $

In addition, the mean absolute percentage error (MAPE) is given by formula (18):

(18)

$ \text{MAPE}=\frac{1}{n} \sum_{i=1}^n \left|\frac{FT^{'} (i)-FT(i)}{FT(i)}\right| . $
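The three indicators in formulas (16) to (18) can be computed with the Python standard library alone. In the sketch below, `predicted` plays the role of $FT^{'}(i)$ and `actual` the role of $FT(i)$; that reading, the function names, and the three sample values are assumptions for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean square error, formula (16)."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error, formula (17)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mape(predicted, actual):
    """Mean absolute percentage error, formula (18); actual values must be nonzero."""
    return sum(abs((p - a) / a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical test set of three samples.
actual = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]
print(round(rmse(predicted, actual), 4))  # sqrt(1.25 / 3) = 0.6455
print(mae(predicted, actual))             # (0.5 + 0 + 1) / 3 = 0.5
print(mape(predicted, actual))            # (0.5 + 0 + 0.25) / 3 = 0.25
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses the error relative to the size of the actual value.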

4.2 Basic Data Acquisition and Analysis

When obtaining and analyzing the basic data, the main strategy is to refer to financial engineering factor reports and select 32 processed factors for the optimization analysis of combination strategies. The basic data are obtained through the API interfaces provided by financial websites such as JK, DFCF, and SQT. The relevant transaction data are used to generate new factor indicator data, which supports the construction of the GA-TGCN multi-factor mining system in this paper. The basic flowchart for obtaining stock and multi-factor data is shown in Fig. 6.

Fig. 6. Basic flowchart for obtaining stock and multi factor data.

4.3 Model Test Results

To verify the predictive ability of the GA-TGCN stock multi-factor intelligent mining system proposed in this article, comparative experiments were conducted with different models. To make the comparison fair and practical, each model was tested under its own optimal parameters.

To further compare model performance, Fig. 7 and Table 2 present the evaluation indicators of GA-TGCN alongside the GCN, Chebnet, GAT, and STSGCN models. According to the comparative results, the GA-TGCN model achieves an RMSE of 0.114 (11.4%), a MAPE of 0.090 (9.0%), and an MAE of 0.092 (9.2%), the lowest value on each indicator among the compared models. Among the baseline models in Table 2, the GAT model has the highest MAE and RMSE, at 0.258 and 0.302, respectively, and the Chebnet model has the highest MAPE, at 0.323. Relative to these worst-performing baselines, the RMSE of GA-TGCN is lower by 0.188 (18.8 percentage points) than that of the GAT model, the MAPE is lower by 0.233 (23.3 percentage points) than that of the Chebnet model, and the MAE is lower by 0.166 (16.6 percentage points) than that of the GAT model. All three indicators therefore show a clear improvement. In summary, the GA-TGCN stock multi-factor intelligent mining system performs better in predicting and analyzing the relationship between factors and returns. The GA-TGCN model analyzes the data more accurately and captures and mines relatively valuable factors, which makes it easier to adjust strategies in a timely manner and improve their final return.

Table 2. Performance comparison of different models/algorithms.

Algorithm type	MAE	MAPE	RMSE
GCN	0.153	0.112	0.172
Chebnet	0.132	0.323	0.186
GAT	0.258	0.192	0.302
STSGCN	0.129	0.157	0.154
GA-TGCN	0.092	0.090	0.114
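The differences quoted in the discussion of Table 2 can be rechecked with a few lines of Python. The dictionary below simply transcribes Table 2; the helper names are hypothetical. The sketch reports both the absolute difference between GA-TGCN and the worst baseline on each indicator and the corresponding relative reduction.

```python
# Values transcribed from Table 2.
table2 = {
    "GCN":     {"MAE": 0.153, "MAPE": 0.112, "RMSE": 0.172},
    "Chebnet": {"MAE": 0.132, "MAPE": 0.323, "RMSE": 0.186},
    "GAT":     {"MAE": 0.258, "MAPE": 0.192, "RMSE": 0.302},
    "STSGCN":  {"MAE": 0.129, "MAPE": 0.157, "RMSE": 0.154},
    "GA-TGCN": {"MAE": 0.092, "MAPE": 0.090, "RMSE": 0.114},
}


def worst_baseline(metric):
    """Name of the baseline (excluding GA-TGCN) with the largest error."""
    baselines = {name: row for name, row in table2.items() if name != "GA-TGCN"}
    return max(baselines, key=lambda name: baselines[name][metric])


def reduction(metric):
    """Absolute and relative error reduction of GA-TGCN versus the worst baseline."""
    worst = worst_baseline(metric)
    absolute = table2[worst][metric] - table2["GA-TGCN"][metric]
    relative = absolute / table2[worst][metric]
    return worst, absolute, relative


for metric in ("RMSE", "MAPE", "MAE"):
    worst, absolute, relative = reduction(metric)
    print(f"{metric}: vs {worst}, absolute {absolute:.3f}, relative {relative:.1%}")
```

The absolute differences are 0.188 (RMSE), 0.233 (MAPE), and 0.166 (MAE), which match the 18.8, 23.3, and 16.6 figures reported in the text when these are read as percentage points.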

In the GA-TGCN based stock multi-factor intelligent mining system, for the public opinion factors obtained through genetic programming, the mutual information between the backtested public opinion factors and stock price data is used as the fitness measure. Because mutual information can reveal nonlinear relationships between return intervals and factors, many factors with high mutual information can be obtained continuously during genetic programming. Mutual information describes the dependence between variables: it captures not only linear relationships between random variables but also nonlinear correlations. Many nonlinear factors can also be used in traditional factor stock selection models, so a fitness measure is needed that can find the nonlinear relationship between the return interval and the factors. Mutual information measures how far the joint probability distribution $p(x, y)$ departs from the product of the marginal probability distributions $p(x)p(y)$. The effect of mutual information as a fitness indicator is shown in Fig. 8. In the left panel, the horizontal and vertical coordinates are linearly related, and the mutual information is the largest; in the middle panel, the values follow a V-shaped trend, and the mutual information is the second largest; in the right panel, the values are scattered randomly, and the mutual information is 0.
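The ordering described for the three panels (linear, V-shaped, random) can be reproduced with a simple histogram-based estimate of mutual information. The NumPy sketch below is an illustration only: the estimator, the bin count of 8, and the synthetic data are assumptions, not the fitness function used in the system. A plug-in estimate on finite samples is slightly above zero even for independent variables.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X; Y) in nats from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = p_xy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 2000)

mi_linear = mutual_information(x, x)                              # y = x
mi_v = mutual_information(x, np.abs(x))                           # V-shaped relation
mi_random = mutual_information(x, rng.uniform(-1.0, 1.0, 2000))   # unrelated values

print(mi_linear > mi_v > mi_random)  # True
```

A correlation coefficient would score the V-shaped relation near zero, whereas mutual information still registers the dependence, which is the property the fitness measure relies on.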

Fig. 7. Performance comparison of different models.

Fig. 8. Average accuracy and recall of valuation for different system models.

Fig. 9. Accuracy and loss value variation chart.

For the constructed GA-TGCN model, an optimizer is added to improve the backpropagation process and thereby the model performance. Accuracy is used to evaluate the model and to guide the corresponding analysis. Among the basic parameters of the model, the batch value must be chosen; the remaining parameters can be updated adaptively by minimizing the structural risk. A larger batch lets the data better describe the characteristics of the entire stock market and keeps the data stable. However, if the batch is too large, the overall optimal solution may not be found. The batch is therefore chosen by considering the settings used in existing research together with these trade-offs. In addition, the training batch needs to be tuned to prevent overfitting while keeping the generalization ability of the model strong enough.

Fig. 10. Histogram of stock price change rate under different time windows in bank training set.

By observing the accuracy changes of the training data, it can be seen from Fig. 9 that when the batch is between 0 and 6, the validation accuracy increases from 0.4 to 0.48; between 6 and 50, the validation accuracy remains basically unchanged, stable at around 0.48. The training accuracy stays at around 0.55 and remains relatively stable over the whole range of 0 to 50. Based on this analysis, setting the batch size to 6 is reasonable, since larger values bring only limited performance improvement: the accuracy increases while the batch is below 6, and the curve flattens once it exceeds 6.

In addition, to further verify the performance of the GA-TGCN based stock multi-factor intelligent mining system, this article uses the daily data of a certain bank's stock as the experimental object. The stock has been listed on the A-share market for 17 years, so its trend is relatively stable and representative. The data from January 1, 2010 to December 31, 2018 are used as the training set, and the data from January 1, 2019 to June 10, 2021, roughly two and a half years, are used as the test set. After data organization, a total of 3265 training samples and 427 test samples were obtained. To study the benefits brought by different time windows, 20 candidate dependent variables were formed from the fluctuation amplitude of the closing price after each of the next 1 to 20 trading days. Fig. 10 shows the histograms of sample price changes in the training dataset when four time windows (1 day, 5 days, 10 days, and 15 days) are used as dependent variables. The horizontal axis shows the rate of change in price, and the vertical axis shows the number of samples in the dataset with that rate of change.
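The construction of the 20 candidate dependent variables can be sketched as follows. The sketch assumes that the fluctuation amplitude is the relative change in closing price over the chosen horizon, which the text does not state explicitly; the function name and the synthetic closing prices are hypothetical.

```python
def forward_change_rates(closes, horizon):
    """Relative change of the closing price `horizon` trading days ahead."""
    return [
        (closes[t + horizon] - closes[t]) / closes[t]
        for t in range(len(closes) - horizon)
    ]


# Synthetic closing prices for 30 trading days (placeholder data).
closes = [10.0 + 0.1 * day for day in range(30)]

# One candidate target series per horizon of 1 to 20 trading days.
targets = {k: forward_change_rates(closes, k) for k in range(1, 21)}
print(len(targets), len(targets[5]))  # 20 25
```

Longer horizons leave fewer usable samples, because the last `horizon` days have no future price to compare against.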

Based on the above analysis, the GA-TGCN multi-factor mining and backtesting validation model constructed in this paper is applied to the quantitative system designed in this paper. The system uses computers to quickly combine and optimize multiple factors, replacing traditional, complex manual selection. It can also predict, optimize, and backtest the selected multi-factor strategy combinations. The system can study historical data through large-scale timeline backtesting, greatly reducing time costs and improving the work efficiency of personnel in the fintech industry. Its automation and efficiency also make it possible for investors to develop more accurate and feasible investment strategies.

5. Conclusion

The rapid development of the economy has led to ever-changing stock market conditions. To enable investors to respond better to the current market economy and make effective decisions, it is necessary to continuously explore and optimize multi-factor combinations and ensure their effectiveness in real trading. The GA-TGCN multi-factor mining system constructed in this article can rapidly combine and optimize multiple factors, and can predict, optimize, and backtest selected multi-factor strategy combinations, improving the work efficiency of personnel in the financial technology industry. Based on the automation and efficiency of the GA-TGCN stock multi-factor intelligent mining system, investors can develop more accurate and reliable investment strategies. Through comparative analysis of different models, the main conclusions obtained are as follows:

Built on the TGCN algorithm, the GA-TGCN stock multi-factor intelligent mining system achieves automation and efficiency, effectively improving the decision-making effectiveness of multi-factor combinations in the current investment market and further improving the efficiency of investment strategy formulation. By improving the GA-TGCN model, the deviation between predicted and true values is effectively reduced, with an RMSE of 11.4%, a MAPE of 9.0%, and an MAE of 9.2%, all significantly better than those of the other models.
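For reference, the three error measures named above can be computed as in the sketch below. It uses the standard textbook definitions of RMSE, MAE, and MAPE; the paper's own formulas are given in its Section 4.1 and may differ in normalization, so the function names and the toy values here are illustrative assumptions only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (multiply by 100 for %).
    Assumes no true value is zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, for illustration only.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
errors = {"RMSE": rmse(y_true, y_pred), "MAE": mae(y_true, y_pred), "MAPE": mape(y_true, y_pred)}
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE expresses the error relative to the size of the true value.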

In the GA-TGCN system, the mutual information between the tested public opinion factors and stock price data is used as the fitness measure, and the nonlinear relationship between the return interval and the factors is analyzed through mutual information. Many nonlinear factors can also be used in traditional factor stock selection models. Based on the stock data of a certain bank, histograms of sample price changes in the training dataset were produced with four time windows as the dependent variable, which provides a further basis for risk prediction and more support for formulating risk prevention strategies and improving returns.
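The idea of scoring a factor by its mutual information with the return interval can be sketched as follows. This is a minimal sketch under stated assumptions: both series are first discretized into intervals, the bin edges are hypothetical, and the plug-in (count-based) estimator is used; the paper does not specify its estimator or binning in this section.

```python
import bisect
import math
from collections import Counter


def discretize(values, edges):
    """Map each value to the index of the interval it falls in.
    `edges` must be sorted; the bin edges are an assumption of this sketch."""
    return [bisect.bisect_right(edges, v) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equal-length label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def factor_fitness(factor_values, returns, factor_edges, return_edges):
    """Score for one candidate factor: higher means the factor carries
    more information about which return interval occurs."""
    return mutual_information(
        discretize(factor_values, factor_edges),
        discretize(returns, return_edges),
    )


# Toy data, for illustration only.
score = factor_fitness([0.1, 0.2, 0.8, 0.9], [-0.02, -0.01, 0.01, 0.03], [0.5], [0.0])
```

Unlike a correlation coefficient, mutual information is zero only when the two discretized series are independent, so it also responds to nonlinear dependence.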

Author

Bo Zhang

Bo Zhang obtained his bachelor's degree in economics from Shanxi University of Finance and Economics in 2013. He obtained his master's degree in insurance from Zhejiang Gongshang University in 2015. Presently, he is working as a lecturer at the School of Accounting and Finance, Taizhou Vocational College of Science & Technology. His research interests include quantitative trading, asset allocation and portfolio management, and behavioral finance.

Meichen Tao

Meichen Tao obtained her bachelor's degree in accounting from Lanzhou University of Finance and Economics in 2014. She obtained her master's degree in international business from Capital University of Economics and Business in 2017. Presently, she is working as a lecturer at the School of Accounting and Finance, Taizhou Vocational College of Science & Technology. Her research interests mainly focus on financial management.
