3.1 Construction of Multifactor Intelligent Mining System for Stocks
This model focuses on how to represent, as a graph, the complex relationship between
the multiple factors within a strategy and the strategy's return over time, and on how
to integrate that graph into the factor screening model [29]. Because factor values and
strategy returns are exactly the inputs needed to analyze the relationship between
predictive factors and strategy returns, this paper constructs an adjacency matrix from
the values of the factors and their corresponding strategy returns, and feeds it into
the model to predict the performance of each factor in the strategy's next time period.
For a given factor $F$, its characteristics over the past $T$ time periods can be
described using formula (7).
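The text does not spell out how the adjacency matrix is built. One possible reading, offered only as a sketch, links two factors when their strategy-return series are strongly correlated. The function name `build_adjacency`, the `(T, N)` data layout, the use of Pearson correlation, and the 0.5 threshold are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def build_adjacency(returns, threshold=0.5):
    """Hypothetical adjacency construction from factor strategy returns.

    returns: array of shape (T, N); column j holds the return series of the
             strategy built on factor j (assumed layout).
    threshold: assumed cut-off on absolute Pearson correlation.
    """
    corr = np.corrcoef(returns, rowvar=False)          # (N, N) correlation matrix
    adj = (np.abs(corr) >= threshold).astype(float)    # connect strongly related factors
    np.fill_diagonal(adj, 0.0)                         # no self-loops in A itself
    return adj

# Toy data: factors 0 and 1 move together, factor 2 oscillates independently.
t = np.arange(8.0)
returns = np.column_stack([t, 2.0 * t + 1.0, np.where(t % 2 == 0, 1.0, -1.0)])
A = build_adjacency(returns)
```

In this toy case only factors 0 and 1 end up connected. A weighted graph (keeping the correlation values instead of thresholding them) would be an equally plausible choice.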
The factor screening model takes two inputs: the relationship graph $A$ of the factors
and the historical feature sequence of factor strategy returns. The problem is therefore
transformed into predicting $Y_{F,t}$, as expressed in formula (8).
where the function $G$ represents the GRU model, the function $f$ represents the GCN model,
$X$ represents the factor strategy return feature matrix input into the GCN model,
and $A$ represents the adjacency matrix of factor relationships.
The raw data of the model consists of the factors within the multi-factor strategy
and the annualized compound returns of their corresponding combination strategies.
The data is first passed into the input layer of the module, combined with the similarity
matrix constructed by the model, and then subjected to feature decomposition [30].
The graph convolution layer then performs the graph convolution operation, aggregating
features with high similarity, and ultimately generates a matrix that relates the
fundamental factors within the strategy to the strategy returns. A screening model built
in this way considers not only the relationships between different factors but also the
changes in factor values and strategy returns over time. Ultimately, it can predict the
appropriate value of each factor within a multi-factor strategy.
The GCN module captures the spatial relationships among the returns of the various
factor strategies. It uses the Chebyshev approximation formula $G=\sigma(A' X \Delta)$
to obtain the correlation feature vectors of the different dimensions of the factors,
where $G$ represents the output of the GCN, $\sigma$ represents the weight of model learning,
and $A'$ is obtained from the adjacency matrix $A$. The internal structure of
the graph convolution factor screening model constructed by the system is shown in
Fig. 4.
Fig. 4. Internal structure of graph convolution factor screening model.
In this structure, $GC$ represents the graph convolution process and $h_{t-1}$ represents
the state at time $t-1$. The calculation of $GC$ is shown in formula (9):
Here, $\tanh$ is the activation function, $I$ is the feature matrix,
$I+D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$ is the normalized adjacency matrix, $W$ is the
weight matrix, and $D$ is the degree matrix of $L$.
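As an illustration of the quantities just defined, the sketch below builds $I+D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$ and applies a single graph convolution step. It assumes the step has the common form $\tanh(\hat{A} X W)$, where $\hat{A}$ is the normalized matrix and $X$ is a node feature matrix. The names `normalized_adjacency`, `graph_conv`, and `X`, and the treatment of the $I$ inside the sum as an identity matrix of matching size, are assumptions of this sketch and may differ from the paper's exact formula.

```python
import numpy as np

def normalized_adjacency(L):
    """Return I + D^{-1/2} L D^{-1/2}, where D is the degree matrix of L."""
    L = np.asarray(L, dtype=float)
    deg = L.sum(axis=1)                       # node degrees (row sums of L)
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    inv_sqrt[mask] = deg[mask] ** -0.5        # isolated nodes keep a zero entry
    D_inv_sqrt = np.diag(inv_sqrt)
    # Assumption: I here is the identity matrix, so the sum is well defined.
    return np.eye(L.shape[0]) + D_inv_sqrt @ L @ D_inv_sqrt

def graph_conv(L, X, W):
    """One assumed graph convolution step: tanh(A_hat @ X @ W)."""
    return np.tanh(normalized_adjacency(L) @ X @ W)

# Toy chain graph with 3 factors, 2 input features, 2 output features.
L = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
X = np.array([[0.1, 0.2],
              [0.0, -0.1],
              [0.3, 0.1]])
W = np.array([[0.5, -0.5],
              [0.25, 0.75]])
A_hat = normalized_adjacency(L)
H = graph_conv(L, X, W)
```

Each row of `H` mixes a factor's own features with those of its neighbours, weighted by the normalized adjacency entries.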
By incorporating the integral of feature $F$ into the graph convolution process,
the numerical integral of feature $F$ can be obtained. The calculation
is shown in formula (10):
Here, $\int_{t_{n-1}}^{t_n} F(t)\,dt$ represents the numerical integral of
feature $F$, $t_{n-1}$ and $t_n$ are two adjacent time points, and the right-hand side
is the approximate value of the integral given by the trapezoidal rule.
In other words, the values of $F(t)$ at the two endpoints of the interval
$[t_{n-1}, t_n]$ form the parallel sides of a trapezoid, the area of that trapezoid
is taken as the approximate value of the integral, and this value replaces the
feature value of the current time slice.
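For reference, the standard trapezoidal rule over one interval is $\int_{t_{n-1}}^{t_n} F(t)\,dt \approx \frac{t_n - t_{n-1}}{2}\left[F(t_{n-1}) + F(t_n)\right]$. A minimal sketch of applying it slice by slice to a sampled feature series follows. The helper names `trapezoid_step` and `integrate_series` and the toy data are illustrative assumptions only.

```python
import numpy as np

def trapezoid_step(f_prev, f_curr, t_prev, t_curr):
    """Trapezoidal approximation of the integral of F over [t_prev, t_curr]."""
    return 0.5 * (t_curr - t_prev) * (f_prev + f_curr)

def integrate_series(values, times):
    """Apply the trapezoidal rule to every pair of adjacent time points."""
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    return 0.5 * np.diff(times) * (values[:-1] + values[1:])

# Toy feature F(t) = 2t + 1 sampled at unevenly spaced times.
times = [0.0, 1.0, 2.0, 4.0]
values = [1.0, 3.0, 5.0, 9.0]
slices = integrate_series(values, times)   # one integral value per time slice
```

Because the toy feature is linear, the trapezoidal values coincide with the exact integrals over each slice. For a curved $F(t)$ they would only be approximations.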
Based on the above, the output vector $G$ of the GCN, given by $G=\sigma(A' X \Delta)$,
is used as the input to the GRU module to further capture and organize the spatiotemporal
characteristics of the stock multi-factor strategy. The specific calculation steps
for combining the GCN and the GRU are shown in formula (11):
In the equation, $f(A,~X_t)$ represents the graph convolution process, and $h_{t-1}$
represents the state at time $t-1$.
$R_t$ represents the reset gate, which is responsible for combining the state $h_{t-1}$
at time $t-1$ with the graph convolution output at time $t$. The specific calculation
is shown in formula (12):
Based on formulas (11) and (12), the hidden state $c_t$ can be obtained. The update gate
$Z_t$ then determines how much of the state $h_{t-1}$ from the previous time step is
discarded and how much of the updated information is retained. Finally, all outputs
$h_t$ are concatenated to obtain feature vectors that express the relationship between
factors and strategy returns. The specific process is shown in formulas (13) and (14).
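The gate computations described above can be sketched as a single recurrent step. This sketch assumes the widely used GRU-over-graph-convolution layout: reset gate $R_t$, update gate $Z_t$, candidate state $c_t$, and $h_t = Z_t \odot h_{t-1} + (1 - Z_t) \odot c_t$. The parameter names (`W_r`, `W_z`, `W_c` and the biases), the gate convention, and the random stand-ins for the graph convolution outputs $f(A, X_t)$ are all assumptions. The paper's formulas (11) to (14) may differ in detail.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gcn_gru_step(gc_t, h_prev, params):
    """One assumed GCN-GRU step.

    gc_t:   (N, d) graph convolution output f(A, X_t) at time t
    h_prev: (N, k) hidden state h_{t-1}
    """
    joint = np.concatenate([gc_t, h_prev], axis=1)            # (N, d + k)
    R_t = sigmoid(joint @ params["W_r"] + params["b_r"])      # reset gate
    Z_t = sigmoid(joint @ params["W_z"] + params["b_z"])      # update gate
    cand_in = np.concatenate([gc_t, R_t * h_prev], axis=1)
    c_t = np.tanh(cand_in @ params["W_c"] + params["b_c"])    # candidate state
    return Z_t * h_prev + (1.0 - Z_t) * c_t                   # new state h_t

def run_sequence(gc_seq, hidden_size, params):
    """Run the cell over T time steps and concatenate all outputs h_t."""
    h = np.zeros((gc_seq[0].shape[0], hidden_size))
    outputs = []
    for gc_t in gc_seq:
        h = gcn_gru_step(gc_t, h, params)
        outputs.append(h)
    return np.concatenate(outputs, axis=1)                    # (N, T * k)

# Hypothetical sizes: 4 factors, 3 GCN features, 2 hidden units, 5 time steps.
rng = np.random.default_rng(0)
N, d, k, T = 4, 3, 2, 5
params = {name: rng.normal(scale=0.1, size=(d + k, k)) for name in ("W_r", "W_z", "W_c")}
params.update({name: np.zeros(k) for name in ("b_r", "b_z", "b_c")})
gc_seq = [rng.normal(size=(N, d)) for _ in range(T)]          # stand-in for f(A, X_t)
features = run_sequence(gc_seq, k, params)
```

The concatenated `features` array holds one row per factor. In a full model it would feed a final prediction layer, and the weights would be learned by training instead of being drawn at random.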