YANG Hang (1), WU Ren (2), NAKATA Mitsuru (3), GE Qi-Wei (4)

(1) The Graduate School of East Asian Studies, Yamaguchi University, 1677-1 Yoshida, Yamaguchi-shi, 753-8514 Japan (a505snu@yamaguchi-u.ac.jp)
(2) Faculty of Information Science, Shunan University, 843-4-2 Gakuendai, Shunan-shi, 745-8566 Japan (renwu@shunan-u.ac.jp)
(3) Faculty of Education, Yamaguchi University, 1677-1 Yoshida, Yamaguchi-shi, 753-8513 Japan (mnakata@yamaguchi-u.ac.jp)
(4) Yamaguchi University, 1677-1 Yoshida, Yamaguchi-shi, 753-8511 Japan (gqw@yamaguchi-u.ac.jp)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Artificial intelligence, Machine learning, Acupuncture and moxibustion, Traditional Chinese medicine
1. Introduction
Acupuncture and moxibustion treatment (AMT for short hereafter) is characterized by wide adaptability, remarkable curative effects, convenient application, low cost, and safety, and has been widely promoted around the world. Clinical practice has proved that it has certain effects on more than 300 kinds of diseases in the fields of internal medicine, surgery, gynecology and pediatrics, and good effects on about 100 kinds of diseases, such as chronic fatigue syndrome and withdrawal symptoms [1]. Acupuncturists gather information about the patient's condition through the four diagnostic methods of traditional Chinese medicine (TCM for short hereafter), namely ``inspection, audio-olfactory examination, interrogation, and palpation'', and provide appropriate AMT prescriptions according to meridian theory and their own clinical experience. Over thousands of years, a vast amount of AMT clinical experience has been recorded in the form of text. If these data could be systematically utilized, they would provide substantial assistance for acupuncturists and, in the foreseeable future, even support patients' self-help treatment. However, the descriptions of symptoms in existing AMT books and clinical data are neither unified nor standardized, and are moreover difficult to quantify, which poses challenges to the global promotion of AMT.
Artificial intelligence (AI for short hereafter) technology has flourished in recent years and has been widely applied in many fields. As one of the core technologies of AI, machine learning has been applied to all walks of life, including the medical field [2]. Machine learning recognizes input data sets through computers, encodes the data into models or algorithms, trains appropriate mathematical models, and tests and verifies the trained models on new data. The ability of machine learning to analyze data related to medical treatments and outcomes is expected to transform medicine into a data-driven, outcome-oriented discipline, which will have a profound impact on the detection, diagnosis and treatment of diseases [3].
In acupuncture clinics, acupoints are selected on the premise that the four diagnostic kinds of information about the patient have been obtained through the doctor's examination and palpation. Existing medical technology can already assist doctors in obtaining this information, such as pulse diagnostic devices that acquire patients' pulse signals and tongue scanners that record the condition of patients' tongues. Nevertheless, the process from gathering patient information through the four diagnostic methods to formulating the final acupoints prescription still relies heavily on the acupuncturist's reasoning and experience. This is exactly where machine learning can offer significant advantages: it can be used to generate acupoints prescriptions by generalizing from successfully treated cases. However, practical application faces challenges in data, medical validation, and ethics, necessitating careful incremental development to ensure patient safety and treatment effectiveness.
At present, the application of machine learning in the field of AMT is still in its infancy. Researchers have applied many traditional machine learning and deep learning algorithms to AMT and achieved some research results [4]. Yang et al. [5] used an artificial neural network to predict the clinical efficacy of AMT in the treatment of depression based on the demographic characteristics of patients and data from disease-related self-assessment scales. Based on demographic data, the four diagnostic kinds of information of TCM, symptom evaluation scales and other parameters, Pei et al. [6] built an artificial neural network model to predict the clinical efficacy of acupuncture in treating heroin dependence. Hao et al. [7] used a fuzzy neural network to predict changes in enkephalin, a pain-related biochemical indicator, in patients receiving electroacupuncture treatment, based on physiological electrical signals such as ECG and EEG. Gan et al. [8] proposed an AMT support system that maps the data of the four diagnostic methods of TCM to AMT prescriptions.
Although the application of machine learning in the field of AMT is promising, its further development is hindered by the large sample sizes that machine learning demands, the TCM theoretical background required for processing AMT data, and the complexity of that processing. As far as we know, there is no research on providing acupoints prescriptions that can cope with multiple diseases from four-diagnostic-method data by machine learning. In this paper, we propose a method that learns the text information of existing AMT books and clinical data through AI algorithms so as to provide acupoints prescriptions for treating patients. We extract the names of symptoms from AMT texts and unify and standardize them, so as to build a database of symptoms and corresponding acupoints prescriptions. We test various algorithms that learn the data in the database to train a model that can provide acupoints prescriptions based on patients' symptoms, and identify that one algorithm, Seq2seq with attention [9], performs significantly better than the others.
The paper is organized as follows. In Section 2, we introduce the basic principles of AMT and give an overview of machine learning. In Section 3, we present our method of deciding acupoints prescriptions through machine learning, specifically describing the processes of database construction, algorithm selection, data preprocessing, and determining the evaluation criteria of the model. In Section 4, we select the best-performing algorithm, namely Seq2seq with attention, by 5-fold cross validation, and further analyze it by ablation experiments. Finally, we conclude the paper in Section 5.
2. AMT and Machine Learning
2.1 Basic Principles of AMT
The basic contents of AMT mainly include AMT theory, AMT technology and the clinical application of AMT. AMT theory mainly covers meridian theory and the rules of acupoints. AMT technology mainly includes acupuncture, moxibustion and other needling methods. The clinical application of AMT is a comprehensive application of AMT theory and technology. Meridian theory is an important part of TCM; it covers the distribution, physiological functions and pathological changes of the human meridian system and its relationship with the internal organs, and it runs through the diagnosis and treatment of AMT [1]. Meridians are the channels through which the human body transports Qi and blood, and they run throughout the body. Here Qi, a special concept in TCM, is the most fundamental substance in the construction of the human body and in the maintenance of its life activities. Acupoints are special parts of the body surface that are infused with Qi from the internal organs and meridians, and they are also the places where acupuncture, moxibustion and other stimuli are applied. Stimulating appropriate acupoints has the effect of dredging the meridians, harmonizing Qi and blood, restoring the balance of yin and yang, and coordinating the viscera, so as to achieve the purpose of disease prevention and treatment. In total, 409 acupoints are identified by the WHO (World Health Organization) [10]. The names and WHO notations of each meridian, as well as the numbers of acupoints, are shown in Table 1. In addition, the parts that have neither a specific name nor a fixed position but are stimulated because of tenderness or other reactions are collectively referred to as ``Ashi points''. Ashi points may be near or far from the lesion; they usually appear with the occurrence of a disease and disappear with its recovery [1].
The decision-making process of traditional AMT is shown in Fig. 1. Acupuncturists obtain the symptoms of patients through the four diagnostic methods, then analyze and summarize these symptoms according to meridian theory, so as to clarify the etiology, location, pathogenesis and urgency of the disease. On this basis, appropriate AMT prescriptions are determined by comprehensively considering meridian theory and the rules of acupoints. An AMT prescription comprises an acupoints prescription and manipulation. The acupoints prescription is the first component of the AMT prescription. Each acupoint in the body has relative specificity and may have the same or different therapeutic functions; selecting acupoints with the same or similar functions can enhance treatment effectiveness by strengthening the synergistic effect between the acupoints. Manipulation is the second component of the AMT prescription, which includes the treatment methods, the specific operation and the timing of treatment [1]. In this paper, we focus our discussion on deciding the acupoints prescription.
Fig. 1. The decision-making process of traditional AMT.
Table 1. Meridians and acupoints in human body.
Meridian name | WHO notation of meridian | WHO notation of acupoints | Number of acupoints
Lung Meridian | LU | LU1∼LU11 | 11
Large Intestine Meridian | LI | LI1∼LI20 | 20
Stomach Meridian | ST | ST1∼ST45 | 45
Spleen Meridian | SP | SP1∼SP21 | 21
Heart Meridian | HT | HT1∼HT9 | 9
Small Intestine Meridian | SI | SI1∼SI19 | 19
Bladder Meridian | BL | BL1∼BL67 | 67
Kidney Meridian | KI | KI1∼KI27 | 27
Pericardium Meridian | PC | PC1∼PC9 | 9
Triple Energizer Meridian | TE | TE1∼TE23 | 23
Gallbladder Meridian | GB | GB1∼GB44 | 44
Liver Meridian | LR | LR1∼LR14 | 14
Conception Vessel | CV | CV1∼CV24 | 24
Governor Vessel | GV | GV1∼GV28 | 28
Extra Points | EX | EX-B1∼EX-B9, EX-UE1∼EX-UE11, EX-LE1∼EX-LE12 | 48
Total | | | 409
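The per-meridian counts in Table 1 can be cross-checked programmatically. The following minimal sketch simply transcribes the table into a dictionary keyed by WHO meridian notation and verifies that the counts sum to the 409 WHO-identified acupoints:

```python
# Number of acupoints per meridian, transcribed from Table 1 (WHO notation).
ACUPOINTS_PER_MERIDIAN = {
    "LU": 11, "LI": 20, "ST": 45, "SP": 21, "HT": 9, "SI": 19,
    "BL": 67, "KI": 27, "PC": 9, "TE": 23, "GB": 44, "LR": 14,
    "CV": 24, "GV": 28, "EX": 48,
}

# The per-meridian counts should sum to the 409 WHO-identified acupoints.
total = sum(ACUPOINTS_PER_MERIDIAN.values())  # 409
```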
2.2 Machine learning
The process of selecting acupoints can be regarded as establishing a correspondence between a combination of symptoms and a combination of acupoints in an acupuncture treatment scheme, rather than between a single symptom and a single acupoint. Since the numbers of possible symptom combinations and acupoint combinations in reality are both very large, it is impossible to cover all possible situations merely by building a database from existing books and clinical data. Machine learning, however, has the ability to generalize and can learn the rules hidden behind the data: a model trained on training data can also provide appropriate output for unseen data that follows the same rules.
Machine learning is the study of how computers simulate human learning behavior to obtain new knowledge or experience, and improve their own performance by restructuring existing knowledge. It is widely used to solve classification, regression, clustering and other problems because it can learn the rules and patterns in massive data and extract latent information. Maron et al. [11] proposed the Naive Bayes (NB) algorithm, which classifies according to probabilities derived from Bayesian theory. Cover et al. [12] proposed the KNN classification algorithm based on distance measurement. Breiman et al. [13] proposed CART, an early Decision Tree (DT) classification algorithm, which uses a tree structure to divide the data into discrete classes. The support vector machine is a two-class classification model: a linear classifier with the largest margin in feature space [14], whose learning strategy is to maximize that margin. Artificial Neural Network (ANN) classification adjusts the parameters of a neural network according to the given training samples so that the network output approaches the known sample class labels [15].
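To make the classical algorithms above concrete, the following sketch fits three of them (NB, DT, and KNN) on a tiny hand-made two-class data set; the data and feature values are invented purely for illustration and have nothing to do with AMT:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy two-class data (illustration only): two well-separated 2-D clusters.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])

preds = {}
for name, clf in [("NB", GaussianNB()),
                  ("DT", DecisionTreeClassifier(random_state=0)),
                  ("KNN", KNeighborsClassifier(n_neighbors=1))]:
    clf.fit(X, y)  # train on the toy samples
    # Classify one point near each cluster.
    preds[name] = clf.predict(np.array([[0.5, 0.5], [5.5, 5.5]]))
```

All three classifiers assign the first query point to class 0 and the second to class 1, since the clusters are cleanly separable.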
Deep learning is a special kind of machine learning and a new research direction in the field. It can be applied in various domains, and the form of the deep neural network differs according to the application. Common deep learning models mainly include the Fully Connected (FC) network, the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). The fully connected layer is the basic deep neural network layer: each of its nodes is connected with all nodes of the previous layer. Because all outputs and inputs of a fully connected layer are connected, it has the most parameters, which requires a considerable amount of storage and computing space. CNN is a neural network for processing data with a grid structure and is often used in computer vision. Unlike in FC networks, neurons in adjacent CNN layers are not connected directly but through a ``convolution kernel'' acting as intermediary, and the parameters of the hidden layers are greatly reduced by sharing the kernel. RNN, also one of the commonly used deep learning models, is a kind of neural network for processing sequence data. It is often used in natural language processing and can also be used in computer vision.
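The parameter savings from kernel sharing can be seen with a bit of arithmetic. The sketch below (toy sizes, weights only, biases ignored) compares a fully connected layer with a 1-D convolution over the same input:

```python
# Weight counts for a toy 1-D input and output of length 100 each.
n_in, n_out = 100, 100

# A fully connected layer links every input node to every output node.
fc_params = n_in * n_out  # 100 * 100 = 10,000 weights

# A 1-D convolution sliding a shared kernel of size 3 over the same input
# reuses the same 3 weights at every position.
conv_params = 3  # the shared kernel
```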
3. Methodology
In this work, we train a model on relevant data through machine learning, and the trained model provides acupoints prescriptions for treating patients based on their symptoms. For this purpose, a database, algorithms, data preprocessing and evaluation metrics are essential components.
1) Database: As far as we know, there is currently no database available for providing acupoints prescriptions based on patients' symptoms, so we need to establish a database of symptoms and acupoints prescriptions.
2) Algorithms: We regard the problem of acupoint selection as a multi-label classification problem and use 11 algorithms for multi-label classification, including 8 classifiers provided by Scikit-learn as well as the Feedforward neural network (FNN) [16], TextCNN [17], and Seq2seq with attention [9].
3) Data preprocessing: Different algorithms have different requirements for data representation, so we preprocess the data differently for different algorithms.
4) Evaluation metrics: Due to the characteristics of AMT prescription data, the traditional evaluation criteria for multi-label classification problems, namely accuracy, precision, and recall, are not suitable for evaluating the model in this study. We take Intersection over Union (IoU) [18] as the main evaluation metric instead.
3.1 Database construction
The text information of AMT collected from books is preprocessed to extract the symptoms used to judge diseases, and these symptoms are then standardized and unified. On this basis, we build a database of symptoms and corresponding acupoints prescriptions. Fig. 2 shows the construction process of disease cases in the database, taking ``chronic fatigue syndrome with stagnation of liver-Qi'' as an example. The text in the green box is the description text of the disease, and the words underlined in green are the corresponding symptoms. The words in the red box are used in TCM to reflect one cause of the sample case, and the circled numbers indicate the corresponding symptoms. The text in the blue box is the acupoints prescription of the disease case, and the words underlined in blue are the corresponding acupoints, which are represented by acupoint numbers. One of the symptom names in the figure, ``rib-side and abdominal distention and pain'', is expressed as a single word in TCM, but it actually consists of four different symptoms. Therefore, this symptom name is divided into four symptom names, numbered (148, 149, 158, 159) in the database. In addition, a word meaning ``or'' often appears in the text describing a case, indicating that the same disease may have multiple symptoms that may or may not occur at the same time. For such cases, we record them in the database as different cases of the same disease name. Although the symptom combinations of these cases are not completely the same, the corresponding acupoints prescriptions are. In this way, we build our database of symptoms and corresponding acupoints prescriptions.
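The handling of ``or'' descriptions can be sketched as a small expansion step: each alternative symptom yields a separate database case, and all resulting cases share one acupoints prescription. All symptom and acupoint numbers below are hypothetical, chosen only for illustration:

```python
def expand_or_case(base_symptoms, or_symptoms, acupoints):
    """Split a textual case containing an 'or' between symptoms into one
    database case per alternative; all cases share the same prescription."""
    return [(sorted(base_symptoms + [alt]), acupoints) for alt in or_symptoms]

# Hypothetical numbers: base symptoms 12 and 37, with symptom 148 OR 158.
cases = expand_or_case(base_symptoms=[12, 37],
                       or_symptoms=[148, 158],
                       acupoints=[101, 205, 300])
# Two cases with different symptom sets but one shared prescription:
# ([12, 37, 148], [101, 205, 300]) and ([12, 37, 158], [101, 205, 300])
```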
Our database includes the data shown in Tables 2 and 3. Table 2 lists symptoms and acupoints with their assigned numbers. Table 3 lists the disease case number, the name of the disease case, the corresponding symptoms represented by symptom numbers, and the corresponding acupoints represented by acupoint numbers. In Table 3, the symptoms and acupoints are represented by their numbers according to Table 2, and the symptom and acupoint numbers appear in the same order as the symptoms and acupoints appear in the texts of the AMT prescriptions. One of the key factors in applying the text data of books and clinical data to actual AMT through machine learning is the symptom. Symptoms are the bridge between the patient's condition and the text data: the patient's condition can be accurately described by giving a combination of symptoms, and these symptoms can in turn be diagnosed through various medical technologies. Different from traditional AMT, our method requires the presence or absence of various symptoms to accurately describe the patient's condition, so our naming of symptoms is not exactly the same as that of TCM. Luo [19] gave the classification basis of 399 TCM symptom names by analyzing the attributes of symptoms. We planned to use these 399 symptom names as the symptom names in our database. However, 399 symptoms are not enough to accurately describe some diseases, so we have expanded the symptom names to 734 so far. As for acupoint names, the 409 acupoints certified by the WHO plus the Ashi point, 410 acupoint names in total, are included in our database. Currently we have collected 3000 disease cases of acupoints prescriptions from books [1,20,21]. We chose these three sources as our data materials because they are currently the only ones we have managed to compile, and we prioritized them because of the credibility of textbooks. Additional high-quality data sources will be essential before proceeding to further clinical trials.
Fig. 2. The construction process of the database, using ``chronic fatigue syndrome'' as an example. The text of this figure is a translation of the Chinese text on page 845 of reference [1].
3.2 Algorithm selection
We regard the problem of acupoint selection as the problem of determining which acupoints should be used for a given series of symptoms. In machine learning, this kind of problem can be classified as a multi-label classification problem and belongs to supervised learning. The combination of symptom numbers is used as the features of the model input, while the combination of acupoint numbers is used as the labels of the model output. Meanwhile, the data type of this study is clearly numeric. Based on the above observations, this study considers 11 different algorithms as potential candidates, which we divide into the following 4 categories.
$\bullet$ Traditional machine learning algorithms: Traditional machine learning algorithms can be used to solve multi-label classification problems by employing problem transformation techniques, including Binary Relevance, Classifier Chains, and Label Powerset [22]. We use 8 classifiers for multi-label classification provided by Scikit-learn, a Python library that provides many unsupervised and supervised learning algorithms: Naive Bayes, DecisionTree classifier, ExtraTree classifier, ExtraTrees classifier, KNeighbors classifier, RandomForest classifier, RidgeClassifierCV, and Neural network. Each classifier is experimented with under the three problem transformation techniques. Although these algorithms are commonly used to solve multi-label classification problems, the experimental results are not satisfactory; the specific results are shown in Table 6 in the next section. Given these suboptimal results, we shift our focus to deep learning.
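Binary Relevance, the simplest of the three transformation techniques, trains one independent binary classifier per label. A minimal sketch using Scikit-learn's `OneVsRestClassifier` (which implements this strategy for multi-label targets) on tiny invented multi-hot data, not our real database:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy multi-hot data (illustration only): rows of X are symptom vectors,
# columns of Y are acupoint labels.
X = np.array([[1, 0, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 1]])
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])

# Binary Relevance: one independent binary classifier per acupoint label.
model = OneVsRestClassifier(DecisionTreeClassifier(random_state=0)).fit(X, Y)
pred = model.predict(np.array([[1, 0, 1, 0]]))  # predicts a full label vector
```

Classifier Chains and Label Powerset differ only in how labels are coupled: chains feed earlier label predictions into later classifiers, while Label Powerset treats each distinct label combination as one class.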
$\bullet$ FNN: FNN is a unidirectional multilayer network in which information is transmitted layer by layer from the input layer to the output layer. FNN is composed of three parts, namely the input layer, the hidden layer and the output layer; the hidden layer can be one layer or multiple layers. FNN can handle more complex nonlinear relationships in data and has stronger generalization than traditional machine learning algorithms. We hope that these advantages will achieve better results than the traditional algorithms.
$\bullet$ TextCNN: CNN is usually used in computer vision (CV). Yoon [17] made some modifications to the input layer of CNN and proposed the text classification model TextCNN. The core of TextCNN is to convert words into word vectors through word2vec, then turn sentences into matrices composed of word vectors, which are processed through CNN. We hope to enhance the learning of the feature parts of the model through the use of CNN, thereby improving the effectiveness of the model.
$\bullet$ Seq2seq with attention: Seq2seq is an algorithm with an encoding-decoding structure, originally proposed by Google, mainly for solving sequence-to-sequence tasks such as machine translation and text summarization. However, this approach has also been applied to multi-label classification problems, with previous studies [23,24,25] demonstrating the effectiveness of encoder-decoder models and achieving promising results. Seq2seq has two modules, namely the encoder and the decoder: the encoder encodes the input data and the decoder decodes the encoded data. A simple RNN, GRU, LSTM [26], etc. can be used inside the encoder and decoder. The addition of the attention mechanism (hereinafter referred to as attention) can greatly improve the performance of Seq2seq. The fundamental principle of attention lies in simulating human information processing by assigning different weights to different parts of the data, thereby selectively focusing on key information. This mechanism calculates the relevance weights of the input parts, enabling the model to concentrate on crucial information and ignore irrelevant details, thereby improving the accuracy and efficiency of handling complex tasks [27].
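The weight computation at the heart of attention can be sketched in a few lines of NumPy. This is a generic dot-product attention step with toy sizes (4 encoder states of dimension 8), not the exact formulation of [27]:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
enc_states = rng.normal(size=(4, 8))  # one vector per input position (toy)
dec_state = rng.normal(size=8)        # current decoder state (toy)

scores = enc_states @ dec_state       # relevance of each input position
weights = softmax(scores)             # attention weights, sum to 1
context = weights @ enc_states        # weighted sum fed to the decoder
```

Positions with higher scores contribute more to the context vector, which is how the decoder "focuses" on the relevant input symptoms at each output step.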
3.3 Data preprocessing
Different algorithms have different requirements for data representation, so we preprocess the data differently for different algorithms. The 8 traditional machine learning classifiers and FNN share the same preprocessing; for TextCNN and Seq2seq with attention, we use separate preprocessing methods.
For the 8 traditional classifiers and FNN, numerical data can be directly used as the features and labels of machine learning. Therefore, we encode the symptoms of each case as a vector of length 734, the total number of symptoms in our database; each element of the vector corresponds to a symptom name in the database. We encode the acupoints prescription of each case as a vector of length 410, each element of which corresponds to an acupoint name in the database. As shown in Fig. 3, the left part of the table represents symptoms: ``0'' means the disease case does not have the symptom, and ``1'' means it does. The right part of the table shows the acupoints in the acupoints prescription corresponding to the symptoms on the left: ``0'' means that the acupoint is not stimulated, and ``1'' means that it is.
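This multi-hot encoding can be sketched as a small helper that turns a case's number lists into fixed-length 0/1 vectors. The symptom and acupoint numbers below are hypothetical, for illustration only:

```python
def to_multi_hot(indices, length):
    """Turn a list of 1-based symptom/acupoint numbers into a 0/1 vector."""
    vec = [0] * length
    for i in indices:
        vec[i - 1] = 1
    return vec

# Hypothetical case: symptom numbers -> 734-dim feature vector,
# acupoint numbers -> 410-dim label vector.
features = to_multi_hot([12, 37, 148], 734)
labels = to_multi_hot([101, 205], 410)
```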
For TextCNN, the core of the algorithm is to convert words into word vectors through word2vec, then turn sentences into matrices composed of word vectors, which are processed through CNN. Therefore, if the symptoms are coded as grid data, the information in the data can be extracted through CNN. Since the number of symptoms in the database is currently 734, which is not very large, and there is no obvious relationship between the symptoms, we do not need the word2vec step used in TextCNN. As shown in Fig. 4, our approach is to pad the symptom vector of length 734 with ``0'' to a length of 756, and then reshape the vector into a matrix of size $(27, 28)$. In this way, either a one-dimensional or a two-dimensional convolution kernel can be used.
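The pad-and-reshape step described above is a two-liner in NumPy (756 = 27 × 28, so 22 zeros are appended to the 734-dimensional vector):

```python
import numpy as np

symptom_vec = np.zeros(734)                    # the multi-hot symptom vector
padded = np.pad(symptom_vec, (0, 756 - 734))   # append 22 zeros -> length 756
grid = padded.reshape(27, 28)                  # grid input for the convolution
```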
For Seq2seq with attention, the embedding layer (which is already included in its structure) converts the discrete inputs into continuous vector representations, thus obviating the need to manually convert the symptom numbers and acupoint numbers into vectors. Therefore, the symptom numbers and acupoint numbers in the disease cases shown in Table 3 can be directly used as the input features and corresponding labels for training the model.
Fig. 3. Coding method of the traditional machine learning algorithms and FNN.
Fig. 4. Feature encoding of TextCNN.
3.4 Evaluation Metrics
There are 4 possible situations for the result of a classification problem, as shown in Table 4. In multi-label classification problems, the indicators commonly used to evaluate models are accuracy, precision and recall, whose calculation formulas are shown in Eqs. (1)-(3) [22]. Accuracy measures the overall correctness of the model's predictions. Precision measures the proportion of correctly predicted positive labels among all predicted positive labels. Recall measures the proportion of correctly predicted positive labels among all actual positive labels. However, these three criteria are not suitable for evaluating the model in this study because of the characteristics of AMT prescription data.
Table 4. Four possible situations for the results of classification problems.
1 | True Positive (TP) | Number of positive classes predicted as positive classes (1→1)
2 | True Negative (TN) | Number of negative classes predicted as negative classes (0→0)
3 | False Positive (FP) | Number of negative classes predicted as positive classes (0→1)
4 | False Negative (FN) | Number of positive classes predicted as negative classes (1→0)
Table 5. The evaluation results of an acupoints prescription with two acupoints using
different evaluation criteria.
Prediction results | TP / TN / FP / FN | Accuracy | Precision | Recall | IoU
A wrong acupoint | TP=0, TN=407, FP=1, FN=2 | 99.3% | 0% | 0% | 0%
Two wrong acupoints | TP=0, TN=406, FP=2, FN=2 | 99.0% | 0% | 0% | 0%
A correct acupoint and a wrong acupoint | TP=1, TN=407, FP=1, FN=1 | 99.5% | 50% | 50% | 33.3%
A correct acupoint and two wrong acupoints | TP=1, TN=406, FP=2, FN=1 | 99.3% | 33.3% | 50% | 25%
Two correct acupoints and a wrong acupoint | TP=2, TN=407, FP=1, FN=0 | 99.8% | 66.7% | 100% | 66.7%
Two correct acupoints | TP=2, TN=408, FP=0, FN=0 | 100% | 100% | 100% | 100%
IoU [18], also known as the Jaccard index, is a metric used to compare the similarities and differences between finite sample sets; a higher IoU signifies a higher degree of similarity between the sets. As an evaluation metric, IoU not only comprehensively assesses the model's accuracy in pinpointing effective acupoints, but also penalizes predicting non-effective acupoints as effective, thereby effectively mitigating the challenges caused by imbalanced labels. The calculation formula for IoU is shown in Eq. (4). Table 5 shows the results of the different evaluation indicators in several possible situations when predicting an acupoints prescription with two acupoints. When the prediction result is one wrong acupoint, the accuracy is still as high as 99.3%, which is obviously not suitable for this problem. It can also be seen from Table 5 that IoU is stricter than the other evaluation criteria, i.e., accuracy, precision and recall. Hence, IoU is taken as the main evaluation criterion of the model in this study.
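Eq. (4) is not reproduced in this excerpt, but the Jaccard form consistent with every IoU value in Table 5 is TP / (TP + FP + FN): the true negatives that inflate accuracy simply drop out. The sketch below reproduces the IoU column of Table 5:

```python
def iou(tp, fp, fn):
    """Jaccard index over predicted vs. true label sets: |A∩B| / |A∪B|.
    True negatives do not appear, so the 400+ unstimulated acupoints
    cannot inflate the score the way they inflate accuracy."""
    return tp / (tp + fp + fn)

# Cross-check against the IoU column of Table 5 (two-acupoint prescriptions):
assert round(iou(1, 1, 1), 3) == 0.333  # one correct, one wrong  -> 33.3%
assert iou(1, 2, 1) == 0.25             # one correct, two wrong  -> 25%
assert round(iou(2, 1, 0), 3) == 0.667  # two correct, one wrong  -> 66.7%
assert iou(2, 0, 0) == 1.0              # exact prescription      -> 100%
```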
Table 6. Experimental results of different machine learning algorithms.
Evaluating Indicator | Naive Bayes | DecisionTree Classifier | ExtraTree Classifier | ExtraTrees Classifier | KNeighbors Classifier | RandomForest Classifier | RidgeClassifierCV | Neural network | FNN | TextCNN | Seq2seq with attention
Accuracy | 84.00% | 80.22% | 67.11% | 82.66% | 73.33% | 82.88% | 78.90% | 84.44% | 99.82% | 99.81% | 99.91%
Precision | 33.24% | 36.06% | 31.53% | 36.29% | 30.12% | 33.10% | 33.72% | 33.61% | 49.32% | 43.50% | 97.23%
Recall | 32.66% | 35.12% | 31.3% | 32.11% | 23.78% | 32.70% | 29.74% | 33.00% | 49.08% | 39.70% | 96.83%
IoU | 30.13% | 33.19% | 27.84% | 31.77% | 23.11% | 29.78% | 28.62% | 30.17% | 41.62% | 38.80% | 95.72%
4. Experimental results and analysis
In this section, we report the experimental results of the 11 machine learning algorithms on our database and identify the best-performing algorithm, Seq2seq with attention. Additionally, we analyze the impact on the results of factors such as the presence or absence of attention and the order of the data in the Seq2seq with attention algorithm.
4.1 Comparative experiments
In order to reduce the effect of an uneven distribution between training and test data and to evaluate the trained models more objectively, we use 5-fold cross validation and randomly shuffle the data before splitting. We use 90% of the data in the database for 5-fold cross validation to select the optimal algorithm, and the remaining 10% as test data to evaluate the performance of the final model.
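The shuffle-then-split scheme above can be sketched directly with NumPy indices (3000 cases, a 10% held-out test set, and the remaining 90% cut into five folds); the seed is an arbitrary choice for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases = 3000                      # total disease cases in the database

idx = rng.permutation(n_cases)      # random shuffle before splitting
test_idx = idx[:n_cases // 10]      # 10% (300 cases) held out for final test
cv_idx = idx[n_cases // 10:]        # 90% (2700 cases) for cross validation

folds = np.array_split(cv_idx, 5)   # five validation folds of 540 cases each
```

Each cross-validation round trains on four folds and validates on the fifth; the 300 test cases never touch model selection.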
Table 6 shows the average results of the 5-fold cross validation experiments for the 11 machine learning algorithms, obtained by tuning their respective hyperparameters for optimal performance. For Seq2seq with attention, we set vocab_size to 735, wordvec_size to 128, hidden_size to 256, and batch_size to 64; the other hyperparameter settings are consistent with those documented in reference [27]. The results show that the IoU of the Seq2seq with attention algorithm is the highest, at 95.72%, while the IoUs of all other algorithms are at most 41.62%.
Table 7 provides a detailed breakdown of the results and average values obtained from the 5-fold cross validation of Seq2seq with attention. We retain the best model from the training process as the trained model and evaluate its effectiveness on the test data: we input the symptom numbers of the 300 disease cases that did not participate in training into the trained model, and the model outputs the corresponding acupoint numbers. The IoU on the test data is 95.33%.
Table 7. Experimental results of the 5-fold cross validation of Seq2seq with attention.
Evaluating Indicator | First | Second | Third | Fourth | Fifth | Average value
Accuracy | 99.82% | 99.96% | 99.98% | 99.97% | 99.83% | 99.91%
Precision | 94.17% | 98.69% | 99.22% | 99.02% | 95.04% | 97.23%
Recall | 93.03% | 98.83% | 99.29% | 98.81% | 94.20% | 96.83%
IoU | 90.76% | 98.30% | 98.89% | 98.50% | 92.14% | 95.72%
4.2 Further Experimental Analysis of Seq2seq with attention
To investigate the impact of the order of the symptom and acupoint numbers on the experimental results, we conduct experiments with the symptom and acupoint numbers of the original data arranged in ascending and in descending order. Table 8 shows the results of 5-fold cross validation for both orders. Comparing Tables 7 and 8 reveals that label ordering has little effect on the experimental outcomes, which is consistent with the findings reported in [23]. Specifically, the results suggest that if the models are trained with appropriate regularization techniques, the order of the symptom and acupoint numbers is not a major factor influencing final performance.
Table 8. Experimental results of the 5-fold cross validation of Seq2seq with attention
on ascending and descending data.
| Order of data   | Evaluating Indicator | First  | Second | Third  | Fourth | Fifth  | Average value |
|-----------------|----------------------|--------|--------|--------|--------|--------|---------------|
| Ascending data  | Accuracy             | 99.70% | 99.93% | 99.93% | 99.96% | 99.73% | 99.85%        |
|                 | Precision            | 90.94% | 98.06% | 98.66% | 99.17% | 92.22% | 95.81%        |
|                 | Recall               | 90.94% | 97.94% | 98.26% | 98.70% | 91.78% | 95.52%        |
|                 | IoU                  | 87.98% | 97.20% | 97.79% | 98.39% | 89.38% | 94.15%        |
| Descending data | Accuracy             | 99.68% | 99.82% | 99.98% | 99.97% | 99.83% | 99.85%        |
|                 | Precision            | 90.70% | 95.46% | 99.22% | 99.02% | 95.04% | 95.89%        |
|                 | Recall               | 89.91% | 95.28% | 99.29% | 98.81% | 94.20% | 95.50%        |
|                 | IoU                  | 86.73% | 93.36% | 98.89% | 98.50% | 92.14% | 93.92%        |
To analyze the impact of attention on model performance, we conduct an ablation
experiment. Table 9 shows the 5-fold cross validation results of Seq2seq without attention. Comparing
Tables 7 and 9 reveals that the IoU of Seq2seq without attention is significantly lower than that
of Seq2seq with attention. In addition, the IoUs of the third and fourth folds exceed
95%, while those of the first, second, and fifth folds are all below 70%, indicating
that the generalization ability of the model without attention is poor. The experimental
results show that attention not only improves the IoU of the model but also enhances
its robustness.
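The ablation result can be understood through what attention adds: instead of compressing the whole symptom sequence into one fixed vector, the decoder re-weights the encoder states at every output step. Below is a minimal dot-product attention sketch; the scoring function actually used in the model follows [27] and may differ in detail.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: score each encoder step by its similarity to
    the current decoder state, then form a weighted context vector."""
    scores = encoder_states @ decoder_state   # one score per input symptom, shape (T,)
    weights = softmax(scores)                 # attention distribution, sums to 1
    context = weights @ encoder_states        # weighted sum of encoder states, shape (hidden,)
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(10, 256))   # 10 input symptoms, hidden size 256 as in our setup
dec = rng.normal(size=256)         # current decoder hidden state
context, weights = attention_context(dec, enc)
```

Without this mechanism, the decoder must rely on a single fixed-length encoding of all symptoms, which is consistent with the weaker and less stable fold results in Table 9.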
Table 9. Experimental results of the 5-fold cross validation of Seq2seq without attention.
| Evaluating Indicator | First  | Second | Third  | Fourth | Fifth  | Average value |
|----------------------|--------|--------|--------|--------|--------|---------------|
| Accuracy             | 99.06% | 99.12% | 99.93% | 99.95% | 99.07% | 99.42%        |
| Precision            | 72.77% | 75.34% | 98.13% | 98.45% | 73.56% | 83.65%        |
| Recall               | 76.56% | 80.21% | 98.30% | 98.54% | 78.06% | 86.34%        |
| IoU                  | 63.54% | 67.34% | 97.18% | 97.93% | 64.58% | 78.11%        |
To evaluate the generalization and stability of the algorithm under increased data
volume, we conducted experiments on Seq2seq with attention using augmented data. Some
of the symptoms of a case are related by an ``or'' relationship rather than an ``and''
relationship, so for cases with more than 10 symptoms, one or two secondary symptoms
can be removed. In this way, the number of disease cases can be increased and the influence
of the main symptoms in the model can be enhanced. We expanded the 3000 cases to 6000
in this way. Table 10 provides a detailed breakdown of the per-fold results and average values of the 5-fold
cross validation for Seq2seq with attention on the 6000 augmented cases.
Comparing Tables 7 and 10 reveals a slight improvement in the model's IoU on the dataset of 6000 cases, indicating
that the algorithm has good scalability. It should be noted that AMT, as a medical
practice, demands exceptionally high data quality standards; the augmented data in
this paper are used solely for testing the algorithm's scalability, not for training
the final model.
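The augmentation step can be sketched as follows. How a symptom is judged "secondary" is a modeling decision not fully specified here, so in this sketch the secondary symptoms are passed in explicitly; the acupoints prescription (label) of each generated variant stays identical to that of the original case.

```python
import itertools

def augment_case(symptoms, secondary, min_len=10):
    """Create variants of a case by removing 1 or 2 of its secondary symptoms.

    Only cases with more than `min_len` symptoms are augmented.
    """
    if len(symptoms) <= min_len:
        return []
    variants = []
    for k in (1, 2):
        for drop in itertools.combinations(secondary, k):
            variants.append([s for s in symptoms if s not in drop])
    return variants

# A case with 11 symptom numbers, 2 of them secondary:
# 2 one-removal variants + 1 two-removal variant = 3 new cases.
case = list(range(11))
print(len(augment_case(case, secondary=[9, 10])))  # 3
```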
Table 10. Experimental results of the 5-fold cross validation of Seq2seq with attention
on the 6000 augmented cases.
| Evaluating Indicator | First  | Second | Third  | Fourth | Fifth  | Average value |
|----------------------|--------|--------|--------|--------|--------|---------------|
| Accuracy             | 99.85% | 99.97% | 99.98% | 99.96% | 99.86% | 99.92%        |
| Precision            | 95.60% | 99.10% | 99.23% | 98.98% | 96.04% | 97.79%        |
| Recall               | 95.75% | 99.30% | 99.26% | 98.72% | 96.21% | 97.85%        |
| IoU                  | 93.93% | 98.77% | 98.92% | 98.37% | 94.84% | 96.97%        |
4.3 Discussion
For the traditional machine learning algorithms, we attribute the poor performance to
the following three reasons. (1) Traditional machine learning algorithms such as
Decision Trees, Random Forests, and Naive Bayes become ineffective at capturing
higher-order correlations, as they can only capture first- or second-order correlations
[25]. (2) For multi-label classification problems with very large numbers of labels (e.g.,
the 410 labels in this study), transformation techniques such as Binary Relevance,
Classifier Chains, and Label Powerset are ineffective. (3) The label imbalance
in the AMT data is also an important reason for the poor performance of traditional
machine learning algorithms: it is difficult to ensure that the positive and negative
sample sizes of each label are balanced in the training data. For FNN
and TextCNN, although higher-order correlations among features can be captured, their
generalization ability is limited because they ignore the correlations between labels.
For Seq2seq with attention, the superior performance in deciding acupoints compared
to the other algorithms can be attributed to three key advantages. Firstly, Seq2seq can capture
higher-order relationships among features. Secondly, owing to its encoder-decoder
structure, Seq2seq can accurately capture and predict the relationships between
multiple labels while learning high-order feature correlations, thereby enhancing
the model's generalization ability and overall performance. Furthermore, the addition
of attention further improves the performance and generalization ability of the model.
Based on the experimental results and the analysis above, Seq2seq with attention
is sufficiently applicable to determining acupoints.
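Point (2) above can be illustrated with Binary Relevance, which decomposes a multi-label problem into one independent binary classifier per label. With 410 labels that means 410 models, none of which ever sees how acupoints co-occur in a prescription. The stub classifier below is a toy stand-in for illustration, not any setup used in this paper.

```python
class BinaryRelevance:
    """Train one independent binary classifier per label.

    Each per-label model sees only its own 0/1 column of the label matrix,
    so correlations between labels are discarded by construction.
    """
    def __init__(self, n_labels, make_classifier):
        self.models = [make_classifier() for _ in range(n_labels)]

    def fit(self, X, Y):
        # Y is a list of rows, each a 0/1 vector of length n_labels.
        for j, model in enumerate(self.models):
            model.fit(X, [row[j] for row in Y])
        return self

    def predict(self, x):
        return [m.predict(x) for m in self.models]

class MajorityStub:
    """Toy per-label classifier: always predicts the label's majority class
    (ties break toward 1), ignoring the input features entirely."""
    def fit(self, X, y):
        self.out = int(sum(y) * 2 >= len(y))
    def predict(self, x):
        return self.out

br = BinaryRelevance(3, MajorityStub)
br.fit([[0], [1]], [[1, 0, 1], [1, 0, 0]])
print(br.predict([0]))  # [1, 0, 1]
```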
4.4 Case Study
We showcase our model's practical performance through an example. We input a test
case's symptom combination into our model, and it provides an acupoints prescription,
as illustrated in Fig. 5. The model's prescription corresponds to the acupoints prescription described in
reference [21] for Yin-Yang deficiency hypertension. Although this particular combination of symptoms
was not included in the model's training data, the model provides an accurate acupoints
prescription, which shows that our model has good generalization ability.
Fig. 5. Exemplary acupoints prescription by Seq2seq with attention.
5. Conclusion
This paper proposed a methodology for applying machine learning to provide acupoints
prescriptions for treating patients based on their symptoms. Firstly, a database of symptoms
and corresponding acupoints prescriptions was built by extracting the names of symptoms
from AMT texts and unifying and standardizing them. Secondly, 11 machine learning
algorithms were applied to learn from the data in the database, and the trained models
were used to provide acupoints prescriptions for treating patients. In the computational
experiments, 90% of the data in the database of 3000 disease cases were used for 5-fold
cross validation to select the algorithm with the best performance, and 10% were used
as test data to evaluate the generalization ability of the final model. From the
experimental results, we find that Seq2seq with attention is the best among the 11
algorithms and is sufficiently applicable to determining acupoints.
As future work, we plan to (1) collect more data to expand the application scope of
our model; (2) add the other part of an AMT prescription besides the acupoints
prescription, namely manipulation, so as to provide more complete AMT prescriptions
for treating patients; and (3) leverage the complementary strengths of TCM and Western
medicine to further enhance our model.
ACKNOWLEDGMENTS
This work was supported by JSPS KAKENHI Grant Number 20H04284 (Grant-in-Aid for
Scientific Research (B)) and JST SPRING, Grant Number JPMJSP2111.
REFERENCES
B. L. Zhang, X. M. Shi, and F. R. Liang, Theory and Practice of Acupuncture & Moxibustion
(in Chinese), China Press of Traditional Chinese Medicine, 2019.

R. C. Deo, ``Machine learning in medicine,'' Circulation, vol. 132, no. 20, pp. 1920-1930,
2015.

J. Goecks, V. Jalili, L. M. Heiser, and J. W. Gray, ``How machine learning will transform
biomedicine,'' Cell, vol. 181, no. 1, pp. 92-101, 2020.

J. Liang, M. Y. Ming, C. B. Wang, X. L. Lv, Z. R. Sun, and H. N. Yin, ``Research progress
in the integration of machine learning and the science of acupuncture and moxibustion,''
Acupuncture Research, vol. 46, no. 6, pp. 460-463, 2021.

X. Y. Yang, Y. Tu, and D. M. Duan, ``Application of curative effect prediction method
in acupuncture treatment of depression,'' Journal of Beijing University of traditional
Chinese Medicine (in Chinese), vol. 31, no. 5, pp. 355-357, 2008.

M. Fei and P. Xv, ``Estimation of the curative effects of acupuncture on heroin dependence
by neural networks,'' Lishizhen Medicine and Materia Medica Research (in Chinese),
vol. 19, no. 12, pp. 2974-2975, 2008.

W. S. Hao, X. S. Zhu, X. R. Wang, H. Y. Yang, Z. H. Wang, and Y. J. Zhang, ``Biochemical
index variation prediction during electroacupuncture analgesia using ANFIS method,''
Journal of Shanghai Jiaotong University (in Chinese), vol. 42, no. 2, pp. 177-180,
2008.

Q. Gan, R. Wu, M. Nakata, and Q. W. Ge, ``A proposal of support system for acupuncture
and moxibustion treatment in traditional Chinese medicine,'' IEICE Transactions on
Information and Systems, vol. 120, no. 245, pp. 40-43, 2020.

I. Sutskever, O. Vinyals, and Q. V. Le, ``Sequence to sequence learning with neural
networks,'' Advances in Neural Information Processing Systems, pp. 1-9, 2014.

A. Hyodo, Traditional Chinese Medicine Meridians and Acupoint Textbooks (in Japanese),
Shinsei Publishing, 2012.

M. E. Maron and J. L. Kuhns, ``On relevance, probabilistic indexing and information
retrieval,'' Journal of the ACM (JACM), vol. 7, no. 3, pp. 216-244, 1960.

T. Cover and P. Hart, ``Nearest neighbor pattern classification,'' IEEE Transactions
on Information Theory, vol. 13, no. 1, pp. 21-27, 1967.

L. Breiman and J. H. Friedman, Classification and Regression Trees, Routledge, 2017.

T. M. Cover, ``Geometrical and statistical properties of systems of linear inequalities
with applications in pattern recognition,'' IEEE Transactions on Electronic Computers,
vol. 14, no. 3, pp. 326-334, 1965.

W. S. McCulloch and W. Pitts, ``A logical calculus of the ideas immanent in nervous
activity,'' The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, 1943.

M. Frean, ``The upstart algorithm: A method for constructing and training feedforward
neural networks,'' Neural Computation, vol. 2, no. 2, pp. 198-209, 2014.

Y. Kim, ``Convolutional neural networks for sentence classification,'' Proc. of the
2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751,
2014.

H. Rezatofighi, N. Tsoi, J. Y. Gwak, A. Sadeghian, I. Reid, and S. Savarese, ``Generalized
intersection over union: A metric and a loss for bounding box regression,'' Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658-666,
2019.

Z. Luo, The Classification Research of the Symptomatic Units of TCM (in Chinese),
M.S. Thesis, Shandong University of Traditional Chinese Medicine, 2012.

R. M. Yan, Essence of Yan Runming's 60 Years of Clinical Experience in Acupuncture
and Moxibustion (in Chinese), China Press of Traditional Chinese Medicine, 2013.

S. Z. Gao and J. Yang, Therapeutics of Acupuncture and Moxibustion (in Chinese), China
Press of Traditional Chinese Medicine, 2016.

M. L. Zhang and Z. H. Zhou, ``A review on multilabel learning algorithms,'' IEEE Transactions
on Knowledge and Data Engineering, vol. 26, no. 8, pp. 1819-1837, 2013.

J. Nam, E. L. Mencía, H. J. Kim, and J. Fürnkranz, ``Maximizing subset accuracy with
recurrent neural networks in multi-label classification,'' Advances in Neural Information
Processing Systems, vol. 30, pp. 5413-5423, 2017.

P. C. Yang, X. Sun, W. Li, S. Ma, W. Wu, and H. F. Wang, ``SGM: Sequence generation
model for multi-label classification,'' Proc. of the 27th International Conference
on Computational Linguistics, pp. 3915-3926, 2018.

W. Liao, Y. Wang, Y. Yin, X. Zhang, and P. Ma, ``Improved sequence generation model
for multilabel classification via CNN and initialized fully connection,'' Neurocomputing,
vol. 382, pp. 188-195, 2020.

K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, ``LSTM:
A search space Odyssey,'' IEEE Transactions on Neural Networks and Learning Systems,
vol. 28, no. 10, pp. 2222-2232, 2016.

K. Saito, Deep Learning from Scratch 2 (in Japanese), O'Reilly Japan, Inc., 2018.

Author
Hang Yang received his B.E. degree from Shanghai University of Engineering Science,
China, in 2014, and an M.E. degree from Zhejiang Sci-Tech University, China, in 2020.
He is currently a Ph.D. candidate at the Graduate School of East Asian Studies, Yamaguchi
University, Japan. His research interests include artificial intelligence, system
modeling and modeling of acupuncture and moxibustion treatment in traditional Chinese
medicine.
Ren Wu received her B.E. and M.E. degrees from Hiroshima University, Japan, in
1988 and 1990, respectively, and a Ph.D. from Yamaguchi University, Japan, in 2013.
She was with Fujitsu Ten Ltd., West Japan Information Systems Co., Ltd. and Yamaguchi
Junior College from 1991 to March 2024. Since April 2024, she has been an Associate
Professor at Shunan University, Japan. Her research interests include information
processing systems, linguistic information processing and system modeling. She is
a member of the Institute of Electronics, Information and Communication Engineers
(IEICE) and the Institute of Information Processing Society of Japan (IPSJ).
Mitsuru Nakata received his B.E., M.E., and Ph.D. degrees from Fukui University,
Japan, in 1992, 1994 and 1998, respectively. He was a Lecturer from 1998 to 2004 and
an Associate Professor from 2004 to 2014 both at Yamaguchi University, Japan. Since
October 2014, he has been a Professor at Yamaguchi University. His research interests
include database systems, text processing, program net theory, and information education.
He is a member of the Institute of Electronics, Information and Communication Engineers
(IEICE), the Institute of Information Processing Society of Japan (IPSJ) and the Institute
of Electrical and Electronics Engineers (IEEE).
Qi-Wei Ge received his B.E. degree from Fudan University, China, in 1983, his M.E.
and Ph.D. degrees from Hiroshima University, Japan, in 1987 and 1991, respectively.
He was with Fujitsu Ten Limited from 1991 to 1993. He was an Associate Professor at
Yamaguchi University, Japan, from 1993 to 2004. Since April 2004, he has been a Professor
at Yamaguchi University, Japan. He is currently a Trustee at Yamaguchi University,
Japan. His research interests include Petri nets, program net theory, and combinatorics.
He is a member of the Institute of Electronics, Information and Communication Engineers
(IEICE), the Institute of Information Processing Society of Japan (IPSJ) and the Institute
of Electrical and Electronics Engineers (IEEE).