
  1. (Publicity & United Front Work Department, Jilin Province Economic Management Cadre College, Changchun, 130012, China haiyingqi852@outlook.com)



Xi’an, Tourist attractions, Tourist perception, TF-IDF, NB, LDA

1. Introduction

With the rapid development of Internet technology, more and more tourists in China share their travel routes and experiences online in real time. User-generated tourism content is an important part of tourism big data. These data not only reveal the needs of tourists in depth but also reflect users’ destination preferences, demands on scenic spots, tourism motivations, tourist attributes, etc. [1,2]. Tourist perception refers to the emotion and cognition that tourists form about a destination while traveling. It has a direct impact on tourism satisfaction and tourism quality and also strongly affects the sustainable development of the destination. Therefore, in the field of tourism, it is of great significance to explore users’ perception of the image of destination scenic spots [3,4].

Big data technology can effectively promote the intelligent development of destinations and support the design of more targeted tourism products and marketing strategies, improving the management efficiency and service quality of tourist attractions. Research in China on big data technology for visitor perception mainly involves data mining, vector regression models, etc. However, current research still has problems, such as the low prediction accuracy of the models. This study combines several big data technologies to analyze tourists’ perception of the image of tourist attractions and to lay a foundation for the sustainable development of tourist destinations.

2. Related Work

One study adopted an open, interoperable, service-oriented architecture in response to the cost of maintaining proprietary systems of transit flight planners, the arduous task of regularly updating transit information, the difficulty of using geospatial data, and the rapid development of network technology. The framework was applied to a transportation route planning system built from modular resources. It integrates online geospatial services, open-source geospatial database technology, and pathfinding algorithms in a loosely coupled manner. It was found that this method makes the system more stable and makes effective use of network technology [5].

Arif et al. studied collaborative online information searching for travel planning. They used SPSS software to analyze questionnaires, online search logs, and chat records collected from 18 pairs of participants before and after a search. The analysis found that collaborative query formulation, division of search tasks, chat, and result sharing were important means of collaborative search for tourists [6]. Choi et al. investigated the number and type of source-related visual cues presented by online travel media. Their model examines online tourism information sources in terms of specialization, endorsement, and star ratings by other users, and relates visual cues, cue-induced perception, information credibility, and target images. The experimental results show that tourists’ perceptions are closely related to source-related cues [7].

In order to compare the comments of different online travelers, Hou et al. used semantic association analysis to extract keywords from the comments of the three major online travel agencies in China to build a semantic association network. The experimental analysis found that there were significant differences in the attribute platform, topic distribution, and community relationship of these structures. This study provided new insights for the development of new hotels, tourism, tourism companies, and online travel agencies [8]. Lin et al. analyzed the difference between the perceived value of first-time tourists and the perceived value of revisiting tourists. Tourism quality was the best measure of first-time tourists’ purchase intention. For revisiting tourists, perceived value was the best measure of revisit degree [9].

From the perspective of tourists’ perception, Souza and other scholars used a single factor evaluation model of tourism destination service quality to do a quantitative analysis of the perception of tourism services when domestic tourists travel to Xi’an. The research results show that the proposed evaluation model has high feasibility and reliability [10]. Moon and other researchers built a multidimensional measurement index system of the degree of satisfaction of consumers’ experience perception in travel agencies. The results show that consumers’ experience perception was good. This model can be applied to the experience perception evaluation of other tourist attractions [11].

Suhartanto et al. applied data mining and data visualization methods to data from Tuniu to study the differentiation of tourism products and their promotion strategies [12]. Samara et al. analyzed the average daily Baidu network attention data of rural tourism from 2011 to 2013 and obtained the characteristics of rural tourism attention by year, season, month, week, golden week, and other time periods [13]. Alaei constructed a tourist sentiment analysis model using manually defined emotion discrimination rules and used it to analyze the sentiment of Chinese tourists who commented on trips to Australia on domestic tourism websites. The model was used to compare the differences between Chinese tourists and international tourists in Australia [14].

Ardito et al. used 70,859 user sign-in records from 58 tourist attractions in Zhengzhou on Sina Weibo to build a scenic spot evaluation index system based on the number of sign-ins. From the user’s gender, the region where the user signed in, and the sign-in time recorded in the sign-in data, the preferences of each gender for sign-in places can be obtained [15].

The literature shows that research in China and abroad on tourists’ perception of the image of tourist attractions has made real progress. Breakthroughs have been made in data preprocessing, sentiment analysis methods, recommendation algorithms, the evaluation of recommendation systems, and other areas. However, few studies of tourists’ perception of the image of tourist attractions involve big data technology. This study analyzes this topic in depth to identify problems in tourist attractions and to derive strategies for promoting their tourist image.

3. Tourist Perception of an Image of Xi’an Tourist Attractions Applying Big Data Technology

3.1 Data Source and Preprocessing of Xi’an Tourist Attractions

After the original review data were preprocessed, 2548 valid evaluation records remained. On this basis, a tourist perception model of the image of Xi’an scenic spots was built using big data technology. The model first uses the term frequency-inverse document frequency (TF-IDF) algorithm to analyze the cognitive image of tourists and then uses the Naive Bayes (NB) method to analyze the emotional image of tourists. Finally, it uses the latent Dirichlet allocation (LDA) topic model to analyze the overall image of scenic spots. Before the model is constructed, the original data must be preprocessed. Taking Xi’an tourist attractions as an example, the tourist perception model of the destination image was constructed from user review data on tourism websites. The selected data come from users’ real experiences during their trips, so the data are highly valid. The data source was the online evaluation data of Xi’an tourist attractions on Dianping.com and Ctrip. These two websites have a large number of users and a high rate of comments.

The online comment data were collected with code written in Python 3.0. Each record includes the user number, comment content, comment time, and score. Because the full body of reviews is very large, takes a relatively long time to process, and would not truly reflect the current image of the tourist attractions, the collection period was limited to 2020 to 2022. A total of 5463 original tourism evaluation records were obtained. Table 1 shows some evaluation data of Xi’an tourist attractions.

Table 1. Partial evaluation data of Xi’an tourist attractions.

| Website source | User No. | Evaluation content | Evaluation time | Tourist rating |
|---|---|---|---|---|
| Ctrip | M12****0563 | The scenery is beautiful, the tour guide's explanation is also very meticulous and professional, and he has learned a lot of historical and cultural knowledge | 2022-06-18 | 4 |
| Ctrip | M18****6358 | The special snacks in the scenic spot are delicious and cheap | 2021-05-22 | 5 |
| Dianping.com | Dpuser_3119058389VIP | The weather in Huashan is suitable, especially at sunrise and sunset. It deserves the title of the most dangerous mountain in the world | 2022-08-23 | 4 |
| Dianping.com | Dpuser_4251826625VIP | The Great Tang Never Night City is suitable for traveling at night without holidays with a strong historical and cultural atmosphere | 2020-05-17 | 5 |

Because different tourists hold different opinions on the scenic spots, the online reviews range from overall impressions to multi-dimensional evaluations. The reviews also differ greatly in content and format, and these differences affect the whole research process and its results. Therefore, before the evaluation data are analyzed, they must be processed to remove repeated comments, garbled text, comments that are too short, and other low-quality records.

Fig. 1 shows the specific preprocessing methods. The first is text de-duplication, which removes duplicated evaluation records and near-identical comments from the same user. Identical evaluation content can be located and removed with the pandas methods `DataFrame.duplicated` and `DataFrame.drop_duplicates`. The second method is compressing words and phrases. After text de-duplication alone, the quality of the comment data still does not meet the requirements of modeling and analysis, because de-duplication operates on whole comments, not on the phrases and words inside a comment. This study therefore compresses repeated words and phrases within each comment.

Fig. 1. Specific methods of pretreatment.

The third method is to delete numbers, symbols and emoticons, English text, and short sentences [16-18]. Tourist comments vary widely in length, up to hundreds of characters, and contain emoticons and inconsistent formats. Some short comments of two or three words do describe an experience, but no in-depth information can be obtained from them, so they are deleted. The minimum comment length was set to 6 words. Comments below this threshold were deleted, and comments above it were retained for analysis.
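The first three preprocessing steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names `compress_repeats` and `preprocess` are hypothetical, length is counted in characters because word segmentation happens later, and the repeat-collapsing rule is deliberately crude (it would also shorten legitimate reduplications such as "谢谢").

```python
import re

# Anything that is not a CJK ideograph: digits, Latin letters, punctuation, emoticons.
NON_CJK = re.compile(r"[^\u4e00-\u9fff]")


def compress_repeats(text, max_unit=4):
    """Collapse immediately repeated units of 1..max_unit characters."""
    for size in range(max_unit, 0, -1):
        # (.{size}) captures a unit; \1+ matches one or more direct repetitions of it.
        text = re.sub(r"(.{%d})\1+" % size, r"\1", text)
    return text


def preprocess(comments, min_len=6):
    """De-duplicate, compress, strip non-Chinese characters, drop short comments."""
    unique = list(dict.fromkeys(comments))      # step 1: order-preserving de-duplication
    kept = []
    for raw in unique:
        text = compress_repeats(raw)            # step 2: compress repeated words/phrases
        text = NON_CJK.sub("", text)            # step 3a: delete numbers, symbols, English
        if len(text) >= min_len:                # step 3b: delete short comments
            kept.append(text)
    return kept


if __name__ == "__main__":
    sample = ["景色很美很美很美,值得一去!", "景色很美很美很美,值得一去!", "不错", "Great view 123"]
    print(preprocess(sample))
```

With pandas available, step 1 would normally be done with `DataFrame.drop_duplicates` on the comment column instead of `dict.fromkeys`.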

The fourth method is to eliminate stop words. The stop words used in the study come from a list built from the scenic spot comments themselves, the stop word list of the Harbin Institute of Technology, the stop word list of the Machine Intelligence Laboratory of Sichuan University, and the Baidu stop word list. Stop words were removed with the stop-word removal function (`remove_stopwords`) of the Python Gensim library. The fifth method is Chinese word segmentation and part-of-speech analysis. To count the frequency of each word and obtain the topic words and feature words of a comment, the comment content is divided into valid words by Chinese word segmentation. Given the particular vocabulary of online reviews of scenic spots, this study added a customized scenic spot dictionary on top of the Tsinghua University Dictionary and the HowNet Dictionary. The part of speech of each segmented word was then analyzed. Word segmentation was performed with the Python jieba package.
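Stop word filtering itself is a simple set lookup once a comment has been segmented. The sketch below assumes the tokens were already produced by a segmenter such as jieba, and uses a tiny hypothetical stop word set in place of the merged lists described above. The name `remove_stop_words` is illustrative and is not a Gensim or jieba function.

```python
# Hypothetical miniature stop word list; the study merges several published lists.
STOP_WORDS = {"的", "了", "是", "也", "很"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep tokens that are non-empty and not in the stop word set."""
    return [t for t in tokens if t.strip() and t not in stop_words]


if __name__ == "__main__":
    # Tokens as a segmenter might return them for "景区的小吃很好吃".
    print(remove_stop_words(["景区", "的", "小吃", "很", "好吃"]))
```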

3.2 Construction of Visitor Perception Model Applying Big Data Technology

After the raw data were preprocessed, a visitor perception model was constructed that analyzes the set of valid comments using the TF-IDF algorithm, an NB network, and the LDA topic model. The model computes the TF-IDF value of each word and ranks the top 50 feature words by TF-IDF value to obtain the key topics frequently mentioned in tourist evaluations. These key feature words help to construct the cognitive image analysis dimensions and help researchers understand which tourism-related matters tourists care about.

The model then uses the NB network to classify the sentiment of the tourists’ evaluation text, obtaining the main emotions of tourists about various tourism matters. Finally, based on the results of the TF-IDF algorithm and the NB network, the model uses the LDA topic model to relate emotional evaluations to tourism matters and to conduct a topic clustering analysis for an overall evaluation of tourist attractions. The TF-IDF algorithm is a numerical statistic used as a weighting factor in user modeling, text mining, and information retrieval. Its value increases in proportion to the number of times a word appears in a comment and is offset by the number of comments that contain the word [19,20]. TF has many forms, including logarithmically scaled, Boolean, and raw-count types, and can be written as $f(t,d)$. IDF measures how much information a word provides and can be written as $idf(t,D)$. Expression (1) defines the text set.

(1)
$ D=\left\{d_{1},\cdots ,d_{i},\cdots ,d_{N}\right\} $

The total number of texts is $N$. $D$ is the text collection over which the document random variable (written $\Delta$ in Eqs. (4)-(8)) ranges, $d$ is an element of $D$, and $d_{i}$ is the $i$-th text. The word set of the text collection is:

(2)
$ W=\left\{w_{1},\cdots ,w_{i},\cdots ,w_{M}\right\} $

$M$ is the total number of distinct words, and $W$ is the word set over which the word random variable (written $\Omega$ in Eqs. (7) and (8)) ranges. Assuming that all elements of $D$ are equally probable, the probability $P(d_{i})$ is:

(3)
$ P\left(d_{i}\right)=\frac{1}{N} $

The amount of information carried by each document is $-\lg \left(\frac{1}{N}\right)$, and the entropy of the document random variable $\Delta$ is:

(4)
$ H\left(\Delta \right)=-\sum _{d_{j}\in D}P\left(d_{j}\right)\lg P\left(d_{j}\right) $

Let $N_{i}$ be the number of documents that contain the word $w_{i}$. If each document in this subset is equally likely to be obtained, the amount of information is $-\lg \left(\frac{1}{N_{i}}\right)$, and the entropy of $\Delta$ conditioned on $w_{i}$ is:

(5)
$ H\left(\Delta \left| w_{i}\right.\right)=-\sum _{d_{j}\in D}P\left(d_{j}\left| w_{i}\right.\right)\lg P\left(d_{j}\left| w_{i}\right.\right) $

Documents that do not contain $w_{i}$ have probability 0 in the selected subset, so the remaining $N-N_{i}$ documents do not contribute to Eq. (5). Let $f_{ij}$ be the frequency of word $w_{i}$ in document $d_{j}$, $f_{{w_{i}}}$ the frequency of $w_{i}$ in the whole text collection, and $F$ the total number of words in the text collection. The probability of obtaining $w_{i}$ when a word is drawn arbitrarily from the text is then:

(6)
$ P\left(w_{i}\right)=\sum _{j}\frac{f_{ij}}{F}=\frac{f_{{w_{i}}}}{F} $

The mutual information $M\left(\Delta ,\Omega \right)$ between documents and words is:

(7)
$ \begin{array}{l} M\left(\Delta ,\Omega \right)=H\left(\Delta \right)-H\left(\Delta \left| \Omega \right.\right)\\ =\sum _{w_{i}}P\left(w_{i}\right)\left(H\left(\Delta \right)-H\left(\Delta \left| w_{i}\right.\right)\right)\\ =\sum _{w_{i}}P\left(w_{i}\right)\cdot idf\left(w_{i}\right) \end{array} $
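As a worked step, the equal-probability assumptions behind Eqs. (3)-(5) reduce both entropies to logarithms of counts, which is why the per-word entropy difference in Eq. (7) becomes the IDF term $\lg (N/N_{i})$ that appears in Eq. (8):

$ H\left(\Delta \right)=-\sum _{d_{j}\in D}\frac{1}{N}\lg \frac{1}{N}=\lg N $

$ H\left(\Delta \left| w_{i}\right.\right)=-\sum _{d_{j}\ni w_{i}}\frac{1}{N_{i}}\lg \frac{1}{N_{i}}=\lg N_{i} $

$ idf\left(w_{i}\right)=H\left(\Delta \right)-H\left(\Delta \left| w_{i}\right.\right)=\lg \frac{N}{N_{i}} $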

Substituting Eq. (6) gives the same quantity in terms of $f_{ij}$:

(8)
$ \begin{array}{l} M\left(\Delta ,\Omega \right)=H\left(\Delta \right)-H\left(\Delta \left| \Omega \right.\right)\\ =\sum _{w_{i}}P\left(w_{i}\right)\left(H\left(\Delta \right)-H\left(\Delta \left| w_{i}\right.\right)\right)\\ =\sum _{w_{i}\in W}\sum _{d_{j}\in D}\frac{f_{ij}}{F}\lg \frac{N}{N_{i}} \end{array} $

The IDF factor refers to the change of information quantity after observing a specific word, and the TF factor refers to the probability estimate of actually observing a word. Eqs. (7) and (8) refer to two different aspects. When TF refers to $f_{{w_{i}}}$, TF-IDF refers to the measurement of word selection. When TF refers to $f_{ij}$, TF-IDF refers to the measure of word weight [21-23].
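The word-selection form of TF-IDF (TF taken as $f_{{w_{i}}}/F$, IDF as $\lg (N/N_{i})$) can be sketched directly from the counts. This is an illustrative implementation under two assumptions: "lg" is read as the base-10 logarithm, and each document is already a list of segmented tokens. The function names `tfidf_scores` and `top_k` are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return {word: (f_w / F) * log10(N / N_w)} for a list of token lists."""
    n_docs = len(docs)                                     # N
    total_tokens = sum(len(d) for d in docs)               # F
    term_freq = Counter(w for d in docs for w in d)        # f_w over the whole collection
    doc_freq = Counter(w for d in docs for w in set(d))    # N_w: documents containing w
    return {
        w: (term_freq[w] / total_tokens) * math.log10(n_docs / doc_freq[w])
        for w in term_freq
    }


def top_k(scores, k):
    """Rank words by descending TF-IDF score and keep the first k."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    docs = [["城墙", "壮观"], ["城墙", "服务"], ["城墙", "排队", "排队"]]
    scores = tfidf_scores(docs)
    print(top_k(scores, 2))
    # Summing the per-word scores gives the mutual information of Eq. (8).
    print(sum(scores.values()))
```

A word that occurs in every document, such as "城墙" in this toy input, gets an IDF of zero and therefore drops to the bottom of the ranking regardless of how frequent it is.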

An NB network represents the probability distribution over a group of random variables and can be either a static NB network or a dynamic NB network. The difference is that the dynamic NB network considers the impact of time on the results. An NB network can be written as $G=\left(I,L\right)$, where $I$ is the set of all nodes in the network structure and $L$ is the set of segments connecting the nodes. The network thus has two parts: variable nodes and directed segments between nodes. Each directed segment carries a conditional probability value. If two nodes are not connected, the corresponding random variables are considered independent of each other, and the conditional probability value is taken to be 0.

We set the directed acyclic network diagram as $S$, and the joint probability distribution of variable $X=\left\{x_{1},x_{2},\cdots ,x_{n}\right\}$ as $P\left(x_{1},x_{2},\cdots x_{n}\right)$:

(9)
$ P\left(x_{1},x_{2},\cdots x_{n}\right)=\prod _{i=1}^{n}P\left(x_{i}\left| P_{ai}\right.\right) $

In Eq. (9), $P_{ai}$ refers to the parent node of the variable. The calculation expression of joint probability coding of variable $X=\left\{x_{1},x_{2},\cdots ,x_{n}\right\}$ is Eq. (10).

(10)
$ P\left(x\left| \theta _{s},S^{b}\right.\right)=\prod _{i=1}^{n}p\left(x_{i}\left| p_{ai},\theta _{i},S^{b}\right.\right) $

In Eq. (10), $\theta _{i}$ refers to the parameter variable. The vector formed by the parameter set is referred to by $\theta _{s}$. The joint probability distribution obtained from the decomposition of $S$ is $S^{b}$. The calculation expression of local distribution function is (11).

(11)
$ P\left(x\left| \theta _{s},S^{h}\right.\right)=\prod _{i=1}^{n}p\left(x_{i}\left| p_{ai},\theta _{i},S^{h}\right.\right) $

Eq. (11) can be understood as a continuous variable regression function and discrete variable regression function. The construction of the NB network model is as follows. We determine the properties of the node variables, set the value range, and determine the conditional probability of the directed segment between the nodes. From the perspective of the reasoning direction of the NB network structure diagram, conditional probability can be divided into a prior probability and a posterior probability. A prior probability is obtained from background knowledge and historical data. The posterior probability is calculated on the basis of the prior probability. The two probabilities have the same form. We set $w_{1},\cdots w_{i}\cdots w_{n}$ as the weight of all categories, and the NB network equation is:

(12)
$ P\left(w_{i}\left| x\right.\right)=P\left(x\left| w_{i}\right.\right)\ast \frac{P\left(w_{i}\right)}{P\left(x\right)} $

In Eq. (12), $P\left(x\left| w_{i}\right.\right)$ is the likelihood, that is, the probability that feature vector $x$ occurs in category $w_{i}$. $P\left(w_{i}\right)$ is the prior probability of category $w_{i}$. $P\left(w_{i}\left| x\right.\right)$ is the posterior probability. $P\left(x\right)$ is the total probability of $x$ over all categories.
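Eq. (12) can be turned into a small sentiment classifier. The source does not state which NB variant or smoothing was used, so the sketch below is an assumption: a multinomial Naive Bayes over word counts with Laplace smoothing (`alpha`), scored in log space, where $P(x)$ is dropped because it is the same for every category. The class name `NaiveBayes` and the toy training data are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative variant)."""

    def fit(self, docs, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.log_prior = {}
        self.log_like = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # log P(w_i): share of training documents in category c
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            denom = sum(counts.values()) + alpha * len(self.vocab)
            # log P(word | category) with add-alpha smoothing
            self.log_like[c] = {
                w: math.log((counts[w] + alpha) / denom) for w in self.vocab
            }
        return self

    def predict(self, doc):
        def log_posterior(c):
            # log P(c) + sum of log P(word | c); unseen words are ignored
            return self.log_prior[c] + sum(
                self.log_like[c][w] for w in doc if w in self.vocab
            )
        return max(self.classes, key=log_posterior)


if __name__ == "__main__":
    train = [["风景", "壮观"], ["服务", "热情"], ["排队", "太久"], ["门票", "太贵"]]
    labels = ["pos", "pos", "neg", "neg"]
    model = NaiveBayes().fit(train, labels)
    print(model.predict(["门票", "太贵", "排队"]))
```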

A flow chart of NB network modeling is shown in Fig. 2. The key step is to determine the conditional probabilities and causal relationships from the database and expert knowledge. The model determines the relationships between the variables by learning the NB network structure [24,25].

Fig. 2. NB network modeling flow chart.
Fig. 3. LDA theme model.

The LDA topic model is a three-layer Bayesian probability model whose layers are the text, the topic, and the word. The graph model is shown in Fig. 3. White circles and orange circles denote hidden variables and observed variables, respectively. $\alpha $ and ${\beta}$ are the hyperparameters of the topic distribution and the word distribution, respectively. $\overset{\rightarrow }{\theta }_{d}$ and $\overset{\rightarrow }{\varphi }_{k}$ are the topic distribution of text $d$ and the word distribution of topic $k$, respectively. $z_{d,n}$ and $w_{d,n}$ are the topic of the $n$-th word in text $d$ and the $n$-th word in text $d$ itself. The number of topics is $K$, and the total number of words in text $d$ is $N_{d}$. Eq. (13) gives the topic distribution of each text.

(13)
$ \overset{\rightarrow }{\theta }_{d}\sim Dirichlet\left(\overset{\rightarrow }{\alpha }\right) $

Eq. (14) refers to the term distribution of each topic $z\in \left\{1,2,\cdots ,K\right\}$ based on probability.

(14)
$\overset{\rightarrow }{\varphi }_{k}\sim Dirichlet(\overset{\rightarrow }{\beta })$

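The joint probability of the hidden variables and the observed variables under the given hyperparameters is:

(15)
$ p\left(\overset{\rightarrow }{w}_{d},\overset{\rightarrow }{z}_{d},\overset{\rightarrow }{\theta }_{d},\Phi \left| \overset{\rightarrow }{\alpha },\overset{\rightarrow }{\beta }\right.\right)=\prod _{n=1}^{N_{d}}p\left(w_{d,n}\left| \overset{\rightarrow }{\varphi }_{z_{d,n}}\right.\right)p\left(z_{d,n}\left| \overset{\rightarrow }{\theta }_{d}\right.\right)\cdot p\left(\overset{\rightarrow }{\theta }_{d}\left| \overset{\rightarrow }{\alpha }\right.\right)\cdot p\left(\Phi \left| \overset{\rightarrow }{\beta }\right.\right) $

In Eq. (15), $\Phi =\left\{\overset{\rightarrow }{\varphi }_{k}\right\}_{k=1}^{K}$ collects the word distributions of all $K$ topics.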

LDA is used to identify the topic information hidden in a large text set or corpus. For each document in the corpus, LDA assumes the following generation process. First, a topic is drawn from the topic distribution of the document. Next, a word is drawn from the word distribution of the selected topic. The process is repeated until every word in the document has been generated. Fitting this model to a corpus allows the LDA topic model to identify the topics of each document automatically.
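The generation process described above can be simulated with the standard library alone. The sketch below is illustrative: `sample_dirichlet` draws from a Dirichlet distribution by normalizing gamma variates, `generate_document` is a hypothetical helper, and the topic-word distributions `phi` are supplied by hand instead of being drawn from Eq. (14).

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def generate_document(n_words, alpha, phi, rng):
    """Generate (topic, word) pairs for one document.

    alpha: Dirichlet hyperparameters, one per topic.
    phi:   list of K word distributions, each over the same vocabulary indices.
    """
    theta = sample_dirichlet(alpha, rng)              # topic distribution of the document
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta)[0]    # draw a topic
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]  # draw a word from it
        doc.append((z, w))
    return theta, doc


if __name__ == "__main__":
    rng = random.Random(0)
    phi = [[0.5, 0.5, 0.0, 0.0],   # topic 0 only emits words 0 and 1
           [0.0, 0.0, 0.5, 0.5]]   # topic 1 only emits words 2 and 3
    theta, doc = generate_document(10, [0.5, 0.5], phi, rng)
    print(theta, doc)
```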

The Gibbs sampling algorithm is easy to understand and not complex to implement, and it performs well when topics are extracted from a large number of samples. Therefore, Gibbs sampling was used to estimate the parameters of the LDA topic model. Using the LDA topic model, the topic probabilities of positive emotional text and negative emotional text can be calculated. At the same time, the probability distribution of the words contained in each topic is obtained, which yields the topic clustering result. The LDA topic clustering results are then refined, and the overall image perception of Xi’an tourist attractions is summarized.
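A compact collapsed Gibbs sampler shows how the LDA parameters can be estimated from topic assignment counts. It is a sketch under stated assumptions and not the authors' implementation: symmetric hyperparameters `alpha` and `beta`, documents given as lists of integer word ids, a fixed number of sweeps, and no convergence check. The function name `gibbs_lda` is hypothetical.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi) point estimates."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                  # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(n_topics)]    # n_{k,w}
    topic_total = [0] * n_topics                                # n_k
    assign = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Full conditional p(z = k | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi


if __name__ == "__main__":
    docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 0, 1], [3, 2, 2]]
    theta, phi = gibbs_lda(docs, n_topics=2, vocab_size=4)
    print(theta)
```

In practice a library implementation would usually be preferred; the point of the sketch is only that each sweep resamples one token's topic from counts of all the other tokens.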

The number of topics chosen for a document set greatly affects the quality of topic modeling. Therefore, the optimal number of topics must be determined before the LDA topic model is formally established. This study selected perplexity (the degree of confusion) as the indicator: the number of topics with the lowest perplexity is taken as the best. Gibbs sampling was used to calculate the perplexity for topic numbers between 2 and 40, and the relationship between perplexity and the number of topics was plotted.
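The selection rule can be made concrete with the usual definition of perplexity, the exponential of the negative average log-likelihood per token. The source does not give its exact formula, so this definition is an assumption. `perplexity` and `select_num_topics` are hypothetical helper names, and `theta` and `phi` are the per-document topic distributions and per-topic word distributions of a fitted model.

```python
import math


def perplexity(docs, theta, phi):
    """exp(-(1/N) * sum over tokens of log p(w)), with p(w) = sum_k theta[d][k] * phi[k][w]."""
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def select_num_topics(perplexity_by_k):
    """Return the topic count with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


if __name__ == "__main__":
    docs = [[0, 1], [2, 3, 3]]
    theta = [[0.3, 0.7], [0.9, 0.1]]
    phi = [[0.25] * 4, [0.25] * 4]          # uniform word distributions
    print(perplexity(docs, theta, phi))     # a uniform model over 4 words
```

Scanning the candidate range amounts to fitting one model per topic count, storing each perplexity in a dictionary keyed by the topic count, and passing that dictionary to `select_num_topics`.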

4. The Tourist Perception Results of an Image of Xi’an Tourist Attractions under the Big Data Technology

In the experiment, the TF-IDF values of the tourism evaluations were calculated and ranked, the performance of the NB network was compared with that of other models, and the emotional image of tourists was analyzed. Finally, the LDA topic model was used to analyze the overall image of the scenic spots. The algorithms compared with the NB network were a text classification model based on an integrated deep learning framework (IDL) [26], a recurrent neural network with gated recurrent units (RNN-GRU) [27], and a CNN model with word embeddings (WE-CNN) [28]. The experimental environment was Windows 10 and Python, and the models were implemented on the TensorFlow platform.

4.1 Analysis of Travel Feature Values based on the TF-IDF Algorithm

The TF-IDF algorithm was used to obtain the TF-IDF value of each word in the document. At the same time, the top 24 feature words were obtained according to the size of the value, and the specific results are shown in Table 2. The range of the TF-IDF value is 0.0245-0.2316, and the maximum value and minimum value correspond to service attitude and category, respectively.

Table 2. TF-IDF value of some words in a document.

| Rank | Characteristic word | TF-IDF value | Rank | Characteristic word | TF-IDF value |
|---|---|---|---|---|---|
| 1 | Service attitude | 0.2316 | 13 | Hotel | 0.0389 |
| 2 | City wall | 0.1956 | 14 | Transportation | 0.0345 |
| 3 | Scenic Spot | 0.1232 | 15 | Shock | 0.0316 |
| 4 | Terra Cotta Warriors | 0.1023 | 16 | Worth | 0.0287 |
| 5 | Spectacular | 0.0896 | 17 | Convenience | 0.0268 |
| 6 | Not bad | 0.0765 | 18 | Guide | 0.0254 |
| 7 | Big Wild Goose Pagoda | 0.0658 | 19 | Recommendation | 0.0247 |
| 8 | Huashan | 0.0635 | 20 | Queue | 0.0236 |
| 9 | Spectacular | 0.0627 | 21 | Price | 0.0231 |
| 10 | Sightseeing bus | 0.0534 | 22 | Fare | 0.0226 |
| 11 | Ladder | 0.0527 | 23 | Charge | 0.0218 |
| 12 | Hotel | 0.0408 | 24 | Bell Tower | 0.0215 |

According to the valid review dataset, cognitive images can be divided into four categories: tourism attractions, service measures, tourism management, and tourism services. Fig. 4(a) shows some characteristic words of tourist attractions. Among the top 20 feature words of tourist attractions, the TF-IDF values of words that name scenic spots are higher. Therefore, tourists have a higher degree of cognition and perception of the city wall, the Terra Cotta Warriors, the Big Wild Goose Pagoda, Huashan Mountain, and other scenic spots. At the same time, the TF-IDF values of characteristic words such as spectacular, good, shocking, and worthy are also high, which indicates that tourists have a positive perception of the attractions. Fig. 4(b) shows some characteristic values of the cognitive image of tourism service facilities.

Fig. 4. TF-IDF value of four kinds of tourists’ cognitive image.

The TF-IDF values of words for tourist vehicles, escalators, hotels, and other transportation and accommodation facilities are high. Fig. 4(c) shows some characteristic values of the tourism management cognitive image. The TF-IDF values of management characteristic words such as queue, price, ticket price, and charge are relatively high, which shows that tourists pay close attention to ticket charges and order management. Given the high TF-IDF value of "ticket," whether the ticket price of a scenic spot is reasonable affects tourists' perception of the scenic spot to a large extent.

Fig. 4(d) shows some characteristic values of a tourism service cognitive image. The TF-IDF values of service characteristics such as service attitude, convenience, and tour guide are higher. The TF-IDF value of service attitude is as high as 0.2316, from which it can be seen that service attitude has a great impact on the tourism experience in the minds of tourists. Based on the analysis, compared with the hardware conditions such as scenic spots and service measures of tourist attractions, tourists pay more attention to the service attitude and level of tour guide service and believe that the service quality affects the tourism experience to a greater extent.

4.2 NB Model Performance Test and Tourism Emotional Image Perception Results

Fig. 5 shows the emotional images analyzed by the NB model. Fig. 5(a) shows the emotional image of tourism attractions. The emotional image perception of Xi’an tourist attractions is mainly positive: the proportions of positive and negative evaluations were 98.56% and 1.44%, respectively. Fig. 5(b) shows the emotional image of tourism service facilities. It is consistent with the result for tourist attractions, with positive and negative evaluations accounting for 92.56% and 7.44%, respectively. Figs. 5(c) and (d) show the emotional image of tourism management and the emotional image of service, respectively, and the perception is again mainly positive.

Fig. 5. Emotional image results of NB model analysis.

The performance of the NB model is shown in Fig. 6. The maximum number of iterations was set to 2000, and the performance was evaluated by running time, loss value, recall rate, and accuracy rate. Fig. 6 shows the performance of the NB model for different numbers of training iterations. As the number of iterations increases, the running time of the model increases gradually. When the maximum number of training iterations is reached, the running time also reaches its maximum value of 225.78 s. It can be seen from the figure that when the number of iterations is about 140, the recall rate and accuracy rate on the test set are ideal. The corresponding values are 0.78, 0.79, and 0.85, and the loss value of the model gradually converges to 0.05. This shows that the NB model is reliable and effective.
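For reference, the recall and accuracy rates used in this evaluation can be computed from predicted and true labels as follows. This is a generic sketch of the standard definitions and not the authors' evaluation script. The function name `binary_metrics` and the label strings are hypothetical, and precision is included only because it falls out of the same counts.

```python
def binary_metrics(y_true, y_pred, positive="pos"):
    """Accuracy, recall, and precision for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


if __name__ == "__main__":
    y_true = ["pos", "pos", "neg", "neg"]
    y_pred = ["pos", "neg", "neg", "pos"]
    print(binary_metrics(y_true, y_pred))
```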

Fig. 6. Performance comparison of NB model.

The performance of the NB model was further studied by comparing it with the text classification model (IDL) [26], the recurrent neural network with gated recurrent units (RNN-GRU) [27], and the CNN model with word embeddings (WE-CNN) [28]. Fig. 7 shows the accuracy of the different algorithms on negative and positive evaluations. It can be seen from the figure that the NB model has high accuracy for both negative and positive evaluations at different data scales. The corresponding maximum values are 0.83 and 0.85, respectively. The accuracy of the other algorithms ranges from 0.60 to 0.80.

Fig. 7. Accuracy of different algorithms in negative evaluation and positive evaluation.

Fig. 8 shows the accuracy of different algorithms in negative evaluation and positive evaluation. It can be seen from the figure that the NB model has high accuracy for both negative and positive evaluations under different data scales. The corresponding maximum values are 0.82 and 0.85, respectively. The accuracy of other algorithms ranges from 0.60 to 0.80.

Fig. 8. Accuracy of different algorithms in negative evaluation and positive evaluation.

Fig. 9 shows the running time of different algorithms in negative evaluation and positive evaluation. For negative evaluation and positive evaluation, the NB model has a long running time under different data scales. The corresponding maximum values are 8.1 s and 7.9 s, respectively. The algorithm with the shortest running time under the same data size is NN, followed by IDL and RNN-GRU.

Fig. 9. Running time of different algorithms in negative evaluation and positive evaluation.

4.3 Results of the Overall Image Analysis of Xi'an Scenic Spot based on the LDA Theme Model

Fig. 10 shows the relationship between the perplexity (the degree of confusion) of positive and negative emotional texts and the number of topics. When the number of topics is 4, the perplexity of both the positive and the negative emotional text is lowest. Therefore, 4 was selected as the best number of topics. For both positive and negative emotional texts, the high-frequency words in topic 4 mainly concern service facilities, such as sightseeing buses, escalators, hotels, etc. Based on the analysis, the main problems of Xi’an tourist attractions are high ticket prices, weak humanization of infrastructure, and low service quality.

Fig. 10. The correlation between the degree of confusion of positive and negative emotional texts and the number of topics.
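
The selection of the topic number can be sketched as follows, assuming that the degree of confusion corresponds to the usual perplexity of a topic model, that is, the exponential of the negative held-out log-likelihood per token. The function names and the log-likelihood values are hypothetical placeholders chosen for illustration; they are not results from the study.

```python
import math


def perplexity(total_log_likelihood, n_tokens):
    """Perplexity = exp(-log-likelihood per token); lower values are better."""
    return math.exp(-total_log_likelihood / n_tokens)


def best_topic_count(log_likelihood_by_k, n_tokens):
    """Return the topic count with the lowest perplexity and all perplexity scores."""
    scores = {k: perplexity(ll, n_tokens) for k, ll in log_likelihood_by_k.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical held-out log-likelihoods for LDA models fitted with 2..6 topics
# on a corpus of 1000 tokens (assumed numbers, for illustration only).
held_out_ll = {2: -7200.0, 3: -7050.0, 4: -6900.0, 5: -6980.0, 6: -7100.0}
best_k, scores = best_topic_count(held_out_ll, n_tokens=1000)
```

In practice the log-likelihoods would come from LDA models fitted separately to the positive and the negative texts, and the minimum of each curve would be read off as in the figure.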

5. Conclusion

To address the problem that potential customers lack a comprehensive understanding of tourist attractions, this study used big data technology to analyze the tourist perception of the image of Xi’an tourist attractions, covering both cognitive image and emotional image perception. Across the four cognitive images (tourism attractions, service measures, tourism management, and tourism services), the TF-IDF values of facility words such as tourist vehicles, escalators, and hotels were high. The TF-IDF values of management characteristic words such as queue, price, ticket price, and charge were high. The TF-IDF values of service characteristic words such as service attitude, convenience, and tour guides were also high. For the four emotional images, the perception was mainly positive. When the number of iterations was about 140, the recall rate and accuracy rate of the test set were ideal. The corresponding values were 0.78, 0.79, and 0.85, and the loss value of the model gradually converged to 0.05.

For both negative and positive evaluations, the NB model has high accuracy under different data scales, with maximum values of 0.83 and 0.85, respectively. The perception model can identify problems such as confused management and high ticket prices in the development of tourist attractions, and it can also support relevant optimization suggestions. However, the study still has deficiencies. The selected online evaluation data were of a single type: only the text evaluations of tourists were analyzed, while the pictures, expressions, and other information in the evaluations were not. In addition, the attributes of tourists were not comprehensively analyzed, and the tourists were not divided into novice tourists and long-term tourists for analysis. Future work will carry out a more comprehensive study and will try to analyze the image types and expression types used by tourists with different attributes.

REFERENCES

1
Z. Xiao, Y. Zhao, N. Li, S. Zhou, H. Xu. Research on key technologies of hand function rehabilitation training evaluation system based on leap motion. Journal of Computer and Communications, 9(1), 19-35, 2021.
2
D. Pesce, P. Neirotti, E. Paolucci. When culture meets digital platforms: value creation and stakeholders’ alignment in big data use. Current Issues in Tourism, 22(15), 1883-1903, 2019.
3
J. L. Jimenez-Marquez, I. Gonzalez-Carrasco, J. L. Lopez-Cuadrado, B. Ruiz-Mezcua. Towards a big data framework for analyzing social media content. International Journal of Information Management, 44, 1-12, 2019.
4
K. K. Ranga, C. K. Nagpal. A big data analytics framework for determining the travel destination preferences of Indian tourists. International Journal of Modern Physics C (IJMPC), 34(2), 1-14, 2023.
5
D. Jian, Sun, et al. Development of Web-Based Transit Trip-Planning System Based on Service-Oriented Architecture. Transportation Research Record, 2217(1), 87-94, 2018.
6
A. S. M. Arif, J. Du. Understanding collaborative tourism information searching to support online travel planning. Online Information Review, 43(3), 369-386, 2019.
7
Y. Choi, B. Hickerson, D. Kerstetter. Understanding the Sources of Online Travel Information. Journal of Travel Research, 57(1), 116-128, 2018.
8
Z. Hou, F. Cui, Y. Meng, T. Lian, C. Yu. Opinion mining from online travel reviews: A comparative analysis of Chinese major OTAs using semantic association analysis. Tourism Management, 74, 276-289, 2019.
9
H. Lin, M. Zhang, D. Gursoy, X. Fu. Impact of tourist-to-tourist interaction on tourism experience: The mediating role of cohesion and intimacy. Annals of Tourism Research, 76, 153-167, 2019.
10
L. H. Souza, E. Kastenholz, M. D. L. A. Barbosa, M. S. E. S. C. Carvalho. Tourist experience, perceived authenticity, place attachment and loyalty when staying in a peer-to-peer accommodation. International Journal of Tourism Cities, 6(1), 27-52, 2020.
11
H. Moon, H. Han. Tourist experience quality and loyalty to an island destination: The moderating impact of destination image. Journal of Travel & Tourism Marketing, 36(1), 43-59, 2019.
12
D. Suhartanto, A. Brien, I. Primiana, N. Wibisono, N. N. Triyuni. Tourist loyalty in creative tourism: the role of experience quality, value, satisfaction, and motivation. Current Issues in Tourism, 23(7), 867-879, 2020.
13
D. Samara, I. Magnisalis, V. Peristeras. Artificial intelligence and big data in tourism: a systematic literature review. Journal of Hospitality and Tourism Technology, 11(2), 343-367, 2020.
14
A. R. Alaei, S. Becken, B. Stantic. Sentiment analysis in tourism: capitalizing on big data. Journal of Travel Research, 58(2), 175-191, 2019.
15
L. Ardito, R. Cerchione, P. Del Vecchio, E. Raguseo. Big data in smart tourism: challenges, issues and opportunities. Current Issues in Tourism, 22(15), 1805-1809, 2019.
16
A. Yallop, H. Seraphin. Big data and analytics in tourism and hospitality: opportunities and risks. Journal of Tourism Futures, 6(3), 257-262, 2020.
17
H. Siam, M. B. Younes. An efficient multi-destinations trip planning protocol for intelligent transport system. International journal of numerical modelling, 32(3), 2548-2549, 2019.
18
W. Höpken, T. Eberle, M. Fuchs, M. Lexhagen. Improving tourist arrival prediction: a big data and artificial neural network approach. Journal of Travel Research, 60(5), 998-1017, 2021.
19
K. Al Fararni, F. Nafis, B. Aghoutane, A. Yahyaouy, J. Riffi, A. Sabri. Hybrid recommender system for tourism based on big data and AI: A conceptual framework. Big Data Mining and Analytics, 4(1), 47-55, 2021.
20
Y. Choi, B. Hickerson, et al. Understanding the Sources of Online Travel Information. Journal of Travel Research, 57(1), 116-128, 2018.
21
Y. Kim, C. K. Kim, D. K. Lee, H. W. Lee, R. I. T. Andrada. Quantifying nature-based tourism in protected areas in developing countries by using social big data. Tourism Management, 72, 249-256, 2019.
22
G. C. Ogbeide, Y. Y. Fu, A. K. Cecil. Are hospitality/tourism curricula ready for big data? Journal of Hospitality and Tourism Technology, 12(1), 112-123, 2021.
23
P. Centobelli, V. Ndou. Managing customer knowledge through the use of big data analytics in tourism research. Current Issues in Tourism, 22(15), 1862-1882, 2019.
24
A. Guizzardi, F. M. E. Pons, G. Angelini, E. Ranieri. Big data from dynamic pricing: A smart approach to tourism demand forecasting. International Journal of Forecasting, 37(3), 1049-1060, 2021.
25
M. A. Köseoglu, F. Mehraliyev, M. Altin, F. Okumus. Competitor intelligence and analysis (CIA) model and online reviews: integrating big data text mining with network analysis for strategic analysis. Tourism Review, 76(3), 529-552, 2021.
26
A. Mohammed, R. Kora. An effective ensemble deep learning framework for text classification. Journal of King Saud University-Computer and Information Sciences, 34(10), 8825-8837, 2023.
27
S. M. Alzanin, A. M. Azmi, H. A. Aboalsamh. Short text classification for Arabic social media tweets. Journal of King Saud University-Computer and Information Sciences, 34(9), 6595-6604, 2022.
28
M. Umer, Z. Imtiaz, M. Ahmad. Impact of convolutional neural network and FastText embedding on text classification. Multimedia Tools and Applications, 82(4), 5569-5585, 2023.

Author

Haiying Qi

Haiying Qi graduated from the Department of Journalism, School of Literature, Northeast Normal University, majoring in Communications, in 2009. She currently works at Jilin Economic Management Cadre College in Changchun, Jilin Province, in Northeast China, where she serves as Deputy Minister of the Publicity & United Front Work Department and as an Associate Professor. She has guided students in provincial vocational skills competitions many times, and they have won first prize. She has served as a judge for the oral examination of the national tour guide qualification certificate, published more than 20 provincial papers, edited three national 12th and 13th Five-Year Plan textbooks, and participated in the editing of many other textbooks.