3.1 Model Construction
Because a CNN can automatically extract image features, it can learn from training data by imitating the structure of brain tissue and can carry out tasks such as image classification. The CNN has been widely used in image processing. The basic components of a CNN are the data input layer, the convolution calculation layer, the excitation layer, the pooling layer, the fully connected layer, and the output layer [16]. The convolutional layer is the core of a CNN. Its function is to extract the features of the image and strengthen the expressive ability of the network after a convolution operation [17]. The calculation in the convolutional layer is divided into two steps: the first captures image position information, and feature extraction is then performed on the captured image. After the first convolution operation, the change in image size is calculated as seen in (1):
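A commonly used form of this relation, assuming an input of height $H_{in}$ and width $W_{in}$, a square convolution kernel of size $k$, and zero-padding $p$ (symbols not defined in the text and introduced here only for illustration), is

$$H_{out}=\frac{H_{in}-k+2p}{Stride}+1,\qquad W_{out}=\frac{W_{in}-k+2p}{Stride}+1 .$$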
                  
                  
In (1), $H_{out}$ is the height after convolution, $W_{out}$ is the width after convolution, and $Stride$ is the step size. The rapid development of the CNN has enabled ordinary people to imitate the style of a famous artist’s paintings and create works of art in that style, which is called style migration. The traditional style migration network mainly uses the VGG-19 model to extract the texture features and the content of the image. The network defines a content LOSS function and a style LOSS function, and the final LOSS function is obtained by weighting those two LOSS functions. The LOSS function is minimized through continuous iterative training to obtain the final image after style rendering. A common style migration model is shown in Fig. 1.
                  
As seen in Fig. 1, common style migration algorithms require separate processing of a blank noise image for each new picture and are therefore computationally expensive. This study improves on the traditional style rendering technique by fusing a CNN algorithm and a VGG-19 network while using a TensorFlow function to implement the convolution operation, building a fast style-rendering model. The fast style-rendering model based on the improved CNN is shown in Fig. 2.
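As a brief illustration of the TensorFlow convolution operation mentioned above, the following sketch applies tf.nn.conv2d to a single image; the image size, kernel size, and channel counts are illustrative assumptions rather than the parameters of the model in Fig. 2.

import tensorflow as tf

# One 256x256 RGB image in NHWC layout (batch, height, width, channels).
image = tf.random.normal([1, 256, 256, 3])

# A 9x9 convolution kernel mapping 3 input channels to 32 output channels.
kernel = tf.random.normal([9, 9, 3, 32])

# "SAME" padding with stride 1 preserves the spatial size, consistent with (1).
features = tf.nn.conv2d(image, kernel, strides=[1, 1, 1, 1], padding="SAME")
print(features.shape)  # (1, 256, 256, 32)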
                  
                  
                        Fig. 1. Traditional style-migration model.
 
                  
                        Fig. 2. Fast style-rendering model with the improved CNN algorithm.
 
As can be seen in Fig. 2, the fast style-rendering model is divided into two main parts: the generative model and the loss model. In the generative model, the original image is input, and after a series of operations, the final output is a similarly styled image. The generative model is essentially a convolutional neural network consisting of a convolutional layer, a residual layer, and a deconvolution layer. The loss model is essentially a pre-trained VGG-19 network that does not require weight updates during training; it is only used to calculate the loss values for content and style, and then to update the weights of the preceding generative model through back-propagation. The fast style-rendering model is trained by selecting a style image, $Y_{s}$, and a content image, $Y_{c}$, during the training phase, then training different style and content images into different network models. In order to calculate the difference between resulting image $Y$ and the sample images, the LOSS model extracts the information from these images in different convolutional layers and compares them. The weights are then changed by back-propagation so that resulting image $Y$ is close to $Y_{s}$ in terms of style and close to $Y_{c}$ in terms of content. The weights are recorded to obtain a fast style-rendering model for that style. The LOSS function for the fast style-rendering model is defined as follows:
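A standard per-layer content (feature reconstruction) loss consistent with the symbols explained below is

$$\ell _{content}^{\phi ,i}\left(\hat{M},M\right)=\frac{1}{C_{i}H_{i}W_{i}}\left\| \phi _{i}\left(\hat{M}\right)-\phi _{i}\left(M\right)\right\| _{2}^{2},$$

where the name $\ell _{content}$ is introduced here only for reference in the discussion that follows.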
                  
                  
In Eq. (2), $\phi $ is the trained VGG-19 model, $i$ is the index of the convolutional layer, $\phi _{i}\left(M\right)$ represents the activation value of image $M$ at layer $i$ of the $\phi $ model, $\hat{M}$ represents the generated image after the model update, and $M$ is the starting input image. $C_{i}$ in $C_{i}H_{i}W_{i}$ represents the number of channels of the feature image at layer $i$, $H_{i}$ represents the height of the feature image at layer $i$, and $W_{i}$ represents the width of the feature image at layer $i$. In addition, the Gram matrix is also used, and is given in (3):
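A standard per-layer style loss built on the Gram matrix defined in (4), with $\left\| \cdot \right\| _{F}$ denoting the Frobenius norm and the name $\ell _{style}$ introduced here only for reference, is

$$\ell _{style}^{\phi ,i}\left(\hat{M},M\right)=\left\| G_{i}^{\phi }\left(\hat{M}\right)-G_{i}^{\phi }\left(M\right)\right\| _{F}^{2}.$$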
                  
                  
                  
In Eq. (3), $F$ denotes the Frobenius norm of the matrix, and $G_{i}^{\phi }\left(M\right)$ is the Gram matrix of the activation values of image $M$ at layer $i$ in the $\phi $ model, which is defined in (4):
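A standard definition of the Gram matrix entry for channels $c$ and $c'$, consistent with the symbols explained below, is

$$G_{i}^{\phi }\left(M\right)_{c,c'}=\frac{1}{C_{i}H_{i}W_{i}}\sum _{h=1}^{H_{i}}\sum _{w=1}^{W_{i}}\phi _{i}\left(M\right)_{h,w,c}\,\phi _{i}\left(M\right)_{h,w,c'}.$$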
                  
                  
                  
In Eq. (4), $G_{i}^{\phi }\left(M\right)_{c,c'}$ is the correlation between channels $c$ and $c'$ of image $M$, and $\phi _{i}\left(M\right)_{h,w,c}$ is the activation value of image $M$ at layer $i$ in the $\phi $ model at height, width, and channel coordinates $h$, $w$, and $c$. The total LOSS of the fast style-rendering model is defined in Eq. (5):
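Using the per-layer content and style losses sketched above and weighting coefficients $\alpha $ and $\beta $ (which are not named in the text), a total loss of the weighted form described below can be written as

$$L_{total}=\alpha \sum _{i}\ell _{content}^{\phi ,i}\left(\hat{M},Y_{c}\right)+\beta \sum _{i}\ell _{style}^{\phi ,i}\left(\hat{M},Y_{s}\right).$$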
                  
                  
In Eq. (5), the total loss value of the model is obtained by weighting the style and content loss values. To avoid subjective evaluation that is overly influenced by personal preference and emotion, which would lead to inaccurate results, the final rendered images are evaluated mainly with objective methods, and indicators such as information entropy, mean squared error (MSE), peak signal-to-noise ratio (PSNR), and average gradient are used to make a comprehensive evaluation of the model.
                  
                  
Eq. (6) is an expression for information entropy, where $j$ is the grey value, $p_{j}$ represents the proportion of pixels with grey value $j$ in the image, and $L$ is the total number of grey levels. The higher the IE, the higher the quality of the rendered image.
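A commonly used definition of information entropy, consistent with the symbols just defined, is

$$IE=-\sum _{j=0}^{L-1}p_{j}\log _{2}p_{j}.$$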
                  
                  
                  
                     					Eq. (7) is an expression for MSE. $M\times N$ indicates the size of the image, and a smaller
                     value indicates a higher-quality rendered image.
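A standard definition of the mean squared error between the rendered image $\hat{M}$ and the sample image $M$, with pixels indexed by $\left(x,y\right)$ and the image size written as $M\times N$ as in the text, is

$$MSE=\frac{1}{M\times N}\sum _{x=1}^{M}\sum _{y=1}^{N}\left[\hat{M}\left(x,y\right)-M\left(x,y\right)\right]^{2}.$$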
                  
                  
                  
Eq. (8) is an expression for PSNR in which $k$ is the number of bits per pixel (8 by default). A higher PSNR means less distortion and a better visual appearance of the image.
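A standard expression for PSNR in terms of the MSE above is

$$PSNR=10\log _{10}\frac{\left(2^{k}-1\right)^{2}}{MSE}.$$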
                  
                  
                  
                     					Eq. (9) is the expression for average gradient in  which $\frac{\partial f}{\partial X}$
                     and $\frac{\partial f}{\partial Y}$ represent the horizontal and vertical gradients,
                     respectively. The higher the G value, the clearer the image.
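One common form of the average gradient for an image $f$ of size $M\times N$ (the normalization varies between authors) is

$$G=\frac{1}{M\times N}\sum _{x=1}^{M}\sum _{y=1}^{N}\sqrt{\frac{1}{2}\left[\left(\frac{\partial f}{\partial X}\right)^{2}+\left(\frac{\partial f}{\partial Y}\right)^{2}\right]}.$$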
                  
                  
                  Eq. (10) is an expression for the correlation coefficient (R), where a higher R value indicates
                     higher correlation between the rendered image and the sample.
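A standard form of the correlation coefficient between the rendered image $\hat{M}$ and the sample image $M$, with $\mu _{\hat{M}}$ and $\mu _{M}$ denoting their mean grey values (symbols introduced here), is

$$R=\frac{\sum _{x,y}\left[\hat{M}\left(x,y\right)-\mu _{\hat{M}}\right]\left[M\left(x,y\right)-\mu _{M}\right]}{\sqrt{\sum _{x,y}\left[\hat{M}\left(x,y\right)-\mu _{\hat{M}}\right]^{2}\sum _{x,y}\left[M\left(x,y\right)-\mu _{M}\right]^{2}}}.$$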
                  
                  
Eq. (11) is an expression for the mutual information (MI) between the rendered image and the sample image, where $P_{\hat{M}M}\left(a,b\right)$ is their joint grey-level distribution. A larger MI value indicates higher correlation between the images.
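A standard definition of mutual information in terms of the joint distribution $P_{\hat{M}M}\left(a,b\right)$ and the marginal distributions $P_{\hat{M}}\left(a\right)$ and $P_{M}\left(b\right)$ (the marginals are introduced here) is

$$MI=\sum _{a}\sum _{b}P_{\hat{M}M}\left(a,b\right)\log _{2}\frac{P_{\hat{M}M}\left(a,b\right)}{P_{\hat{M}}\left(a\right)P_{M}\left(b\right)}.$$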
                  
                  
                  Eq. (12) is an expression for spatial frequency; a higher SF indicates a more spatially active
                     image, i.e. a clearer image.
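Spatial frequency is commonly computed from the horizontal direction frequency of Eq. (13) and the vertical direction frequency of Eq. (14); writing these as $RF$ and $CF$ (names introduced here), a standard form is

$$SF=\sqrt{RF^{2}+CF^{2}}.$$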
                  
                  
                  Eq. (13) is an expression for the horizontal direction frequency in the spatial frequency.
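A common definition of this horizontal (row) frequency for an image $f$ of size $M\times N$ is

$$RF=\sqrt{\frac{1}{M\times N}\sum _{x=1}^{M}\sum _{y=2}^{N}\left[f\left(x,y\right)-f\left(x,y-1\right)\right]^{2}}.$$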
                  
                  
                  Eq. (14) is an expression for the frequency in the vertical direction of the spatial frequency.
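The vertical (column) frequency is defined analogously:

$$CF=\sqrt{\frac{1}{M\times N}\sum _{x=2}^{M}\sum _{y=1}^{N}\left[f\left(x,y\right)-f\left(x-1,y\right)\right]^{2}}.$$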
                  
                  
Eq. (15) is an expression for the structural similarity index (SSIM), where $\mu _{A}$ and $\mu _{B}$ are the mean values of the rendered image and the sample image, respectively; ${\sigma ^{2}}_{A}$ and ${\sigma ^{2}}_{B}$ denote the variances of the rendered image and the sample image, respectively; $\sigma _{AB}$ is their covariance; and $k_{1}$ and $k_{2}$ are constants. A higher SSIM indicates a higher degree of similarity between the two images.
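A standard form of the structural similarity index between rendered image $A$ and sample image $B$, consistent with the symbols just defined, is

$$SSIM\left(A,B\right)=\frac{\left(2\mu _{A}\mu _{B}+k_{1}\right)\left(2\sigma _{AB}+k_{2}\right)}{\left(\mu _{A}^{2}+\mu _{B}^{2}+k_{1}\right)\left(\sigma _{A}^{2}+\sigma _{B}^{2}+k_{2}\right)}.$$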
                  
                
               
                     3.2 Front- and Back-end Network Construction
                  Based on the fast style-rendering model constructed in the previous section, this
                     section combines Python algorithms and the Python Web framework to build the server-side
                     back end of the system, allowing users to access the style rendering system via the
                     web, upload their own images, and complete real-time rendering of the images in the
selected style [18]. The system's server front end uses the Bootstrap development framework to improve
                     adaptability to different browsers. The system server back end is divided into three
                     main parts. The first part is the Uniform Resource Locator (URL) module, which receives
                     URL requests from the front end and feeds them into the target function for execution.
                     The second part is the logic processing module, which mainly performs image processing,
                     including functions such as transcoding and biasing [19]. The third part is the fast style-rendering algorithm module, which completes the
                     style conversion of the image so that the input image can be rendered into the target
                     style according to instructions and can be presented to the user smoothly, as shown
                     in Fig. 3.
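The text identifies the back end only as a Python Web framework; the following minimal sketch assumes a Flask-style application to illustrate how a URL module can receive a front-end request, pass it through the logic processing step, and hand the decoded image to the rendering module. The route name and the fast_style_render helper are hypothetical.

import base64
from flask import Flask, request, jsonify

app = Flask(__name__)

def fast_style_render(image_bytes, style):
    # Placeholder for the fast style-rendering model; returns the input unchanged.
    return image_bytes

# URL module: every front-end instruction is filtered through a route
# before being forwarded to the target function.
@app.route("/render", methods=["POST"])
def render():
    # Logic processing module: decode the transmitted image content.
    try:
        image_bytes = base64.b64decode(request.form["image"])
    except (KeyError, ValueError):
        return jsonify({"status": "error", "msg": "decode failed"}), 400
    # Fast style-rendering module: apply the selected style.
    styled = fast_style_render(image_bytes, request.form.get("style", "default"))
    return jsonify({"status": "ok", "image": base64.b64encode(styled).decode()})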
                  
Fig. 3 is a flow chart of the style rendering system. The URL module makes the whole system more stable, and all instructions from the front end must first be filtered by the URL module. To add a new function, a developer only needs to write the function and route it through the URL module, which greatly reduces runtime and lowers the threshold for using the algorithm. A flow chart of the entire rendering request is shown in Fig. 4.
                  
                  As seen in Fig. 4, since the format of the image content transmitted in the network is relatively special,
                     it is generally necessary to first encode and decode the image content sent by the
                     client. If it can be successfully decoded, the logic processing module will pass the
                     image to the fast style-rendering model to execute the rendering algorithm until completion,
                     and will then present the rendered image to the user. Multiple computations during
                     the course of operations are executed concurrently, and they potentially interact.
                     In addition, there are quite a few operating paths that the system can take, and results
                     may be uncertain. Therefore, after receiving the rendering request in the background,
                     the system hands over the request to the process scheduling function, which arranges
                     the rendering tasks according to the situation in the scheduling pool. The process
                     scheduling pool is shown in Fig. 5.
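A minimal sketch of the process scheduling described above, using Python's standard multiprocessing.Pool as the scheduling pool; the pool size, the job tuple layout, and the render_task function are illustrative assumptions.

from multiprocessing import Pool

def render_task(job):
    # One rendering request: (request id, image bytes, style name).
    job_id, image_bytes, style = job
    # The fast style-rendering model would run here; the input is returned unchanged.
    return job_id, image_bytes

if __name__ == "__main__":
    jobs = [(1, b"img-1", "oil"), (2, b"img-2", "sketch"), (3, b"img-3", "ink")]
    # The scheduling pool runs several rendering requests concurrently and
    # queues the remainder until a worker process becomes free.
    with Pool(processes=2) as pool:
        for job_id, _ in pool.imap_unordered(render_task, jobs):
            print("finished rendering request", job_id)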
                  
The front-end page of the rendering system adopts the Bootstrap framework, which is not only simple and efficient but also performs well. Bootstrap lowers the threshold for user access from mobile devices by rewriting most of the HTML controls [20]. The main functions of the front end are as follows. The user uploads photos first,
                     then selects the style they want rendered. The system performs rendering operations
                     with the selected photos and styles, and displays the finished image. If the image
                     is uploaded successfully, the back end will signal that the front end image has been
                     successfully received, and it issues an instruction to render the image. After the
                     front end receives the instruction, it will continuously ask the back end whether
                     the rendering operation is complete. Upon completion, it retrieves the rendered image
                     and displays it. The entire front-end page-rendering process is shown in Fig. 6.
                  
As seen in Fig. 6, the entire front-end operation is divided into four steps. First, a picture is uploaded and rendering is requested; the front end then waits while the back end processes the request. Next, rendering progress is queried according to the instructions returned by the back end. Finally, the rendered picture is retrieved and displayed together with the reference picture from the style picture library, completing the front-end rendering operation.
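The browser-side behaviour described above (upload, then repeatedly ask the back end whether rendering has finished, then fetch the result) can be sketched in Python with the requests library standing in for the front-end JavaScript; the server address, URL paths, and JSON fields are hypothetical.

import time
import requests

BASE = "http://localhost:5000"  # hypothetical server address

# Step 1: upload the picture and request rendering of the selected style.
with open("photo.jpg", "rb") as f:
    job = requests.post(f"{BASE}/render", files={"image": f}, data={"style": "oil"}).json()

# Steps 2-3: wait while the back end processes the request, polling for progress.
while True:
    status = requests.get(f"{BASE}/progress", params={"job": job["id"]}).json()
    if status["done"]:
        break
    time.sleep(1)

# Step 4: retrieve the rendered picture and save it for display.
rendered = requests.get(f"{BASE}/result", params={"job": job["id"]}).content
with open("rendered.jpg", "wb") as f:
    f.write(rendered)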
                  
                  
                        Fig. 3. Flow chart of the style rendering system model.
 
                  
                        Fig. 4. Flowchart of the style rendering request module.
 
                  
                        Fig. 5. Flow chart of the process pool operation.
 
                  
                        Fig. 6. Flow chart of front-end rendering operations.