
  1. (College of Fine Arts and Art Design, Henan Vocational University of Science and Technology, Zhoukou 466000, China)



Keywords: GAN, Image style, Art design, Generator, Discriminator

1. Introduction

As deep learning and computer vision have developed in recent years, generative adversarial networks (GANs) have attracted extensive attention. A GAN is a generative modeling method trained through competitive learning between a generator and a discriminator: by iterating the process of generating samples and evaluating their authenticity, the generator eventually learns to produce high-quality samples and achieves the desired effect [1-3]. The broad potential of GANs in computer vision has gradually made them a hotspot in image style transfer. In art design in particular, the application of GANs has become increasingly significant, providing researchers and artists with new creative tools and practice methods. Image style transfer is an important research direction in computer vision: it keeps the content structure of one image unchanged while integrating the artistic style of another image into it, creating images with novel visual effects [4-6]. Early image style transfer methods were mainly based on hand-crafted features. These methods had two main shortcomings. First, the types of styles that could be transferred were relatively limited and could not fit the rich diversity of art forms. Second, owing to the limitations of traditional methods, the processing was often very time-consuming, making real-time and efficient image style transfer difficult [7-9]. To solve the problems of single transfer style, limited transfer efficiency, and the significant gap between generated and expected images caused by early methods, this study adopts a GAN-based image style transfer method. GANs and their derivative models have injected new vitality into the field of image style transfer. The feature representations learned by a GAN during training can automatically capture the style information in an image, avoiding the tedious process of manually designing features. In addition, a GAN can generate realistic images and realize the transfer of various styles by adjusting the parameters of the generator, which provides a highly creative practice method for art and design. The first part of this paper presents the research purpose, the second part designs an improved GAN image style transfer algorithm, the third part tests the effect of the algorithm, and the fourth part draws the research conclusion.

2. Related Works

In recent years, research on image style transfer has gradually been enriched. Hollandi R et al. proposed a deep learning method called nucleAIzer, which aims to provide a truly universal way to locate 2D nuclei across a variety of experimental and optical microscope modalities. The innovation of this study was that nucleAIzer used image style transfer to automatically generate augmented training samples, adapting automatically to unlabeled new data and its nuclear shapes [10]. Lin C T et al. proposed AugGAN, a GAN-based data augmenter that converts road-driving images into a target domain while effectively retaining the image objects. Their study showed that the adaptability of a vehicle detector in the field is not limited by the training data, and in the challenging case of day-to-night conversion, AugGAN achieved a significant performance improvement [11]. Chen H et al. proposed an internal-external style transfer method that combines two kinds of contrastive loss to solve the problems of color inconsistency and pattern repetition in existing artistic style transfer methods. The method generates visually comfortable and satisfactory artistic images and improves the stability and consistency of rendered video clips [12]. Wang X et al. focused on the importance of font beautification and symbol effects; they developed a method to automatically render text with artistic effects, which significantly improved the efficiency and convenience of producing text art [13]. Through industry-government-academia cooperation, Tu Y et al. used open image data to enhance the tourism attractiveness of local governments. They used image style transfer to anonymize images and conducted demonstration experiments with local governments, which proved the feasibility and effectiveness of the approach [14].

Recently, the application of GANs in various fields has gradually deepened. Gui J et al. conducted a comprehensive study on the algorithms, theory, and applications of GANs, introducing the motivation, mathematical expression, and structure of most GAN algorithms in detail. In addition, they discussed theoretical issues related to GANs and studied typical applications in many other fields [15]. Karras T et al. observed that, during synthesis, generative adversarial networks incorrectly adhere details to image coordinates, and proposed a solution that treats the problem as aliasing in signal processing. By treating all signals as continuous, they put forward a generally applicable structural improvement that makes the generator significantly different from StyleGAN2 in internal representation while having a ``matched'' FID [16]. Alqahtani H et al. reviewed the applications of GANs and analyzed their contributions in real-world scenarios [17]. Zhou H et al. proposed an image fusion method based on generative adversarial networks. Through a dual-encoder single-decoder and dual-discriminator strategy, the method ensures that the fused image has a clear structure and significant information, and it outperformed other techniques [18]. Liu M Y et al. focused on the algorithms and applications of GANs in image and video synthesis. They introduced several important techniques for stabilizing GAN training and discussed applications in image translation, image processing, video synthesis, and neural rendering, improving the realism and application value of generated high-resolution images and videos [19].

Kwon G et al. proposed an image style transfer method that does not require reference style images. It uses a pre-trained text-image embedding model to achieve image style transfer from a single text condition. The researchers also introduced a text-image matching loss and a multi-view augmentation module to enhance the authenticity of texture transfer. The results showed that the method achieved good results in image style transfer [20]. Deng Y et al. used a Transformer-based method to address the limited global information of traditional image style transfer methods. The method has two different Transformer encoders, one generating the content sequence and one generating the style sequence, and adopts a multi-layer structure to stylize the content sequence. The results show that this method has stronger style transfer ability than traditional methods [21]. Liu S et al. proposed an image style transfer method based on an adaptive attention normalization model, which adaptively normalizes deep content features according to the image style, thereby matching their global statistics to the style features. To solve the problem of local distortion, a spatial attention score is adopted to adaptively normalize attention at each point. The results show that this method performs excellently in style transfer for both images and videos [22].

It can be seen that research in the field of image style transfer is gradually being enriched, and scholars have proposed a variety of deep learning methods, such as using image style transfer to adapt to new data, using AugGAN to convert road-driving images and improve the adaptability of vehicle detectors, solving artistic image problems with internal-external style transfer, improving the efficiency of text art with new font-beautification methods, and using image style transfer to enhance the tourism attractiveness of local governments. GANs are used more and more widely, covering algorithms, mathematical expression, structural research, and multi-domain applications. The innovation of this research lies in applying GAN-based image style transfer to the field of art and design, focusing on the realism and application value of artistic image generation. This research is expected to bring new possibilities and innovative ideas to the field of art and design.

3. Methods

A novel style transfer algorithm, AMS-Cycle-GAN, is designed in this study. The algorithm introduces position normalization and moment shortcut modules to preserve the feature information of the input image, thereby achieving better image style transfer effects.

3.1. Generator Network Structure Design

The purpose of designing AMS-Cycle-GAN is to address the limitations of existing image style transfer algorithms and to meet their various potential needs in practical applications. Traditional style transfer models lack consistency in both content and style when generating images, and their generation efficiency is low. To address these issues, AMS-Cycle-GAN was designed. In addition, the potential needs of image style transfer in different fields, such as digital art creation, advertising design, and film and animation production, were considered in order to improve the visual quality of images and ensure consistency of image style while generating images efficiently.

The generator network of the AMS-Cycle-GAN image style transfer algorithm is designed to provide an improved style transfer effect. To achieve this goal, the network optimizes the model in three ways. First, PONO-MS is used alternately in the encoder-decoder part, which not only effectively retains the feature information of the input image but also improves the optimization and convergence of the network. Second, the loss function is the same as that of the Cycle-DPN-GAN model, including the generative adversarial loss, cycle consistency loss, etc. Third, a channel-based attention mechanism is introduced into the discriminator to better focus on important content during transfer. The whole network has two generators and two discriminators, which realizes bidirectional data generation. The PONO-MS module in the generator is a variant of the skip connection: it processes the mean and standard deviation extracted by the encoder and injects them directly into the corresponding decoder layer, so that the feature map produced by the decoder has statistics similar to those of the corresponding encoder layer. Fig. 1 shows the overall structure.

Fig. 1. Overall network structure.


Fig. 2. Generator network structure.


In Fig. 1, the $x$ and $y$ regions represent the input flows of the images, and $G$ and $F$ both represent generators. An image is transmitted from the $x$ region to the $y$ region through generator $G$, and then from the $y$ region back to the $x$ region through generator $F$. A PONO-MS module is introduced between the encoders of the two generators to effectively preserve the style information of the image. The generator network consists of an encoder, a converter, and a decoder. The encoder captures feature information from the input image through position normalization and injects it directly into the moment shortcut network layer of the decoder, which effectively improves network optimization and convergence. The converter keeps the size of the feature map unchanged, effectively transfers the features of the art image, and uses reflection padding to fill the boundary of the input tensor. The generator network structure is shown in Fig. 2.

The moment shortcut network layer of the decoder receives the data normalized by PONO, converts the feature map from $64\times64\times256$ to $256\times256\times64$ through two transposed convolution layers, and maps the content and style information of the image to pixels. Finally, a network layer with a $7\times7$ convolution kernel outputs a $256\times256$ three-channel generated image. The role of the PONO-MS (position normalization and moment shortcut) module is to capture the feature mean and standard deviation extracted from the input image and inject them directly into the moment shortcut network layer of the decoder, thereby improving the convergence of network training. Fig. 3 shows a schematic diagram of the PONO-MS module.

Fig. 3. Schematic diagram of PONO-MS module.


All calculation formulas are derived from the GAN algorithm and the discriminator network structure of the PatchGAN model. The calculation of the MS layer is shown in Eq. (1).

(1)
$ MS(x)=\gamma F(x)+\beta . $

In Eq. (1), $F(x)$ represents the intermediate-layer mapping, and $\beta $ and $\gamma $ represent the mean and standard deviation parameters. The extracted mean and standard deviation are taken as the new parameters, as shown in Eq. (2).

(2)
$ \left\{\begin{aligned} & \mu =\beta,\\ & \sigma =\gamma. \end{aligned}\right. $
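
To make Eqs. (1) and (2) concrete, the following is a minimal TensorFlow sketch of a PONO-MS pair, assuming the statistics are taken over the channel dimension at each spatial position (as in the positional normalization formulation); the function names and shapes are illustrative only and are not part of the released model.

import tensorflow as tf

def pono(x, eps=1e-5):
    # Positional normalization: mean/std over channels at each spatial position
    mean, var = tf.nn.moments(x, axes=[-1], keepdims=True)
    std = tf.sqrt(var + eps)
    return (x - mean) / std, mean, std

def moment_shortcut(fx, mean, std):
    # MS(x) = gamma * F(x) + beta, with beta = mean and gamma = std (Eqs. (1)-(2))
    return std * fx + mean

# Statistics extracted in the encoder are re-injected in the decoder
features = tf.random.normal([1, 64, 64, 256])          # stand-in encoder output
normalized, mu, sigma = pono(features)
decoded = moment_shortcut(normalized, mu, sigma)        # stand-in decoder layer input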

3.2. Discriminator Network Structure Design

The discriminator network framework is based on the PatchGAN model with a $70\times70$ patch size, and the attention mechanism is introduced in the fourth and fifth convolution layers. This mechanism helps focus on the key pixel areas in the image while ignoring or filtering out irrelevant parts, and it further improves the generative adversarial model's ability to capture fine details. The discriminator network structure is shown in Fig. 4.

Fig. 4. Discriminator network structure.


The attention mechanism is realized through the convolutional block attention module (CBAM), which improves the performance of convolutional networks and mainly models the relationships between feature channels. When the downsampled feature size is $1\times31\times31\times512$, the intermediate feature map aggregates spatial information through global average pooling (GAP) and global maximum pooling (GMP), generating two different feature descriptors that are passed to a shared network. The feature map is defined in Eq. (3).

(3)
$ F\in R^{C\times H\times W} . $

In Eq. (3), $R$ represents the real number space, $C$ represents the number of channels, $H$ represents the height of the feature map, and $W$ represents the width of the feature map.

The shared network is a multi-layer perceptron with a hidden layer, which calculates the channel-wise importance of the feature map. After the weight of each channel is obtained, the weights are applied to the corresponding channels of the intermediate feature map. This process is shown in Eq. (4).

(4)
$ M(f)=concat(F*MLP(GlobalAvgPool(F)),\nonumber \\ F*MLP(GlobalMaxPool(F))). $

Then Eq. (5) can be obtained.

(5)
$ M(f)=concat(F*W_{1} (W_{0} (F_{gap} )),~F*W_{1} (W_{0} (F_{gmp} ))). $

In Eqs. (4) and (5), $F_{gap} $ represents the global average pooling feature map, $F_{gmp} $ represents the global maximum pooling feature map, $W_{0} $ and $W_{1} $ represent the shared weights, $GlobalAvgPool(F)$ represents global average pooling, $GlobalMaxPool(F)$ represents global maximum pooling, $MLP$ represents the multi-layer perceptron, and $concat$ represents the concatenation of feature maps. The shared weights satisfy the conditions shown in Eq. (6).

(6)
$ \left\{\begin{aligned} & W_{0} \in R^{\frac{C}{r} \times C} , \\ & W_{1} \in R^{C\times \frac{C}{r} } . \end{aligned}\right. $

In Eq. (6), $\frac{C}{r} $ represents the reduced number of channels and $r$ represents the reduction ratio.

After this operation, a weighted GAP feature map and a weighted GMP feature map are generated. The two feature maps are then combined: after the concat operation, the number of feature-map channels is doubled. A network layer with a $1\times1$ convolutional kernel then reduces the number of channels back to 512, followed by a leaky rectified linear unit (Leaky ReLU) activation function. This completes the construction of the channel-based attention mechanism module. In the discriminator of the AMS-Cycle-GAN model, a network layer with a $4\times4$ convolutional kernel is used for the output, producing a single-channel prediction map of size $1\times30\times30\times1$. The internal structure of the discriminator is shown in Fig. 5.

Fig. 5. Discriminator internal structure information.

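For illustration, the channel attention construction described above can be sketched as follows; this is a simplified TensorFlow approximation of Eqs. (4)-(6) with an assumed reduction ratio, and the sigmoid gating follows the common CBAM convention rather than anything stated explicitly in the equations.

import tensorflow as tf
from tensorflow.keras import layers

def channel_attention_block(feature_map, reduction=8):
    # feature_map: (batch, H, W, C); reduction corresponds to r in Eq. (6)
    channels = feature_map.shape[-1]
    shared_mlp = tf.keras.Sequential([
        layers.Dense(channels // reduction, activation='relu'),  # W0
        layers.Dense(channels),                                  # W1
    ])
    f_gap = layers.GlobalAveragePooling2D()(feature_map)  # F_gap
    f_gmp = layers.GlobalMaxPooling2D()(feature_map)      # F_gmp
    w_gap = tf.reshape(tf.sigmoid(shared_mlp(f_gap)), (-1, 1, 1, channels))
    w_gmp = tf.reshape(tf.sigmoid(shared_mlp(f_gmp)), (-1, 1, 1, channels))
    # Weight the feature map with each set of channel weights and concatenate (Eq. (5))
    weighted = tf.concat([feature_map * w_gap, feature_map * w_gmp], axis=-1)
    # A 1x1 convolution restores the original channel count, followed by Leaky ReLU
    out = layers.Conv2D(channels, (1, 1))(weighted)
    return layers.LeakyReLU(alpha=0.2)(out)

# Example: attention over a downsampled feature map of size (1, 31, 31, 512)
attended = channel_attention_block(tf.random.normal([1, 31, 31, 512]))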

To make model training more stable and satisfy the 1-Lipschitz condition, spectral normalization (SN) is introduced into multiple convolution layers of the discriminator during training. The generator loss of the AMS-Cycle-GAN model is shown in Eq. (7).

(7)
$ L_{Generator} =L_{lsgan\_ Generator} +\lambda _{1} L_{identity} (G,F)\nonumber \\ \quad +\lambda _{2} L_{cycle} (G,F,X,Y)\nonumber\\ \quad +\lambda _{3} L_{MS\text{-}SSIM} (G,F). $

In Eq. (7), $L_{Generator} $ represents the total loss function of the generators, $L_{lsgan\_ Generator} $ represents the adversarial loss of the generators, $\lambda _{1} L_{identity} (G,F)$ represents the weighted identity loss, $\lambda _{2} L_{cycle} (G,F,X,Y)$ represents the weighted cycle-consistency loss, $\lambda _{3} L_{MS\text{-}SSIM} (G,F)$ represents the weighted MS-SSIM loss, and $G$ and $F$ represent the generators. The discriminator loss is given by Eq. (8).

(8)
$ L_{discriminators} ={\mathop{\min }\limits_{D_{Y} }} L_{lsgan} (G,D_{Y} ,X,Y)\nonumber\\ \quad +{\mathop{\min }\limits_{D_{X} }} L_{lsgan} (F,D_{X} ,Y,X). $

In Eq. (8), $L_{discriminators} $ represents the loss function of the discriminators, and ${\mathop{\min }\limits_{D_{Y} }} L_{lsgan} (G,D_{Y} ,X,Y)+{\mathop{\min }\limits_{D_{X} }} L_{lsgan} (F,D_{X} ,Y,X)$ represents the game between the generators and the discriminators. On this basis, Eq. (9) is obtained.

(9)
$ L(G,F,D)=\arg{\mathop{\min }\limits_{G,F,D_{Y} ,D_{X} }} (L_{Generator} ,L_{discriminators} ). $

In Eq. (9), $(G,F)$ represent the generators, $(D_{Y} ,D_{X} )$ represent the discriminators, $L_{lsgan} $ represents the adversarial (LSGAN) loss, $L_{Generator} $ represents the generator loss function, and $L_{discriminators} $ represents the discriminator loss function. $L(G,F,D)$ is the loss function that jointly minimizes all objectives. $MS\text{-}SSIM(x,y)$ is defined in Eq. (10).

(10)
$ MS\text{-}SSIM(x,y)\nonumber\\ =[l_{M} (x,y)]^{\alpha _{M} } \cdot \prod _{j=1}^{M}[c_{j} (x,y)]^{\beta _{j} } [s_{j} (x,y)]^{\gamma _{j} }. $

In Eq. (10), $M$ is set to 5, $MS\text{-}SSIM(x,y)$ represents the multi-scale structural similarity index, $l_{M} (x,y)$ represents the luminance comparison function, $c_{j} (x,y)$ represents the contrast comparison function, $s_{j} (x,y)$ represents the structural comparison function, and $\alpha $, $\beta $, $\gamma $ represent the weights of the comparison functions. The other parameters are set as in Eq. (11).

(11)
$ \left\{\begin{aligned} & \beta _{1} =\gamma _{1} =0.0448, \\ & \beta _{2} =\gamma _{2} =0.2856, \\ & \beta _{3} =\gamma _{3} =0.3001, \\ & \beta _{4} =\gamma _{4} =0.2363. \end{aligned}\right. $

The $\alpha $ parameter setting is shown in Eq. (12).

(12)
$ \alpha _{5} =\beta _{5} =\gamma _{5} =0.1333 . $

Thus, the MS-SSIM loss function $L_{MS\text{-}SSIM}$ can be obtained as Eq. (13).

(13)
$ L_{MS\text{-}SSIM} (G,F)=[1-MS\text{-}SSIM(x,F(G(x)))]\nonumber\\ \quad +[1-MS\text{-}SSIM(y,G(F(y)))]. $
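
For reference, TensorFlow's tf.image.ssim_multiscale uses the power factors (0.0448, 0.2856, 0.3001, 0.2363, 0.1333) by default, which match the weights in Eqs. (11) and (12). A minimal sketch of the loss in Eq. (13), treating the generators as callables and assuming images in [-1, 1], might look like this.

import tensorflow as tf

def ms_ssim_loss(G, F, x, y, max_val=2.0):
    # Images are assumed to lie in [-1, 1], hence a dynamic range of 2.0
    forward = tf.image.ssim_multiscale(x, F(G(x)), max_val=max_val)
    backward = tf.image.ssim_multiscale(y, G(F(y)), max_val=max_val)
    # L_MS-SSIM(G, F) per Eq. (13), averaged over the batch
    return tf.reduce_mean(1.0 - forward) + tf.reduce_mean(1.0 - backward)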

$L_{identity} (G,F)$ in Eq. (7) is shown in Eq. (14).

(14)
$ L_{identity} (G,F)=E_{y\sim P_{data(y)} } [\| G(y)-y\| _{1} ]\nonumber\\ \quad +E_{x\sim P_{data(x)} } [\| F(x)-x\| _{1} ]. $

In Eq. (14), $E_{x\sim P_{data(x)} } [\| F(x)-x\| _{1} ]$ is the identity term that constrains generator $F$ to leave images from domain $X$ unchanged, and $E_{y\sim P_{data(y)} } [\| G(y)-y\| _{1} ]$ constrains generator $G$ to leave images from domain $Y$ unchanged. $L_{cycle} (G,F,X,Y)$ is shown in Eq. (15).

(15)
$ L_{cycle} (G,F,X,Y)=E_{x\sim P_{data(x)} } [\| F(G(x))-x\| _{1} ]\nonumber \\ \quad +E_{y\sim P_{data(y)} } [\| G(F(y))-y\| _{1} ]. $
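
For illustration, the identity, cycle-consistency, adversarial, and total generator losses of Eqs. (7), (14), and (15) could be combined as in the following sketch, which assumes least-squares adversarial terms, reuses the ms_ssim_loss sketch above, and uses the weights λ1 = 5.0, λ2 = 10.0, λ3 = 1.0 listed later in Table 1; all helper names are illustrative.

import tensorflow as tf

def l1(a, b):
    return tf.reduce_mean(tf.abs(a - b))

def identity_loss(G, F, x, y):                     # Eq. (14)
    return l1(G(y), y) + l1(F(x), x)

def cycle_loss(G, F, x, y):                        # Eq. (15)
    return l1(F(G(x)), x) + l1(G(F(y)), y)

def lsgan_generator_loss(D_Y, D_X, G, F, x, y):    # least-squares adversarial term
    return (tf.reduce_mean(tf.square(D_Y(G(x)) - 1.0))
            + tf.reduce_mean(tf.square(D_X(F(y)) - 1.0)))

def generator_loss(G, F, D_X, D_Y, x, y,
                   lambda_1=5.0, lambda_2=10.0, lambda_3=1.0):
    # Eq. (7): adversarial term plus weighted identity, cycle, and MS-SSIM terms
    return (lsgan_generator_loss(D_Y, D_X, G, F, x, y)
            + lambda_1 * identity_loss(G, F, x, y)
            + lambda_2 * cycle_loss(G, F, x, y)
            + lambda_3 * ms_ssim_loss(G, F, x, y))  # ms_ssim_loss as sketched above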

The AMS-Cycle-GAN model first randomly extracts an image from the natural image domain and inputs it into the generator. At the same time, position normalization extracts the mean and standard deviation from image $x$, and this information is transmitted to the MS network layer. Similarly, an image is randomly extracted from the art image domain and input into the generator to obtain the corresponding output; this process also passes through the PONO-MS module. The generated image and the original art image are then input into the discriminator. After passing through the attention mechanism module, the importance weights of the different channels are obtained and applied to the corresponding channels of the intermediate feature map. By minimizing the LSGAN loss of the discriminator, the discriminator is optimized and its parameters are updated. Similarly, the generated image and the original natural image are input into the other discriminator, whose LSGAN loss is minimized to optimize it and update its parameters. Next, the reconstructed images are obtained and the cycle loss is calculated, and the process iterates over the training set to update the parameters. After training is completed, the model establishes an image translation model between the natural image domain and the artistic image domain, which can generate images with artistic style. During training, once the number of iterations exceeds a preset threshold, the learning rate gradually declines.

A simplified TensorFlow sketch of the generator and discriminator construction is given below.

import tensorflow as tf
from tensorflow.keras import layers

def pono_ms(x):
    # Normalize the feature map with its mean and variance over the spatial dimensions
    mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
    return (x - mean) / tf.sqrt(var + 1e-8)

def build_generator(input_shape):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(64, (7, 7), padding='same')(inputs)
    x = layers.ReLU()(layers.Lambda(pono_ms)(x))
    x = layers.Conv2D(256, (3, 3), padding='same')(x)
    outputs = layers.Conv2D(3, (7, 7), padding='same',
                            activation='tanh')(layers.ReLU()(layers.Lambda(pono_ms)(x)))
    return tf.keras.Model(inputs, outputs)

def build_discriminator(input_shape):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.LeakyReLU(alpha=0.2)(layers.Conv2D(64, (4, 4), strides=2, padding='same')(inputs))
    x = layers.LeakyReLU(alpha=0.2)(layers.Conv2D(128, (4, 4), strides=2, padding='same')(x))
    outputs = layers.Conv2D(1, (4, 4), padding='same', activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs)
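
Tying the pieces together, the training step described in the preceding paragraph could be sketched as follows; this is a simplified eager-mode sketch that reuses the illustrative generator_loss from the loss sketch above and assumes least-squares discriminator objectives, not the authors' released implementation.

import tensorflow as tf

def train_step(G, F, D_X, D_Y, g_optimizer, d_optimizer, x, y):
    # One optimization step over a pair of unpaired samples x (natural) and y (art)
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_y = G(x, training=True)   # natural -> art
        fake_x = F(y, training=True)   # art -> natural
        # Total generator objective, Eq. (7)
        g_loss = generator_loss(G, F, D_X, D_Y, x, y)
        # Least-squares discriminator objective, Eq. (8): real -> 1, fake -> 0
        d_loss = (tf.reduce_mean(tf.square(D_Y(y) - 1.0))
                  + tf.reduce_mean(tf.square(D_Y(fake_y)))
                  + tf.reduce_mean(tf.square(D_X(x) - 1.0))
                  + tf.reduce_mean(tf.square(D_X(fake_x))))
    g_vars = G.trainable_variables + F.trainable_variables
    d_vars = D_X.trainable_variables + D_Y.trainable_variables
    g_optimizer.apply_gradients(zip(g_tape.gradient(g_loss, g_vars), g_vars))
    d_optimizer.apply_gradients(zip(d_tape.gradient(d_loss, d_vars), d_vars))
    return g_loss, d_loss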

4. Results and Discussion

In this experiment, the performance of the improved GAN-based image style transfer algorithm was tested, and different settings and parameters were selected for model training and quantitative analysis. A quantitative comparison of different algorithms was also carried out, including MUNIT, UNIT, CycleGAN, UGATIT, Cycle-DPN-GAN, and AMS-Cycle-GAN.

4.1. Model quantitative analysis

First, the parameters used in the training phase were specified in detail. The beta value was set to $(0.5$, $0.999)$, which lets the optimizer balance the first-order moment estimate (momentum) and the second-order moment estimate (RMSprop) at a suitable proportion so as to converge faster during training. The batch size was set to 1, so the parameters are updated from a single sample at a time. The total number of epochs was set to 200, within which the model is expected to learn fully and reduce the error to a reasonable range. The parameter settings are shown in Table 1, and the dataset settings are shown in Table 2.
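
As a concrete illustration of these settings, the Adam configuration and the linear decay over the last 100 epochs could be set up as follows; this is a sketch consistent with Table 1, not code from the paper.

import tensorflow as tf

INITIAL_LR, TOTAL_EPOCHS, DECAY_START = 2e-4, 200, 100

def lr_for_epoch(epoch):
    # Constant for the first 100 epochs, then linear decay from 2e-4 to 0
    if epoch < DECAY_START:
        return INITIAL_LR
    return INITIAL_LR * (TOTAL_EPOCHS - epoch) / (TOTAL_EPOCHS - DECAY_START)

g_optimizer = tf.keras.optimizers.Adam(learning_rate=INITIAL_LR, beta_1=0.5, beta_2=0.999)
d_optimizer = tf.keras.optimizers.Adam(learning_rate=INITIAL_LR, beta_1=0.5, beta_2=0.999)

for epoch in range(TOTAL_EPOCHS):
    # Update the learning rate at the start of each epoch
    g_optimizer.learning_rate = lr_for_epoch(epoch)
    d_optimizer.learning_rate = lr_for_epoch(epoch)
    # ... iterate over the unpaired training batches and call train_step ...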

The experiment was deployed on an Ubuntu 18.04.1 system equipped with an Intel(R) Xeon(R) Platinum 8255C CPU and 47 GB of memory, which further improved computing performance. An NVIDIA GeForce RTX 3090 graphics card with 24 GB of video memory was also used, which offered good execution efficiency and stability for large-scale, data-intensive deep learning tasks. The changes in the loss functions are shown in Fig. 6.

As shown in Fig. 6, the loss curve in Fig. 6(a) decreases rapidly in the first 10 iterations, fluctuates frequently between 10 and 80 epochs, and stabilizes after 80. The loss curve in Fig. 6(b) also decreases rapidly in the first 10 iterations, fluctuates frequently between 10 and 60 epochs, and tends to stabilize after 60. Figs. 6(c) and 6(d) show the situation after 100 iterations, which remains stationary. Overall, during GAN training the total loss fluctuates greatly in the first 60 iterations, but as training proceeds the fluctuations gradually stabilize, indicating that the proposed network model can be trained effectively. As training progressed, the discriminator found it increasingly difficult to distinguish the authenticity of the generated images.

Fig. 6. Changes in the loss functions during training.


Table 1. Parameter settings.

Experimental stage | Type | Parameter value
Training phase | Beta | (0.5, 0.999)
Training phase | Batch size | 1
Training phase | Epoch | 200
Training phase | Learning rate (first 100 epochs) | 0.0002
Training phase | Learning rate (last 100 epochs) | Linear decay from 0.0002 to 0
Training phase | λ1, λ2, λ3 | 5.0, 10.0, 1.0
Hardware | System | Ubuntu 18.04.1
Hardware | CPU | Intel(R) Xeon(R) Platinum 8255C
Hardware | Memory | 47 GB
Hardware | GPU | NVIDIA GeForce RTX 3090 (24 GB)

Table 2. Dataset settings.

Dataset name | Dataset description | Sample quantity | Image size | Dataset source
photo2vangogh | Contains modern photos and Van Gogh style images for the transfer of photos to Van Gogh style | 1000 | 256x256 | Natural image collection
photo2monet | Contains modern photos and Monet style images for the transfer of photos to Monet style | 800 | 256x256 | Natural image collection
vangogh2photo | Contains Van Gogh style images and modern photographs for the transfer of Van Gogh style to photographs | 1200 | 256x256 | Art database
monet2photo | Contains Monet style images and modern photos for the transfer of Monet style to photos | 750 | 256x256 | Art database
summer2winter | Contains summer and winter scenery images for seasonal style conversion | 500 | 256x256 | Landscape image collection
orange2apple | Contains images of oranges and apples for style transfer between fruit categories | 600 | 256x256 | Food image database
photo2flower | Contains various flower images and regular photos for the transfer of photos to floral styles | 900 | 256x256 | Flower image database

4.2. Ablation test and performance analysis

In this ablation study, the AMS-Cycle-GAN model was used in five groups of experiments. To evaluate the performance of the model, three losses were considered: the MS-SSIM loss, the cycle consistency loss, and the identity mapping loss. The AMS-Cycle-GAN model was analyzed using the experimental data. The ablation test results are shown in Fig. 7.

Fig. 7. Ablation test results.


As shown in Fig. 7, in the benchmark experiment the MS-SSIM loss was 7.2, the cycle consistency loss was 3.1, and the identity mapping loss was 0.6. Compared with the other experimental results, the benchmark experiment had lower MS-SSIM loss and cycle consistency loss, indicating better performance. Across these five groups of experiments, reaching a low and balanced weight combination among the MS-SSIM loss, cycle consistency loss, and identity mapping loss was of great significance for improving model performance, and AMS-Cycle-GAN performed better. Removing the attention mechanism or changing the loss weight of the identity mapping may reduce the performance of the model, which demonstrates the effectiveness and robustness of AMS-Cycle-GAN. The image conversion effects on the photo2vangogh and vangogh2photo datasets are shown in Table 3.

Table 3. Image conversion effects on the Photo2VanGogh and VanGogh2Photo datasets.

Model | Photo2VanGogh (IS) | Photo2VanGogh (FID) | VanGogh2Photo (IS) | VanGogh2Photo (FID)
MS-SSIM=7.0, cyclic consistency loss=3.0, identity mapping loss=0.5 | 4.66±0.28 | 144.39 | 3.78±0.35 | 175.32
MS-SSIM=7.0, cyclic consistency loss=3.0, identity mapping loss=5.0 | 4.16±0.40 | 168.52 | 4.08±0.36 | 180.87
AMS-Cycle-GAN (without MS-SSIM loss) | 4.38±0.32 | 159.47 | 4.28±0.31 | 171.24
AMS-Cycle-GAN (without attention mechanism module) | 4.71±0.53 | 136.2 | 4.38±0.32 | 162.31
AMS-Cycle-GAN | 5.28±0.74 | 112.75 | 4.87±0.47 | 148.29

As shown in Table 3, when the attention mechanism module was removed from AMS-Cycle-GAN, its IS and FID were still better than those of the preceding settings, and its IS score on the vangogh2photo dataset reached 4.38±0.32 among the ablation settings. This further illustrates the importance of the attention mechanism in the GAN model. Finally, the full AMS-Cycle-GAN model performed best under all settings: on the photo2vangogh dataset, its IS and FID scores were $5.28 \pm 0.74$ and 112.75, respectively, and on vangogh2photo they were $4.87 \pm 0.47$ and 148.29, significantly better than the other settings. The generalization test is shown in Fig. 8.

Fig. 8. Generalization test results.


As shown in Fig. 8, the quantitative evaluation results on the summer2winter, orange2apple, and photo2flower datasets were analyzed, with IS and FID as the main evaluation indexes. The IS score on the summer2winter dataset was low, indicating that the generated scenes were relatively simple, but the FID score was also low, indicating good visual quality. The orange2apple dataset had a high IS score, indicating high image diversity, but its FID score was also high, so the visual quality needs improvement. The IS and FID scores on the photo2flower dataset were both relatively low, indicating a balance between diversity and visual quality.

Although the algorithm designed in this study has performance and efficiency advantages over traditional methods, it increases model complexity, requires more training resources, and needs more tuning steps, which is a common limitation of high-performance models.

Comparative analysis is shown in Table 4.

Table 4. Comparative analysis of photo2vangogh and photo2monet datasets.

Model | photo2vangogh IS | photo2vangogh FID | photo2monet IS | photo2monet FID
MUNIT (cyclic consistency loss=10.1) | 5.0±0.7 | 98.2 | 4.7±0.5 | 110.1
UNIT (cyclic consistency loss=10.2) | 5.1±0.6 | 102.8 | 5.0±0.5 | 155.1
CycleGAN (identity mapping loss=0.6) | 4.2±0.3 | 152.2 | 5.1±0.7 | 93.2
UGATIT | 4.4±0.4 | 188.1 | 3.7±0.3 | 128.6
Cycle-DPN-GAN | 4.6±0.4 | 155.8 | 5.3±0.7 | 88.3
AMS-Cycle-GAN | 5.3±0.8 | 114.1 | 6.0±0.9 | 75.5

Table 5. Comparative analysis of vangogh2photo and monet2photo datasets.

Model | vangogh2photo IS | vangogh2photo FID | monet2photo IS | monet2photo FID
MUNIT (cyclic consistency loss=10.5) | 2.35±0.18 | 230.21 | 2.04±0.20 | 186.6
UNIT (cyclic consistency loss=10.2) | 2.80±0.32 | 201.54 | 3.10±0.25 | 213.15
CycleGAN (identity mapping loss=0.45) | 4.42±0.25 | 167.8 | 3.38±0.34 | 119.22
UGATIT | 3.92±0.14 | 183.2 | 2.95±0.28 | 148.88
Cycle-DPN-GAN | 4.68±0.50 | 157.36 | 3.18±0.29 | 116.47
AMS-Cycle-GAN | 4.94±0.45 | 148.23 | 3.51±0.32 | 113.8

As shown in Table 4, AMS-Cycle-GAN was compared and analyzed on the photo2vangogh and photo2monet datasets. On the photo2vangogh dataset, its IS value was $5.3 \pm 0.8$ and its FID value was 114.1. The relatively high IS value means that the image quality was relatively good, and the relatively low FID value indicates that the generated images were closer to the target images; compared with other methods, AMS-Cycle-GAN had certain advantages on this dataset. On the photo2monet dataset, its IS value was $6.0 \pm 0.9$ and its FID value was 75.5, showing that AMS-Cycle-GAN performed better than the other methods. The IS value reflects the quality of the generated images, while the FID value indicates the similarity between the generated and real images. AMS-Cycle-GAN had an obvious advantage in IS and a relatively low FID, which clearly shows that it performed well on both datasets and that the generated images were of high quality and close to real images. AMS-Cycle-GAN was also quantitatively compared with the other methods on the vangogh2photo and monet2photo datasets; the comparative analysis is shown in Table 5.

As shown in Table 5, on the vangogh2photo dataset the IS score of AMS-Cycle-GAN was $4.94 \pm 0.45$, much higher than MUNIT's $2.35 \pm 0.18$ and UNIT's $2.80 \pm 0.32$, and higher than UGATIT's $3.92 \pm 0.14$. This shows that AMS-Cycle-GAN achieved high image quality and consistency when converting the Van Gogh style into the photo style. In terms of FID, AMS-Cycle-GAN scored 148.23, the lowest among all models, which means that it maintains the structural characteristics of the original image better than the other algorithms when moving from the Van Gogh style to the photo style. AMS-Cycle-GAN also performed well on the monet2photo dataset: its IS score was $3.51 \pm 0.32$, higher than those of CycleGAN ($3.38 \pm 0.34$) and Cycle-DPN-GAN ($3.18 \pm 0.29$). In terms of the FID similarity score, AMS-Cycle-GAN led with 113.80, which means that it retains the characteristics of the original image better than the other models when transferring Monet's painting style to the photo style. Thus, AMS-Cycle-GAN has significant advantages in image style transfer.

As shown in Table 6, in terms of algorithm complexity, the number of parameters of AMS-Cycle-GAN is 12M, the lowest, followed by UNIT and UGATIT, with MUNIT the largest. In terms of training time, AMS-Cycle-GAN takes only 10 hours, the fastest among all the algorithms, so the complexity of the designed AMS-Cycle-GAN is the lowest. In terms of implementation time, AMS-Cycle-GAN requires 3600 s, the shortest among all the algorithms, while UNIT is the longest at 4700 s. It can therefore be seen that the AMS-Cycle-GAN algorithm designed in this study is superior in terms of both algorithm complexity and implementation time.

Table 7 shows the comparison of algorithm performance. The AMS-Cycle-GAN algorithm designed in this study has the best indicators, with the highest PSNR value of 27.50 and the highest SSIM value of 0.90. Its PL value is the smallest; since PL is a lower-is-better indicator, this corresponds to the best effect. The LPIPS value of AMS-Cycle-GAN is also the smallest, and again a smaller value indicates a better effect. Overall, compared with advanced algorithms, the AMS-Cycle-GAN algorithm designed in this study also has advantages.

Table 6. Comparison of algorithm implementation time and complexity.

Algorithm | Number of parameters | Training time (hours) | Implementation time (seconds)
AMS-Cycle-GAN | 12M | 10 | 3600
MUNIT | 15M | 12 | 4500
UNIT | 13M | 13 | 4700
UGATIT | 14M | 11 | 4000

Table 7. Comparison of algorithm effects.

Algorithm | Peak signal-to-noise ratio (PSNR) | Structural similarity index (SSIM) | Perceptual loss (PL) | Learned perceptual image patch similarity (LPIPS)
AMS-Cycle-GAN | 27.50 | 0.90 | 0.09 | 0.07
StyleGAN2 | 26.30 | 0.88 | 0.12 | 0.10
Improved-GAN | 27.00 | 0.89 | 0.10 | 0.08
BigGAN | 25.67 | 0.85 | 0.15 | 0.12
VQ-VAE-2 | 27.30 | 0.89 | 0.11 | 0.09
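
For reference, PSNR and SSIM values of the kind reported in Table 7 can be computed with standard TensorFlow image operations; a minimal sketch, assuming generated and reference image batches scaled to [0, 1]:

import tensorflow as tf

def psnr_ssim(generated, reference, max_val=1.0):
    # Both tensors: (batch, H, W, 3), values in [0, max_val]
    psnr = tf.reduce_mean(tf.image.psnr(generated, reference, max_val=max_val))
    ssim = tf.reduce_mean(tf.image.ssim(generated, reference, max_val=max_val))
    return float(psnr), float(ssim)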

5. Conclusion

In recent years, the role of image style transfer in modern creation has gradually increased; this technique changes the style of an image while retaining its content. Traditional methods are insufficient in the quality and efficiency of image style transfer and also have certain shortcomings in style coherence. To address this issue, the AMS-Cycle-GAN image style transfer algorithm was designed. Through an improved generator design and a novel discriminator design, it aims to provide a better image style transfer effect. Different datasets, including photo2vangogh and vangogh2photo, were used for testing. The results showed that AMS-Cycle-GAN achieved superior performance in all aspects. Its IS and FID values on the photo2vangogh dataset reached $5.3 \pm 0.8$ and 114.1, respectively, significantly better than the other methods. On the vangogh2photo dataset, its IS score was $4.94 \pm 0.45$ and its FID was 148.23, still a significant advantage. In addition, the ablation experiments confirmed the stability of the design: in that setting, the IS and FID scores on the photo2vangogh and vangogh2photo datasets reached $5.28 \pm 0.74$, 112.75 and $4.87 \pm 0.47$, 148.29. Further experiments showed that AMS-Cycle-GAN produced high-quality images with a small distance to real images on several other datasets, such as summer2winter, orange2apple, and photo2flower. The model designed in this study brings new perspectives and practical methods to image style transfer. However, the images generated by AMS-Cycle-GAN may not meet everyone's personalized needs, and a design with stronger personalized customization ability is a future research direction.

The designed method can play an important role in areas such as digital art creation, advertising design, and film and television animation production. In digital art creation, it can convert physical photos into images of specific artistic styles, thereby quickly generating digital artworks. In advertising design, it can convert product photos into artistic-style images, thereby attracting consumer attention. In film and television animation production, it can transform a film or scene into a specific visual style, enhancing the stylized effect of the film and the viewing experience of the audience.

Although the designed method is more efficient and achieves better stylization effects, it still has limitations: it relies on complex computation, making it difficult to achieve the lightweight deployment suitable for mobile devices. Therefore, a more lightweight modular design is a direction for future research.

REFERENCES

1 
Y. Fu, X. J. Wu, and T. Durrani, ``Image fusion based on generative adversarial network consistent with perception,'' Information Fusion, vol. 72, pp. 110-125, 2021.DOI
2 
X. Lu, W. Liao, Y. Zhang, and Y. Huang, ``Intelligent structural design of shear wall residence using physics-enhanced generative adversarial networks,'' Earthquake Engineering & Structural Dynamics, vol. 51, no. 7, pp. 1657-1676, 2022.DOI
3 
H. Ding, L. Chen, L. Dong, Z. Fu, and X. Cui, ``Imbalanced data classification: A KNN and generative adversarial networks-based hybrid approach for intrusion detection,'' Future Generation Computer Systems, vol. 131, pp. 240-254, 2022.DOI
4 
B. Espejo-Garcia, N. Mylonas, L. Athanasakos, E. Vali, and S. Fountas, ``Combining generative adversarial networks and agricultural transfer learning for weeds identification,'' Biosystems Engineering, vol. 204, pp. 79-89, 2021.DOI
5 
J. Tan, X. Liao, J. Liu, Y. Cao, and H. Jiang, ``Channel attention image steganography with generative adversarial networks,'' IEEE Transactions on Network Science and Engineering, vol. 9, no. 2, pp. 888-903, 2021.DOI
6 
S. Hu, B. Lei, S. Wang, Y. Wang, Z. Feng, and Y. Shen, ``Bidirectional mapping generative adversarial networks for brain MR to PET synthesis,'' IEEE Transactions on Medical Imaging, vol. 41, no. 1, pp. 145-157, 2021.DOI
7 
K. Liu, Z. Ye, H. Guo, D. Cao, L. Chen, and F. Y. Wang, ``FISS GAN: A generative adversarial network for foggy image semantic segmentation,'' IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 8, pp. 1428-1439, 2021.DOI
8 
M. U. Danjuma, B. Yusuf, and I. Yusuf, ``Reliability, availability, maintainability, and dependability analysis of cold standby series-parallel system,'' Journal of Computational and Cognitive Engineering, vol. 1, no. 4, pp. 193-200, 2022.DOI
9 
Y. Fang, B. Luo, T. Zhao, B. Jiang, and Q. Liu, ``ST-SIGMA: Spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting,'' CAAI Transactions on Intelligence Technology, vol. 7, no. 4, pp. 744-757, 2022.DOI
10 
R. Hollandi, A. Szkalisity, T. Toth, A. E. Carpenter, K. Smith, and P. Horvath, ``nucleAIzer: A parameter-free deep learning framework for nucleus segmentation using image style transfer,'' Cell Systems, vol. 10, no. 5, pp. 453-458, 2020.DOI
11 
C. T. Lin, S. W. Huang, Y. Y. Wu, and S. H. Lai, ``GAN-based day-to-night image style transfer for nighttime vehicle detection,'' IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 951-963, 2022.DOI
12 
H. Chen, Z. Wang, H. Zhang, Z. Zuo, A. Li, W. Xing, and D. Lu, ``Artistic style transfer with internal-external learning and contrastive learning,'' Advances in Neural Information Processing Systems, vol. 34, pp. 26561-26573, 2021.URL
13 
X. Wang, S. Yang, W. Wang, and J. Liu, ``Artistic text style transfer: An overview of state-of-the-art methods and datasets [SP Forum],'' IEEE Signal Processing Magazine, vol. 39, no. 6, pp. 10-17, 2022.DOI
14 
Y. Tu, M. Urata, M. Endo, and T. Yasuda, ``Image style transfer and image release for tourism promotion in local governments,'' Journal of Global Tourism Research, vol. 7, no. 2, pp. 137-144, 2022.DOI
15 
J. Gui, Z. Sun, Y. Wen, D. Tao, and J. Ye, ``A review on generative adversarial networks: Algorithms, theory, and applications,'' IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 4, pp. 3313-3332, 2021.DOI
16 
T. Karras, M. Aittala, and S. Laine, ``Alias-free generative adversarial networks,'' Advances in Neural Information Processing Systems, vol. 34, pp. 852-863, 2021.URL
17 
H. Alqahtani, M. Kavakli-Thorne, and G. Kumar, ``Applications of generative adversarial networks (GANs): An updated review,'' Archives of Computational Methods in Engineering, vol. 28, pp. 525-552, 2021.DOI
18 
H. Zhou, J. Hou, Y. Zhang, J. Ma, and H. Ling, ``Unified gradient-and intensity-discriminator generative adversarial network for image fusion,'' Information Fusion, vol. 88, pp. 184-201, 2022.DOI
19 
M.-Y. Liu, X. Huang, J. Yu, T.-C. Wang, and A. Mallya, ``Generative adversarial networks for image and video synthesis: Algorithms and applications,'' Proceedings of the IEEE, vol. 109, no. 5, pp. 839-862, 2021.DOI
20 
G. Kwon and J. C. Ye, ``Clipstyler: Image style transfer with a single text condition,'' Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18062-18071, 2022.DOI
21 
Y. Deng, F. Tang, W. Dong, C. Ma, X. Pan, L. Wang, and C. Xu, ``StyTr2: Image style transfer with transformers,'' Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11326-11336, 2022.DOI
22 
S. Liu, T. Lin, D. He, F. Li, M. Wang, Z. Sun, Q. Li, and E. Ding, ``Adaattn: Revisit attention mechanism in arbitrary neural style transfer,'' Proc. of the IEEE/CVF International Conference on Computer Vision, pp. 6649-6658, 2021.DOI

Author

Kai Zhao

Kai Zhao received his bachelor's degree in art and design from Jiangxi University of Finance and Economics in 2008 and is currently an associate professor and director of the General Teaching Center at Henan Vocational University of Science and Technology. He served as the deputy director and a judge of the 5th National Digital Creative Teaching Skills Competition. He has published in more than 10 international and domestic publications. His areas of interest include image style design and image processing.