• ๋Œ€ํ•œ์ „๊ธฐํ•™ํšŒ
Mobile QR Code QR CODE : The Transactions of the Korean Institute of Electrical Engineers
  • COPE
  • kcse
  • ํ•œ๊ตญ๊ณผํ•™๊ธฐ์ˆ ๋‹จ์ฒด์ด์—ฐํ•ฉํšŒ
  • ํ•œ๊ตญํ•™์ˆ ์ง€์ธ์šฉ์ƒ‰์ธ
  • Scopus
  • crossref
  • orcid

  1. (Department of Electronics and Computer Engineering, Seokyeong University, Korea.)



Keywords: Deep learning, Colorization, Super-resolution, Image restoration, Decoupled Loss

1. ์„œ ๋ก 

์ด๋ฏธ์ง€ ๋ณต์›(image restoration)์€ ์˜์ƒ ์ฒ˜๋ฆฌ๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€์˜ ํ’ˆ์งˆ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ณ , ํ›ผ์†๋œ ์ •๋ณด๋ฅผ ๋ณต์›ํ•˜๋Š” ์ž‘์—…์ด๋‹ค. ์ดฌ์˜ ํ™˜๊ฒฝ์˜ ๋ฌธ์ œ๋กœ ๋ถ€์กฑํ•œ ํ’ˆ์งˆ์„ ๊ฐ€์ง„ ์ด๋ฏธ์ง€๋‚˜, ๋””์ง€ํ„ธ ํ’ํ™”๋กœ ์ธํ•ด ์••์ถ•, ํ›ผ์†๋œ ์ด๋ฏธ์ง€๋ฅผ ๋ณต์›ํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์˜์ƒ์˜ ๊ฐ€์น˜๋ฅผ ๋†’์ด๊ณ , ํ›ผ์†๋œ ์ž๋ฃŒ๋ฅผ ๋ณต์›ํ•  ์ˆ˜ ์žˆ๋‹ค.

์˜์ƒ์ฑ„์ƒ‰(colorization)์€ ํ‘๋ฐฑ ์‚ฌ์ง„์„ ์ƒ‰์ƒ ์‚ฌ์ง„์œผ๋กœ ์ „ํ™˜ํ•˜๋Š” ๊ธฐ์ˆ ์ด๋‹ค. ๊ธฐ์กด์˜ ์ „ํ†ต์ ์ธ ์˜์ƒ์ฑ„์ƒ‰ ๊ธฐ๋ฒ•๋“ค์€ ์‚ฌ์šฉ์ž๊ฐ€ ์ผ๋ถ€ ์ง€์—ญ์˜ ์ƒ‰์ƒ์„ ์•Œ๋ ค์ฃผ๊ฑฐ๋‚˜(1), ์˜ˆ์‹œ ์‚ฌ์ง„์„ ๋ณด์—ฌ์ฃผ๋Š” ๋“ฑ(2), ํžŒํŠธ ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•๋“ค์ด ๋†’์€ ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ–ˆ๋‹ค. ๋”ฅ๋Ÿฌ๋‹์ด ๋ฐœ์ „ํ•˜๋ฉด์„œ ๋” ์ž์—ฐ์Šค๋Ÿฝ๊ณ  ๋†’์€ ์„ฑ๋Šฅ์˜ ๊ธฐ๋ฒ•๋“ค์ด ์ œ์•ˆ๋˜์—ˆ๊ณ (3), ํžŒํŠธ ์—†์ด๋„ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋Š” ๋ชจ๋ธ๋“ค๋„ ์ œ์•ˆ๋˜์—ˆ๋‹ค(4,5).

์ดˆํ•ด์ƒํ™”(super-resolution)๋Š” ๋…ธ์ด์ฆˆ๋ฅผ ์ œ๊ฑฐํ•˜๊ณ  ํ•ด์ƒ๋„๋ฅผ ๋†’์ด๋Š” ๊ธฐ์ˆ ์ด๋‹ค. ์ „ํ†ต์ ์ธ ๊ธฐ๋ฒ•์—์„œ๋Š” ๋‹ค์–‘ํ•œ ๋ณด๊ฐ„๋ฒ•์„ ํ™œ์šฉํ•ด ์ดˆํ•ด์ƒํ™”๋ฅผ ์ˆ˜ํ–‰ํ•˜์˜€๋‹ค(6,7). ๋”ฅ๋Ÿฌ๋‹์ด ๋ฐœ์ „ํ•จ์— ๋”ฐ๋ผ ์ดˆํ•ด์ƒํ™”์— CNN์„ ์ ์šฉํ•œ ๋ชจ๋ธ์ด ์ œ์•ˆ๋˜์—ˆ๊ณ (8), ์ตœ๊ทผ์—๋Š” ๋†’์€ ์„ฑ๋Šฅ์„ ๊ฐ€์ง„ ํŠธ๋žœ์Šคํฌ๋จธ ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•๋“ค์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค(9).

๊ณผ๊ฑฐ ์˜์ƒ ์ค‘ ์ €ํ™”์งˆ ํ‘๋ฐฑ ์˜์ƒ์ด ์กด์žฌํ•˜๋Š”๋ฐ, ์ดˆํ•ด์ƒํ™”์™€ ์˜์ƒ์ฑ„์ƒ‰์„ ๋ชจ๋‘ ์ˆ˜ํ–‰ํ•˜๋Š” ์—ฐ๊ตฌ๋Š” ๊ฑฐ์˜ ์กด์žฌํ•˜์ง€ ์•Š๋Š”๋‹ค. ๋˜ํ•œ, ๋‘ ๋ถ„์•ผ์˜ ์˜์ƒ์ฒ˜๋ฆฌ ๋ชจ๋ธ์„ ๋‹จ์ˆœํ•˜๊ฒŒ ์ˆœ์ฐจ ์ ์šฉํ•˜๋ฉด ์˜ค์ฐจ ๋ˆ„์ ์œผ๋กœ ์ธํ•ด ๋ถ€์ž์—ฐ์Šค๋Ÿฌ์šด ์ด๋ฏธ์ง€๊ฐ€ ๋‚˜ํƒ€๋‚  ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด ์ˆœ์ฐจ ๋ชจ๋ธ์— ๋Œ€ํ•ด์„œ ์žฌํ•™์Šต์„ ์ง„ํ–‰ํ•˜๋ฉด ๊ฐ ๋ชจ๋ธ์˜ ์—ญํ• ์ด ์—ดํ™” ๋˜๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์˜์ƒ์ฑ„์ƒ‰ ๋ฐ ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ์— ๋Œ€ํ•ด์„œ ๊ฐ๊ฐ์˜ ์—ญํ• ์„ ์œ ์ง€ํ•˜๋ฉฐ ์ƒํ˜ธ ๋…๋ฆฝ์ ์œผ๋กœ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ถ„๋ฆฌ๋œ ์†์‹คํ•จ์ˆ˜(Decoupled Loss) ๊ธฐ๋ฐ˜์˜ ์ˆœ์ฐจ ๋ชจ๋ธ ํ•™์Šต๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์˜์ƒ์ฑ„์ƒ‰๊ณผ ์ดˆํ•ด์ƒํ™”์˜ ๊ฐ ์ฒ˜๋ฆฌ๊ณผ์ •์— ์ ํ•ฉํ•œ ์†์‹คํ•จ์ˆ˜๋ฅผ ์ •์˜ํ•˜๊ณ , ์ฃผ์š” ์†์‹คํ•จ์ˆ˜๋ฅผ ๋ถ„๋ฆฌํ•˜์—ฌ ๊ฐ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์ƒํ˜ธ ๋…๋ฆฝ์ ์œผ๋กœ ์œ ์ง€ํ•˜๋„๋ก ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค. EDSR-64(10)์™€ BigColor(11) ๋ชจ๋ธ์„ ๋ฒ ์ด์Šค ๋ผ์ธ์œผ๋กœ ํ•˜์—ฌ, ๋ฏธ์„ธ ์กฐ์ •์„ ์ˆ˜ํ–‰ํ•œ ๊ธฐ๋ณธ์  ์ˆœ์ฐจ ๋ชจ๋ธ๊ณผ ์ œ์•ˆ๋œ ๋ถ„๋ฆฌ์  ์ˆœ์ฐจ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•œ๋‹ค. DIV2K ๋ฐ ImageNet-1K ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด์„œ PSNR, SSIM๊ณผ FID(12) ์„ฑ๋Šฅ ์ง€ํ‘œ๋ฅผ ํ‰๊ฐ€ํ•œ๋‹ค.

2. ๋ถ„๋ฆฌ๋œ ์†์‹คํ•จ์ˆ˜ ๊ธฐ๋ฐ˜ ์ƒํ˜ธ ๋…๋ฆฝ์  ํ•™์Šต ๊ธฐ๋ฒ•

2.1 ์ „์ฒด ๊ตฌ์กฐ

๊ทธ๋ฆผ 1์—์„œ์™€ ๊ฐ™์ด ์ œ์•ˆ๋œ ํ•™์Šต๋ฒ•์€ ๋‘ ๊ฐœ์˜ ์ž…๋ ฅ๊ณผ ๊ฐ๊ฐ์˜ ์ถœ๋ ฅ์— ๋Œ€ํ•ด์„œ ๋ถ„๋ฆฌ๋œ ์†์‹คํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ์ž…๋ ฅ 2๋Š” ๋ชฉํ‘œ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด 3์ฐจ ๋ณด๊ฐ„๋ฒ•์„ ํ†ตํ•ด ํ•ด์ƒ๋„๋ฅผ ๋‚ฎ์ถ˜ ์ด๋ฏธ์ง€์ด๋ฉฐ, ์ž…๋ ฅ 1์€ ์ž…๋ ฅ 2์— ๋Œ€ํ•ด์„œ ํ‘๋ฐฑ์œผ๋กœ ๋ณ€ํ™˜์‹œํ‚จ ์ด๋ฏธ์ง€๋‹ค. Network 1์€ ์˜์ƒ์ฑ„์ƒ‰ ๋ชจ๋ธ์ด๋ฉฐ, Network 2๋Š” ์ดˆํ•ด์ƒํ™” ๋ชจ๋ธ์ด๋‹ค. ์ž…๋ ฅ 1์€ ์ˆœ์ฐจ์ ์œผ๋กœ Network 1, 2๋ฅผ ํ†ต๊ณผํ•˜๊ณ  ๋ชฉํ‘œ ์ด๋ฏธ์ง€์™€ ์†์‹ค์„ ๊ณ„์‚ฐํ•˜๋ฉฐ, ์ด ๋•Œ ์†์‹คํ•จ์ˆ˜๋Š” GAN Loss(13)์™€ VGG Loss๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ์ž…๋ ฅ 2๋Š” Network 2๋งŒ์„ ํ†ต๊ณผํ•˜์—ฌ ๋ชฉํ‘œ ์ด๋ฏธ์ง€์™€ ์†์‹ค์„ ๊ณ„์‚ฐํ•˜๊ณ , ์ด ๋•Œ ์†์‹คํ•จ์ˆ˜๋Š” L1 Loss๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ๋ถ„๋ฆฌ๋œ ์†์‹คํ•จ์ˆ˜์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ 2.3์ ˆ์—์„œ ์†Œ๊ฐœํ•œ๋‹ค.

๊ทธ๋ฆผ. 1. ์ œ์•ˆ๋œ ํ•™์Šต๋ฒ• ๊ตฌ์„ฑ๋„

Fig. 1. Overall configuration of the proposed training method

../../Resources/kiee/KIEE.2023.72.3.434/fig1.png

2.2 ์‚ฌ์ „ ํ•™์Šต ๋ชจ๋ธ

์˜์ƒ์ฑ„์ƒ‰ ๋ฐ ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ์˜ ์„ฑ๋Šฅ์„ ๋ณด์žฅํ•˜๋Š” ์ƒํ˜ธ ๋…๋ฆฝ์ ์ธ ํ•™์Šต๋ฒ•์„ ๋ณด์—ฌ์ฃผ๊ธฐ ์œ„ํ•ด ์˜์ƒ์ฑ„์ƒ‰, ์ดˆํ•ด์ƒํ™”, VGG, ํŒ๋ณ„์ž, ์ด 4๊ฐœ์˜ ์‚ฌ์ „ ํ•™์Šต๋œ ๋„คํŠธ์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ์˜์ƒ์ฑ„์ƒ‰ ๋ฐ ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ๋Š” ์ด๋ฏธ์ง€๋ฅผ ๋ณต์›ํ•˜๊ธฐ ์œ„ํ•ด, VGG ๋ฐ ํŒ๋ณ„์ž ๋„คํŠธ์›Œํฌ๋Š” ์†์‹ค์„ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•œ๋‹ค.

๊ทธ๋ฆผ. 2. ์˜์ƒ์ฑ„์ƒ‰ ๋„คํŠธ์›Œํฌ ๊ตฌ์„ฑ๋„(BigColor)

Fig. 2. configuration of colorization network: BigColor

../../Resources/kiee/KIEE.2023.72.3.434/fig2.png

์˜์ƒ์ฑ„์ƒ‰ ๋„คํŠธ์›Œํฌ๋Š” BigGAN(14)์„ ์žฌํ•™์Šต์‹œํ‚จ BigColor ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๊ณ , ์ถ”๊ฐ€์ ์ธ ํ•™์Šต์€ ์ง„ํ–‰ํ•˜์ง€ ์•Š๋Š”๋‹ค. ๊ตฌ์กฐ๋„๋Š” ๊ทธ๋ฆผ 2์™€ ๊ฐ™๊ณ , ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ํ†ตํ•ด ํด๋ž˜์Šค ์ •๋ณด๋ฅผ ์ž„๋ฒ ๋”ฉํ•˜์—ฌ ํ•ด๋‹น ๊ฐ์ฒด์— ๋Œ€ํ•ด ์ž์—ฐ์Šค๋Ÿฌ์šด ์ƒ‰์ƒ์„ ์„ ํƒํ•˜๋„๋ก ํ•™์Šตํ•˜๋Š” ๋ชจ๋ธ์ด๋‹ค. ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ๋Š” EDSR-64(10)์„ ์‚ฌ์šฉํ•˜๊ณ , ๋ฏธ์„ธ ์กฐ์ •์„ ํ†ตํ•ด ์˜์ƒ์ฑ„์ƒ‰ ๋„คํŠธ์›Œํฌ์™€ ์ƒํ˜ธ ๊ฐ„์„ญ ์—†์ด ๋…๋ฆฝ์ ์ธ ์—ญํ• ์„ ํ•˜๋„๋ก ํ•™์Šตํ•œ๋‹ค. ๊ตฌ์กฐ๋„๋Š” ๊ทธ๋ฆผ 3๊ณผ ๊ฐ™๊ณ , 16๊ฐœ์˜ ์ž”์ฐจ ๋ธ”๋ก์„ ํฌํ•จํ•˜๋ฉฐ ๋ชจ๋ธ๋ช…์˜ 64๋Š” ๊ฐ ์ปจ๋ณผ๋ฃจ์…˜ ์ธต์˜ ์ฑ„๋„์„ ์˜๋ฏธํ•œ๋‹ค. ์‚ฌ์ „ ํ•™์Šต๋œ VGG ๋„คํŠธ์›Œํฌ๋Š” VGG Loss๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด(15), ํŒ๋ณ„์ž ๋„คํŠธ์›Œํฌ๋Š” GAN Loss๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•œ๋‹ค. ์ด ๋•Œ, BigColor๊ฐ€ ์•„๋‹Œ BigGAN์—์„œ ์‚ฌ์šฉ๋œ ํŒ๋ณ„์ž ๋„คํŠธ์›Œํฌ๊ฐ€ ์‚ฌ์šฉ๋˜๋Š”๋ฐ, ์ด๋Š” ์ด๋ฏธ์ง€ ๋ณต์› ๋ชจ๋ธ๋กœ ์‚ฌ์šฉํ•˜๋Š” BigColor ๋ชจ๋ธ์— ๊ณผ์ ํ•ฉ๋˜์–ด ๋ชจ๋“œ ๋ถ•๊ดด๊ฐ€ ๋ฐœ์ƒํ•˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•จ์ด๋‹ค.

๊ทธ๋ฆผ. 3. ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ ๊ตฌ์„ฑ๋„ (EDSR-64)

Fig. 3. configuration of super-resolution network (EDSR-64)

../../Resources/kiee/KIEE.2023.72.3.434/fig3.png

2.3 ๋ถ„๋ฆฌ๋œ ์†์‹คํ•จ์ˆ˜

์ผ๋ฐ˜์ ์ธ ์˜์ƒ์ฑ„์ƒ‰๊ณผ ์ดˆํ•ด์ƒํ™”์˜ ์ข…๋‹จ ํ•™์Šต์€ ๊ทธ๋ฆผ 1์—์„œ ์ƒ๋‹จ๊ณผ ์ค‘๋‹จ ๋ถ€๋ถ„๋งŒ์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด์„œ ๋ชจ๋“  ์†์‹ค์„ ๊ณ„์‚ฐํ•˜๊ณ  ์—ญ์ „ํŒŒ๋ฅผ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์†์‹คํ•จ์ˆ˜๋Š” ์‹ (1)๊ณผ ๊ฐ™๋‹ค.

(1)
$\text{Total loss} =\sum_{i=0}^{n}L_{i}(N_{2}(N_{1}(x)),\: y)$

$N_{1}$, $N_{2}$๋Š” ๊ฐ๊ฐ Network 1, 2์ด๋ฉฐ, $x$, $y$, $L$, $n$์€ ๊ฐ๊ฐ ์ž…๋ ฅ, ๋ชฉํ‘œ, ์†์‹คํ•จ์ˆ˜, ์†์‹คํ•จ์ˆ˜์˜ ๊ฐœ์ˆ˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜, ์ด๋Ÿฌํ•œ ์ˆœ์ฐจ ์—ฐ๊ฒฐ ๋ฐฉ์‹์œผ๋กœ ํ•™์Šต์„ ์ง„ํ–‰ํ•  ๊ฒฝ์šฐ, ์—ญ์ „ํŒŒ ๊ณผ์ • ์ค‘ ์ฑ„์ƒ‰๊ณผ ์ดˆํ•ด์ƒ ๋„คํŠธ์›Œํฌ์˜ ์†์‹ค์„ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์—†์–ด์„œ ๊ฐœ๋ณ„ ๋„คํŠธ์›Œํฌ์˜ ๊ณ ์œ  ์—ญํ• ์„ ์ œ๋Œ€๋กœ ์ˆ˜ํ–‰ํ•˜์ง€ ๋ชปํ•˜๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด์„œ ๊ทธ๋ฆผ 1์˜ ํ•˜๋‹จ์˜ ์ ์„  ๋ถ€๋ถ„์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์†์‹คํ•จ์ˆ˜๋ฅผ ๋ถ„๋ฆฌ ์ ์šฉํ•œ๋‹ค.

์ด๋ฏธ์ง€ ๋ณต์›์— ์‚ฌ์šฉ๋˜๋Š” ๋Œ€ํ‘œ์ ์ธ ์†์‹คํ•จ์ˆ˜๋กœ L1 Loss, GAN Loss, ๊ทธ๋ฆฌ๊ณ  VGG Loss๊ฐ€ ์žˆ๋‹ค. L1 Loss๋Š” ๊ฐ ํ”ฝ์…€๊ฐ„์˜ ๊ฑฐ๋ฆฌ์„ ์ค„์—ฌ์ฃผ๋Š” ์†์‹คํ•จ์ˆ˜๋‹ค. L1 Loss์˜ ์‹์€ (2)๊ณผ ๊ฐ™๋‹ค.

(2)
$\text{L1 Loss} =\dfrac{1}{hw}\sum_{i=1}^{h}\sum_{j=1}^{w}|p_{ij}-\hat p_{ij}|$

$h$, $w$๋Š” ๊ฐ๊ฐ ์ด๋ฏธ์ง€์˜ ๋†’์ด์™€ ๋„“์ด, $p_{ij}$๋Š” ๊ฐ ํ”ฝ์…€์„ ์˜๋ฏธํ•œ๋‹ค. ์ž์—ฐ๊ณ„ ์ƒ‰์ƒ์˜ ํŽธ์ฐจ๊ฐ€ ์ ์€ ๊ฐ์ฒด์— ๋Œ€ํ•ด์„œ๋Š” ๋ฌธ์ œ๊ฐ€ ์—†์ง€๋งŒ, ํŽธ์ฐจ๊ฐ€ ํฐ ๊ฐ์ฒด์— ๋Œ€ํ•ด์„œ๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์‚ฌ๊ณผ๋ฅผ ๋ณด๊ณ  L1 Loss๋ฅผ ํ†ตํ•ด ํ•™์Šตํ•˜๋ฉด, ๋นจ๊ฐ„์ƒ‰(R)๊ณผ ์ดˆ๋ก์ƒ‰(G)์‚ฌ์ด์—์„œ ํ˜ผ๋ž€์„ ์ผ์œผํ‚ฌ ์ˆ˜ ์žˆ๊ณ , ์†์‹คํ•จ์ˆ˜๋ฅผ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋‚ฎ์€ ์ฑ„๋„์˜ ์ƒ‰์ƒ์œผ๋กœ ํšŒ๊ท€ํ•˜๋„๋ก ํ•™์Šต๋  ์ˆ˜ ์žˆ๋‹ค. GAN Loss๋Š” ์ด๋ฏธ์ง€์˜ ๊ฐœ์—ฐ์„ฑ์„ ์ฆ๊ฐ€์‹œ์ผœ ์ž์—ฐ์Šค๋Ÿฌ์šด ์ด๋ฏธ์ง€๋ฅผ ๋งŒ๋“ค๋„๋ก ํ•˜๋Š” ์†์‹คํ•จ์ˆ˜๋‹ค. GAN Loss์˜ ์‹์€ (3) ๋ฐ (4)๊ณผ ๊ฐ™๋‹ค.

(3)
$\text{GAN Loss}_{D}= E_{(y,\: c)\sim p_{data}}\left[\max(0,\: 1-D(y,\: c))\right]+ E_{(x,\: c)\sim p_{data}}\left[\max(0,\: 1+D(G(x,\: c),\: c))\right]$

(4)
$\text{GAN Loss}_{G}=- E_{(x,\: c)\sim p_{data}}\left[D(G(x,\: c),\: c)\right]$

$D$๋Š” ํŒ๋ณ„์ž, $G$๋Š” ์ด๋ฏธ์ง€ ๋ณต์› ๋„คํŠธ์›Œํฌ, $c$๋Š” ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ์˜๋ฏธ ํ•œ๋‹ค ํ”ฝ์…€๊ฐ„์˜ ๊ฑฐ๋ฆฌ๋ณด๋‹ค๋Š” ์ด๋ฏธ์ง€๊ฐ€ ์–ผ๋งˆ๋‚˜ ์ž์—ฐ์Šค๋Ÿฝ๊ณ  ํ˜„์‹ค์„ฑ ์žˆ๋Š๋ƒ์— ์ง‘์ค‘ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ž์—ฐ์Šค๋Ÿฌ์šด ์ด๋ฏธ์ง€๋ฅผ ๋งŒ๋“ค์–ด์•ผํ•˜๋Š” ์˜์ƒ์ฑ„์ƒ‰์— ์ ํ•ฉํ•œ ์†์‹คํ•จ์ˆ˜๋‹ค. VGG Loss๋Š” VGG ๋„คํŠธ์›Œํฌ๋ฅผ ํ†ต๊ณผ์‹œ์ผœ ๋‚˜์˜ค๋Š” ํŠน์ง•๋งต ๊ฐ„์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ค„์—ฌ์ฃผ๋Š” ์†์‹คํ•จ์ˆ˜๋กœ์จ ์‹ (5)์™€ ๊ฐ™๋‹ค.

(5)
$\text{VGG Loss}=\dfrac{1}{C_{j}W_{j}H_{j}}\left\Vert \phi_{j}(G(x,\: c))-\phi_{j}(y)\right\Vert ^{2}$

$\phi_{j}$๋Š” ์‚ฌ์ „ ํ•™์Šต๋œ $j$๋ฒˆ์งธ ๊ณ„์ธต๊นŒ์ง€์˜ VGG ๋„คํŠธ์›Œํฌ์ด๋ฉฐ, $C_{j},\: W_{j},\: H_{j}$๋Š” ์ˆœ์„œ๋Œ€๋กœ VGG ๋„คํŠธ์›Œํฌ์˜ $j$๋ฒˆ์งธ ๊ณ„์ธต ์ถœ๋ ฅ ํŠน์ง• ๋งต์— ๋Œ€ํ•œ ์ฑ„๋„, ๋„“์ด, ๋†’์ด๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ํ”ฝ์…€๊ฐ„์˜ ๊ฑฐ๋ฆฌ๋ณด๋‹ค๋Š” ์‚ฌ๋žŒ์ด ๋Š๋ผ๋Š” ๊ตฌ์กฐ์ ์ธ ํŠน์„ฑ(์—ฃ์ง€, ์งˆ๊ฐ ๋“ฑ)์ด๋‚˜ ์ถ”์ƒ์ ์ธ ์˜๋ฏธ์— ์ง‘์ค‘ํ•˜๋Š” ์†์‹คํ•จ์ˆ˜์ด๊ธฐ ๋•Œ๋ฌธ์—, GAN Loss์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์˜์ƒ์ฑ„์ƒ‰์— ์ ํ•ฉํ•˜๋‹ค. VGG16 ๋„คํŠธ์›Œํฌ์—์„œ, ์ธต ๋ฒˆํ˜ธ $j$๊ฐ€ ์ž‘์„์ˆ˜๋ก ์—ฃ์ง€์™€ ๊ฐ™์€ ๋‚ฎ์€ ์ˆ˜์ค€์˜ ํŠน์ง•์— ์ง‘์ค‘ํ•˜๊ณ , ํด์ˆ˜๋ก ์ถ”์ƒ์ ์ด๊ณ  ์˜๋ฏธ๋ก ์ ์ธ ๋†’์€ ์ˆ˜์ค€์˜ ํŠน์ง•์— ์ง‘์ค‘ํ•œ๋‹ค.

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ˆœ์ฐจ์ ์ธ ๋„คํŠธ์›Œํฌ์—์„œ ์œ„์™€ ๊ฐ™์€ ๊ฐ ์†์‹คํ•จ์ˆ˜์˜ ์—ญํ• ์ด ๊ฐ„์„ญ์—†์ด ๋…๋ฆฝ์ ์œผ๋กœ ์ˆ˜ํ–‰๋˜์–ด, ๊ฐ๊ฐ์˜ ๋„คํŠธ์›Œํฌ์˜ ์„ฑ๋Šฅ์„ ๋ณด์žฅํ•˜๋Š” ํ•™์Šต๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๊ทธ๋ฆผ 1์—์„œ์—์„œ ๋‘ ๊ฐ€์ง€ ๋„คํŠธ์›Œํฌ๋ฅผ ์ˆœ์ฐจ์ ์œผ๋กœ ํ†ต๊ณผํ•˜๋Š” ์ž…๋ ฅ 1์˜ ์ถœ๋ ฅ์— ๋Œ€ํ•ด์„œ๋Š” ์˜์ƒ์ฑ„์ƒ‰์— ์ ํ•ฉํ•œ GAN Loss์™€ VGG Loss๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ๊ฐ€ ์˜์ƒ์ฑ„์ƒ‰ ๋„คํŠธ์›Œํฌ๋ฅผ ๋ณด์™„ํ•˜๋„๋ก ํ•™์Šตํ•œ๋‹ค. ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ๋งŒ ํ†ต๊ณผํ•˜๋Š” ์ž…๋ ฅ 2์˜ ์ถœ๋ ฅ ๋Œ€ํ•ด์„œ๋Š” ์ดˆํ•ด์ƒํ™”์— ์ ํ•ฉํ•œ L1 Loss๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ, ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ์˜ ๋ณธ๋ž˜ ์„ฑ๋Šฅ์„ ๋ณด์žฅํ•˜๋„๋ก ํ•™์Šตํ•œ๋‹ค. ์‹์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด (6), (7) ๋ฐ (8)๊ณผ ๊ฐ™๋‹ค.

(6)
$Loss_{1}=\sum_{i=0}^{n}L_{i}(N_{2}(N_{1}(x_{1})),\: y)$

(7)
$Loss_{2}=\sum_{j=0}^{m}L_{j}(N_{2}(x_{2}),\: y)$

(8)
$\text{Total loss} =Loss_{1}+Loss_{2}$

$N_{1}$, $N_{2}$๋Š” ๊ฐ๊ฐ Network 1, 2์ด๋ฉฐ, $x$, $y$, $L$์€ ๊ฐ๊ฐ ์ž…๋ ฅ, ๋ชฉํ‘œ, ์†์‹คํ•จ์ˆ˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค. $n$๊ณผ $m$์€ ๊ฐ๊ฐ $Loss_{1}$๊ณผ $Loss_{2}$๋กœ ๋ถ„๋ฆฌ๋œ ์†์‹คํ•จ์ˆ˜์˜ ๊ฐœ์ˆ˜์ธ๋ฐ, ์—ฌ๊ธฐ์„œ $n$์€ GAN Loss๊ณผ VGG Loss๋กœ 2์ด๋ฉฐ, m์€ L1 Loss๋ฅผ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ 1์ด๋‹ค. ์ด๋Ÿฌํ•œ ๊ณผ์ •์„ ํ†ตํ•ด ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ ๋ณธ๋ž˜์˜ ์—ญํ• ์ด ๋ถ•๊ดด๋˜์ง€ ์•Š๊ณ , ์˜์ƒ์ฑ„์ƒ‰ ๋„คํŠธ์›Œํฌ๋ฅผ ๋ณด์™„ํ•˜๋„๋ก ํ•™์Šตํ•œ๋‹ค.

3. ์‹คํ—˜ ๋ฐ ๊ฒฐ๊ณผ ๋ถ„์„

3.1 ์‹คํ—˜ ํ™˜๊ฒฝ

์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ํ•™์Šต๋œ ๋ชจ๋ธ์„ DIV2K, ImageNet-1k ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด์„œ PSNR, SSIM ๊ทธ๋ฆฌ๊ณ  FID ์ง€ํ‘œ๋ฅผ ํ†ตํ•ด ํ‰๊ฐ€ํ•˜๊ณ , ๊ฐœ๋ณ„๋กœ ํ•™์Šต๋œ ๋ชจ๋ธ ๋ฐ ๋ฏธ์„ธ ์กฐ์ •์„ ํ†ตํ•ด ํ•™์Šต๋œ ๋ชจ๋ธ๊ณผ ๋น„๊ตํ•œ๋‹ค. ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ ์…‹์€ ํ‘œ 1๊ณผ ๊ฐ™๋‹ค. DIV2K๋Š” ๊ณ ํ™”์งˆ ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ, 800์žฅ์˜ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์™€ 100์žฅ์˜ ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์œผ๋ฉฐ, ๊ณ ํ™”์งˆ์˜ ์„ธ๋ถ€์ ์ธ ํŠน์ง•๋“ค์„ ์–ผ๋งˆ๋‚˜ ์ž˜ ๋ณต์›ํ•˜๋Š”์ง€ ํŒ๋‹จํ•˜๊ธฐ์— ์ ํ•ฉํ•˜๋‹ค. ImageNet-1k๋Š” 1000๊ฐœ์˜ ํด๋ž˜์Šค๋ฅผ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ, 50000์žฅ์˜ ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ๋‹ค์ˆ˜์˜ ํด๋ž˜์Šค๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ๋‹ค์–‘ํ•œ ์ข…๋ฅ˜์˜ ๊ฐ์ฒด์— ๋Œ€ํ•ด์„œ ์–ผ๋งˆ๋‚˜ ๊ฐ•์ธํ•˜๊ฒŒ ์ด๋ฏธ์ง€ ๋ณต์›์„ ํ•ด๋‚ด๋Š”์ง€ ํŒ๋‹จํ•  ์ˆ˜ ์žˆ๋‹ค. ImageNet-1k๋Š” ๊ฒ€์ฆ์šฉ์œผ๋กœ๋งŒ ์‚ฌ์šฉํ•˜๊ณ , ํ›ˆ๋ จ์šฉ์œผ๋กœ๋Š” ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š”๋‹ค.

ํ‘œ 1. ์‚ฌ์šฉ ๋ฐ์ดํ„ฐ์…‹

Table 1. Dataset

Dataset

train

validation

DIV2K

800

100

ImageNet-1k

-

50000

PSNR (Peak Signal-to-Noise Ratio) is a metric that is inversely related to the mean squared pixel-wise error. SSIM (Structural Similarity Index Measure) quantifies similarity in luminance, contrast, and structure. FID (Fréchet Inception Distance) is a distance-based metric computed from the feature maps obtained by passing images through an Inception-v3 network (16); it measures the distance between the feature distributions of the target image set and the generated image set. Higher PSNR and SSIM values are better, while lower FID values are better.
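
For reference, PSNR can be computed directly from the mean squared error, as in the short sketch below (assuming images scaled to [0, 1]); SSIM and FID are typically computed with existing library implementations.

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: inversely related to the pixel-wise MSE."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```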

ํ•™์Šต์€ DIV2K ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด์„œ 512x512 random cropํ•˜์—ฌ ๋ชฉํ‘œ ์ด๋ฏธ์ง€๋กœ ํ•˜๊ณ , ์ดํ›„ 256x256์œผ๋กœ resizeํ•˜์—ฌ ์ž…๋ ฅ 2๋กœ ์‚ฌ์šฉํ•˜๊ณ , ํ‘๋ฐฑ ์ „ํ™˜์„ ์ถ”๊ฐ€๋กœ ์ ์šฉํ•˜์—ฌ ์ž…๋ ฅ 1๋กœ ์‚ฌ์šฉํ•œ๋‹ค. ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•์„ ์œ„ํ•ด ๊ฐ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด์„œ horizontal flip, vertical flip, $90^{\circ}$rotation์„ ๊ฐ๊ฐ 0.5 ํ™•๋ฅ ๋กœ ์ ์šฉํ•œ๋‹ค. ๊ฒ€์ฆ์€ ์›๋ณธ ์ด๋ฏธ์ง€๋ฅผ ๋ชฉํ‘œ์ด๋ฏธ์ง€๋กœ ํ•˜๊ณ , 2๋ถ„์˜ 1๋กœ ์ถ•์†Œํ•˜์—ฌ ์ž…๋ ฅ์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค.

ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์„ค์ •์€ ํ‘œ 2์™€ ๊ฐ™๋‹ค. optimizer๋Š” ๋ฏธ์„ธ ์กฐ์ •์— ์ ํ•ฉํ•œ SGD๋ฅผ ์‚ฌ์šฉํ•˜๋ฉฐ ๋ชจ๋ฉ˜ํ…€์€ 0.9๋กœ ์„ค์ •ํ•œ๋‹ค. ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ์˜ ํ•™์Šต๋ฅ ์€ ์‚ฌ์ „ ํ•™์Šต ์ข…๋ฃŒ์— ๋งž์ถ”์–ด 0.00005๋ฅผ ์‚ฌ์šฉํ•˜๊ณ , ํŒ๋ณ„์ž ๋„คํŠธ์›Œํฌ์˜ ํ•™์Šต๋ฅ ์€ BigColor์— ๋งž์ถ”์–ด 0.00003์œผ๋กœ ์„ค์ •ํ•œ๋‹ค. ์†์‹คํ•จ์ˆ˜ ๊ณ„์ˆ˜ ๋˜ํ•œ, BigColor์— ๋งž์ถ”์–ด L1 Loss๋Š” 1.0, VGG Loss๋Š” 0.2, GAN Loss๋Š” 0.03์„ ์ ์šฉํ•˜๊ณ , $j =1,\: 2,\: 13,\: 20$์— ๋Œ€ํ•˜์—ฌ VGG Loss๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค. 200 epoch๊ฐ„ ํ•™์Šตํ•˜๋ฉฐ, $\lambda =0.995$์˜ ๋žŒ๋‹ค ํ•™์Šต๋ฅ  ๊ฐ์†Œ(lambda learning rate decay)๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. batch size๋Š” 8๋กœ ์„ค์ •ํ•˜๊ณ , ์‹คํ—˜์€ pytorch ํ™˜๊ฒฝ์—์„œ ์ˆ˜ํ–‰ํ•œ๋‹ค.

ํ‘œ 2. ํ•™์Šต ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ

Table 2. training hyper parameter

hyper parameter

value

size

input

256x256x1(Gray)

target

512x512x3(RGB)

augmentation

horizontal flip

vertical flip

$90^{\circ}$rotation

optimizer

SGD with momentum

learning rate

SR

5e-5

discriminator

3e-5

epoch

200

learning rate dacay

Lambda(0.995)

batch size

8

coefficient

L1

1.0

GAN

0.2

VGG

0.03

3.2 ์‹คํ—˜ ๊ฒฐ๊ณผ

DIV2K ๋ฐ์ดํ„ฐ ์…‹์— ๋Œ€ํ•ด์„œ ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•๊ณผ ๊ธฐ์กด ๊ธฐ๋ฒ•์„ ๋น„๊ตํ•œ ๊ฒฐ๊ณผ๊ฐ€ ํ‘œ 3์— ๋‚˜์™€ ์žˆ๋‹ค. ์—ฌ๊ธฐ์„œ ๋ฒ ์ด์Šค ๋ผ์ธ์€ ์ „์ž์ฑ„์ƒ‰๊ณผ ์ดˆํ•ด์ƒํ™”์— ๋Œ€ํ•ด์„œ ๊ฐœ๋ณ„ ํ•™์Šต๋œ ๋‘ ๋ชจ๋ธ์„ ์ˆœ์ฐจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•œ ๋ฐฉ์‹์˜ ๊ฒฐ๊ณผ์ด๋‹ค. ์ œ์•ˆ ๊ธฐ๋ฒ•์˜ PSNR, SSIM, FID๊ฐ€ ๊ฐ๊ฐ 0.18, 0.003, 0.160 ๋งŒํผ ํ–ฅ์ƒ๋˜์—ˆ๋‹ค. ์˜์ƒ์ฑ„์ƒ‰์— ๋Œ€ํ•ด ๊ฐ€์žฅ ์ง์ ‘์ ์ธ ์ง€ํ‘œ์ธ FID๊ฐ€ ๊ฐ€์žฅ ํฌ๊ฒŒ ํ–ฅ์ƒ๋˜์—ˆ๋Š”๋ฐ, ์ด๋Š” ์šฐ๋ฆฌ์˜ ์˜๋„๋Œ€๋กœ ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ๊ฐ€ ์˜์ƒ์ฑ„์ƒ‰ ๋„คํŠธ์›Œํฌ์— ๋Œ€ํ•ด ๋ณด์™„์ ์ธ ์—ญํ• ์„ ํ•˜๋ฉด์„œ๋„ ๊ธฐ์กด ์—ญํ• ์ธ ์ดˆํ•ด์ƒํ™”๋ฅผ ์ž˜ ์ˆ˜ํ–‰ํ•˜๋„๋ก ํ•™์Šต ๋˜์—ˆ๋‹ค๋Š” ์˜๋ฏธ๋‹ค. ๊ธฐ์กด์˜ ๊ธฐ๋ฒ•์„ ์ ์šฉํ•œ ๊ฒฝ์šฐ์—๋Š” PSNR ์„ฑ๋Šฅ์€ ๊ฐœ์„ ๋˜์—ˆ์ง€๋งŒ, SSIM๊ณผ FID์„ฑ๋Šฅ์€ ์žฌํ•™์Šต ์ด์ „๊ณผ ๋น„๊ตํ•ด ์˜คํžˆ๋ ค ๋‚ฎ์•„์ง€๋Š” ๋ชจ์Šต์„ ๋ณด์ธ๋‹ค. ์ด๋Š” ์žฌํ•™์Šต ๊ณผ์ •์—์„œ ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ๊ฐ€ ๋ถ•๊ดด๋˜์–ด ๊ตฌ์กฐ์ , ์˜๋ฏธ๋ก ์ ์ธ ๋ถ€๋ถ„์— ๋Œ€ํ•œ ๋ณต์› ์„ฑ๋Šฅ์ด ์•ฝํ™”๋˜์—ˆ์Œ์„ ์˜๋ฏธํ•œ๋‹ค.

ํ‘œ 3. ์„ฑ๋Šฅ๋น„๊ต (DIV2K ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์…‹)

Table 3. Perormance Comparison (DIV2K validation dataset)

Method

PSNR(โ†‘)

SSIM(โ†‘)

FID(โ†“)

baseline

20.30

0.815

0.756

baseline +

conventional fine-tuning

20.99

0.771

0.884

baseline +

decoupled Loss(ours)

20.48

0.818

0.596

ImageNet-1k ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•œ ์‹คํ—˜๊ฒฐ๊ณผ์ธ ํ‘œ 4์—์„œ๋„ FID์— ๋Œ€ํ•ด์„œ ๋†’์€ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์˜€๊ณ , PSNR๊ณผ SSIM ์„ฑ๋Šฅ๋„ ๊ฐœ์„ ๋˜์—ˆ๋‹ค. ์ด์™€ ๊ฐ™์€ ๊ฒฐ๊ณผ๋Š” ์ด๋ฏธ์ง€์˜ ๊ตฌ์กฐ์ ์ธ ํŠน์ง• ๋ณต์› ๋Šฅ๋ ฅ์ด ๊ฐœ์„ ๋˜์—ˆ๊ณ , ๋‹ค์–‘ํ•œ ๊ฐ์ฒด์— ๋Œ€ํ•ด์„œ ๊ฐ•์ธํ•œ ๋ณต์› ๋Šฅ๋ ฅ์„ ํ•™์Šตํ–ˆ์Œ์„ ๋ณด์ธ๋‹ค.

ํ‘œ 4. ๊ธฐ์กด ๊ธฐ๋ฒ•๊ณผ์˜ ์„ฑ๋Šฅ๋น„๊ต(ImageNet-1k ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์…‹)

Table 4. Comparison with baseline(ImageNet-1k validation dataset)

Method

PSNR(โ†‘)

SSIM(โ†‘)

FID(โ†“)

baseline

19.56

0.734

0.600

baseline +

conventional fine-tuning

20.16

0.698

0.806

baseline +

decoupled Loss(ours)

19.65

0.752

0.390

๊ทธ๋ฆผ 4์˜ (e), (f)๋Š” ์ œ์•ˆ๋œ ๊ธฐ๋ฒ• ์—†์ด ๋ฏธ์„ธ ์กฐ์ •ํ•œ ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ๋กœ, ๊ฐœ๋ณ„ ํ•™์Šต๋œ ๋ฒ ์ด์Šค ๋ผ์ธ ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ์ธ (c), (d)์™€ ๋น„๊ตํ•˜์—ฌ ์ƒ‰์„ฑ๋ถ„์ด ๊ณผ๋„ํ•˜๊ฒŒ ์–ต์ œ๋œ ๊ฒƒ์„ ๋ณด์—ฌ์ค€๋‹ค. L1 Loss๋Š” ์ž…๋ ฅ ์ด๋ฏธ์ง€์˜ ํŠน์ง•๋“ค์ด ๊ฐ€์งˆ๋งŒํ•œ ์ƒ‰์ƒ์˜ ํ‰๊ท ์œผ๋กœ ํšŒ๊ท€ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์˜์ƒ์ฑ„์ƒ‰ ๋„คํŠธ์›Œํฌ ํ›ˆ๋ จ์— ๋Œ€ํ•ด์„œ ํ•ด๋‹น ์†์‹คํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์ƒ‰์„ฑ๋ถ„์ด ์—ดํ™” ๋˜๋Š” ํ˜„์ƒ์ด ๋ฐœ์ƒํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ์ด์œ ๋กœ ํ”ฝ์…€๊ฐ„์˜ ๊ฑฐ๋ฆฌ ๊ธฐ๋ฐ˜ ์†์‹ค์ด ์ตœ์†Œํ™” ๋˜๋„๋ก ํ•™์Šต๋œ ๋ชจ๋ธ์€ PSNR ์ˆ˜์น˜๋Š” ๊ฐœ์„ ๋  ์ˆ˜ ์žˆ์—ˆ์ง€๋งŒ, ๊ตฌ์กฐ์ , ์˜๋ฏธ๋ก ์ ์ธ ํŠน์ง•๋“ค์— ๋Œ€ํ•œ ๋ณต์› ์„ฑ๋Šฅ์ด ๋–จ์–ด์ง€๊ฒŒ ๋˜์–ด SSIM๊ณผ FID ์ง€ํ‘œ์— ๋Œ€ํ•ด ๋‚ฎ์€ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค. ๊ทธ์— ๋ฐ˜ํ•ด ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์˜ ๊ฒฐ๊ณผ๋ฌผ์ธ (g)์™€ (h)๋ฅผ๋ณด๋ฉด ์ƒ‰์„ฑ๋ถ„์˜ ์–ต์ œ ์—†์ด ์ž์—ฐ์Šค๋Ÿฌ์šด ์ด๋ฏธ์ง€๋ฅผ ๋งŒ๋“ค๋„๋ก ํ•™์Šต๋œ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ํŠนํžˆ, (g)๋Š” ๊ณผ๋„ํ•œ ์ƒ‰์„ฑ๋ถ„๊ณผ ์˜ค์ฐจ ๋ˆ„์ ์œผ๋กœ ์™œ๊ณก์ด ๋ฐœ์ƒํ–ˆ์ง€๋งŒ, ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์œผ๋กœ ํ•™์Šต๋œ ๋ชจ๋ธ์€ ์ƒํ˜ธ ๋…๋ฆฝ์ ์œผ๋กœ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ด (g)์ฒ˜๋Ÿผ ์ž์—ฐ์Šค๋Ÿฌ์šด ์ด๋ฏธ์ง€๋ฅผ ๋ณต์›ํ•œ๋‹ค.

๊ทธ๋ฆผ. 4. ๊ธฐ๋ฒ• ์ ์šฉ ๊ฒฐ๊ณผ. ์œ„์—์„œ๋ถ€ํ„ฐ ์ž…๋ ฅ, ์‚ฌ์ „ ํ•™์Šต ๋ชจ๋ธ, ๋‹จ์ˆœ ๋ฏธ์„ธ ์กฐ์ •, ๋ถ„๋ฆฌ๋œ ์†์‹คํ•จ์ˆ˜ ์‚ฌ์šฉ(ours).

Fig. 4. qualitative evaluation. From the top, input, baseline, conventional fine-tuning, fine-tuning with decoupled Loss(ours).

../../Resources/kiee/KIEE.2023.72.3.434/fig4.png

๊ทธ๋ฆผ. 5. ๊ธฐ์กด์˜ ์žฌํ•™์Šต๊ณผ ์ œ์•ˆ๋œ ๊ธฐ๋ฒ• ์„ธ๋ถ€ ๋น„๊ต

Fig. 5. detailed comparison of ours and conventional fine-tuning

../../Resources/kiee/KIEE.2023.72.3.434/fig5.png

๊ทธ๋ฆผ 5์€ ์ƒ˜ํ”Œ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด์„œ ์ผ๋ถ€๋ถ„์„ ํ™•๋Œ€ํ•œ ์‚ฌ์ง„์ด๋‹ค. (a)๋Š” ์ผ๋ฐ˜์ ์ธ ์žฌํ•™์Šต์„ ์ˆ˜ํ–‰ํ•œ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ์ด๊ณ  (c)๋Š” ๊ทธ์— ๋Œ€ํ•œ ํ™•๋Œ€ ์‚ฌ์ง„์ด๋ฉฐ, (b)๋Š” ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ์žฌํ•™์Šตํ•œ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ์ด๊ณ  (d)๋Š” ๊ทธ์— ๋Œ€ํ•œ ํ™•๋Œ€ ์‚ฌ์ง„์ด๋‹ค. (c)๋ฅผ ๋ณด๋ฉด โ€˜์ฒด์ปค๋ณด๋“œ ์•„ํ‹ฐํŒฉํŠธโ€™๋ผ ๋ถˆ๋ฆฌ๋Š” ๊ฒฉ์ž์  ๋ฐœ์ƒ ํ˜„์ƒ์ด ๋‚˜ํƒ€๋‚ฌ๋‹ค. ์ด๋Š” ์žฌํ•™์Šต ๊ณผ์ •์—์„œ ์ดˆํ•ด์ƒํ™” ๋„คํŠธ์›Œํฌ์˜ ๋ณธ๋ž˜ ๊ธฐ๋Šฅ์ด ์—ดํ™” ๋˜์–ด ๋ฐœ์ƒํ•œ ๋ฌธ์ œ์ด๋‹ค. ๊ทธ์— ๋ฐ˜ํ•ด, ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ํ•™์Šตํ•œ ๋ชจ๋ธ์€ ์ฒด์ปค๋ณด๋“œ ์•„ํ‹ฐํŒฉํŠธ ์—†์ด ๊น”๋”ํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์—ฌ์ค€๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ œ์•ˆํ•œ ๊ธฐ๋ฒ•์ด ๊ฐ ๋„คํŠธ์›Œํฌ๊ฐ„์˜ ์ƒํ˜ธ ๊ฐ„์„ญ ์—†์ด ๋…๋ฆฝ์ ์ธ ๊ธฐ๋Šฅ์„ ์œ ์ง€ํ•˜๋„๋ก ํ•™์Šต์‹œ์ผœ, ๊ฐ ๋„คํŠธ์›Œ๋ฏ€์˜ ํŠน์„ฑ์„ ๋ณด์กดํ•จ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

4. ๊ฒฐ ๋ก 

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์˜์ƒ์ฑ„์ƒ‰๊ณผ ์ดˆํ•ด์ƒํ™”์˜ ๋‘ ๊ฐœ์˜ ์ˆœ์ฐจ ๋ชจ๋ธ์— ๋Œ€ํ•ด์„œ ์†์‹คํ•จ์ˆ˜๋ฅผ ๋ถ„๋ฆฌํ•˜์—ฌ ๊ฐ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ ์ €ํ•˜ ์—†์ด ์ข…๋‹จ ํ•™์Šตํ•˜๋Š” ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์˜์ƒ์ฑ„์ƒ‰๊ณผ ์ดˆํ•ด์ƒํ™”์˜ ๊ฐ ์ฒ˜๋ฆฌ๊ณผ์ •์— ์ ํ•ฉํ•œ ์†์‹คํ•จ์ˆ˜๋ฅผ ์ •์˜ํ•˜๊ณ , ์ฃผ์š” ์†์‹คํ•จ์ˆ˜๋ฅผ ๋ถ„๋ฆฌํ•˜์—ฌ, ์ถ”๊ฐ€์ ์ธ ํŒŒ๋ผ๋ฏธํ„ฐ ์—†์ด ๊ฐ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์ƒํ˜ธ ๊ฐ„์„ญ ์—†์ด ๋…๋ฆฝ์ ์œผ๋กœ ์œ ์ง€ํ•˜๋„๋ก ํ•™์Šตํ•œ๋‹ค. DIV2K ๋ฐ ImageNet-1K ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด์„œ ์ œ์•ˆ๋œ ๋ถ„๋ฆฌ์  ์ˆœ์ฐจ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ๊ธฐ์กด ๊ธฐ๋ฒ•๋“ค๊ณผ ๋น„๊ตํ•˜์—ฌ PSNR, SSIM๊ณผ FID ๋“ฑ์˜ ์ฃผ์š” ์„ฑ๋Šฅ ์ง€ํ‘œ์—์„œ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ์–ป์—ˆ๋‹ค. ํ–ฅํ›„, ์†์‹คํ•จ์ˆ˜์˜ ์ •๊ตํ™”์™€ ํ•™์Šต ๊ตฌ์กฐ์˜ ๊ฐœ์„ ์„ ํ†ตํ•œ ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ํ•„์š”ํ•˜๋‹ค.

References

1. Y. C. Huang, Y. S. Tung, J. C. Chen, S. W. Wang, J. L. Wu, "An adaptive edge detection based colorization algorithm and its applications," in Proceedings of the 13th Annual ACM International Conference on Multimedia, 2005, pp. 351-354.
2. A. Y.-S. Chia, S. Zhuo, R. K. Gupta, Y.-W. Tai, S.-Y. Cho, P. Tan, S. Lin, "Semantic colorization with internet images," ACM Transactions on Graphics, vol. 30, no. 6, pp. 156, 2011.
3. R. Zhang, J.-Y. Zhu, P. Isola, X. Geng, A. S. Lin, T. Yu, A. A. Efros, "Real-time user-guided image colorization with learned deep priors," ACM Transactions on Graphics (TOG), vol. 36, no. 4, pp. 1-11, 2017.
4. J.-W. Su, H.-K. Chu, J.-B. Huang, "Instance-aware image colorization," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 7968-7977.
5. Y. Wu, X. Wang, Y. Li, H. Zhang, X. Zhao, Y. Shan, "Towards vivid and diverse image colorization with generative color prior," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 14377-14386.
6. R. Timofte, V. De Smet, L. Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1920-1927.
7. K. I. Kim, Y. Kwon, "Single-image super-resolution using sparse regression and natural image prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 6, pp. 1127-1133, 2010.
8. C. Dong, C. C. Loy, K. He, X. Tang, "Learning a deep convolutional network for image super-resolution," in European Conference on Computer Vision, 2014, pp. 184-199.
9. J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, R. Timofte, "SwinIR: Image restoration using Swin transformer," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1833-1844.
10. B. Lim, S. Son, H. Kim, S. Nah, K. Mu Lee, "Enhanced deep residual networks for single image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136-144.
11. G. Kim, K. Kang, S. Kim, H. Lee, S. Kim, J. Kim, S. Baek, S. Cho, "BigColor: Colorization using a generative color prior for natural images," in European Conference on Computer Vision, 2022, pp. 350-366.
12. M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," Advances in Neural Information Processing Systems, vol. 30, 2017.
13. J. H. Lim, J. C. Ye, "Geometric GAN," arXiv preprint arXiv:1705.02894, 2017.
14. A. Brock, J. Donahue, K. Simonyan, "Large scale GAN training for high fidelity natural image synthesis," arXiv preprint arXiv:1809.11096, 2018.
15. K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
16. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.

์ €์ž์†Œ๊ฐœ

๊ถŒ์ˆœ์šฉ (SoonYong Gwon)
../../Resources/kiee/KIEE.2023.72.3.434/au1.png

He received his BS degree in Electronics Engineering from Seokyeong University, Seoul, Korea, in 2023.

He is currently pursuing his MS degree in Electronics and Computer Engineering at Seokyeong University. His research interests include deep learning and computer vision.

์„œ๊ธฐ์„ฑ (Kisung Seo)
../../Resources/kiee/KIEE.2023.72.3.434/au2.png

He received the BS, MS, and Ph.D. degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1986, 1988, and 1993, respectively.

He joined the Genetic Algorithms Research and Applications Group (GARAGe) at Michigan State University as a Research Associate from 1999 to 2002.

He was also appointed Visiting Assistant Professor in Electrical & Computer Engineering, Michigan State University, from 2002 to 2003.

He was a Visiting Scholar at BEACON (Bio/computational Evolution in Action CONsortium) Center, Michigan State University from 2011 to 2012.

He is currently a Professor of Electronics Engineering at Seokyeong University. His research interests include deep learning, evolutionary computation, computer vision, and intelligent robotics.