The Transactions P of the Korean Institute of Electrical Engineers

  1. (Department of Informatics, Gyeongsang National University, Korea.)



Machine Learning, Generative Adversarial Network, Line Arts Colorization, Image Generation

1. Introduction

Line art plays a key role in setting the direction of a work in the early stages of media production such as storyboards, games, illustration, and animation. In industries like animation, a pre-production storyboard drawn only in pen strokes cannot easily convey color or mood, so additional color pen strokes or color storyboards are used. Coloring line art, however, requires image-editing tools such as Photoshop or Clip Studio, and the artist has to build the result by manipulating many layers, which makes the work labor-intensive, tedious, and repetitive. The animation industry is a clear example: at 24 frames per second, a 90-minute feature contains roughly 130,000 frames (90 × 60 × 24 = 129,600) that animators must color, which costs a great deal of time and money. For this reason, recent studies color line art with GANs (Generative Adversarial Networks)(1), and commercial tools such as Petalica Paint(2) and Clip Studio are moving to support automatic colorization.

Unlike a grayscale image, the line art used for colorization does not carry enough information such as texture and shading. The conditional input available for colorization consists only of weak hints, such as a reference image or the colors of a few pixels. Automatic line art colorization therefore has to extract features from the line art and generate texture and shading from insufficient information. This combines image segmentation with colorization, which makes it a challenging task in computer vision.

Automatic line art colorization falls into three broad approaches, depending on the input data. The first is fully automatic colorization, which generates a colored image from the line art alone(3,4). The second is style-transfer-based colorization, which takes the line art and a color image and colors the line art in the style of that color image(5-7). The third takes the line art and user-supplied color hints and colors the line art with the desired colors(8-13).

All of the existing methods are based on GANs, and their output resolution is limited to at most 512 pixels, which is too low for industrial use. In a survey of five major Korean webtoon platforms, the horizontal resolution ranged from a minimum of 690 to a maximum of 760 pixels, so existing methods fall short even for images used on the web. The limit arises because the memory usage and computation time of the CNNs used for image generation grow sharply with input and output resolution. Given that a typical GPU has 8 to 11 GB of memory, training on and generating high-resolution images of 1,000 pixels or more is heavily constrained on a single GPU. Existing methods therefore do not support high-resolution output on a single device without splitting the model. In this work we propose a new method that uses frequency separation to relax the resolution limit of existing automatic colorization methods while delivering high colorization quality.

The proposed method has three components: (1) line art data augmentation using two line extraction methods, (2) a dual generator made up of a GAN-based low-resolution draft model and a colorization model, and (3) a frequency separation technique for coloring high-resolution line art. We evaluated similarity to the original images with FID, PSNR, and SSIM. The proposed method achieved an FID of 47.87, lower (higher quality) than the 51.64 of the existing method(11), and a PSNR and SSIM of 20.77 and 0.86, higher than 13.01 and 0.72 respectively. Visual comparison also showed good results, with less color bleeding and better preservation of line art textures such as eyes and hair. The main contributions of this work are as follows.

∙An efficient colorization model structure that uses a two-stage neural network to cope flexibly with increasing resolution

∙A resolution-independent colorization technique based on image compositing through frequency separation

The rest of this paper is organized as follows. Section 2 reviews GANs and prior work on automatic line art colorization, classified by input type. Section 3 describes the proposed data augmentation, model structure, frequency-separation-based compositing, and training procedure. Section 4 describes the training data and presents the visual and quantitative experiments and their analysis. Section 5 concludes the paper.

2. Related Work

This section describes the GAN-based and CNN (Convolutional Neural Network)-based methods used for line art colorization. With respect to the input data format, there are three approaches to coloring line art: fully automatic colorization, style transfer (semi-automatic colorization), and user hints.

2.1 Generative Adversarial Networks

Generative Adversarial Networks (GAN), introduced by Goodfellow et al. (1), perform very well at generating data and have been used in many recent studies(14-16) on tasks such as super resolution, image generation, TTS (Text To Speech), and data augmentation.

A GAN consists of two models, a generator that produces data and a discriminator that judges whether data is real, trained in an adversarial framework in which each works against the other's objective. Through this relationship the generator comes to follow the distribution of the input data and produces realistic data.

GANs are well suited to generating realistic data, but balancing the training of the two models is difficult, and because the generator simply follows the input data distribution it is hard to generate a specific desired output. Radford et al. (17) (DCGAN) used CNNs and, through extensive experiments, modified the generator and discriminator architectures and applied batch normalization (18) to establish a GAN architecture that trains successfully. Mirza et al. (19) (Conditional GAN, cGAN) studied how to control GAN output by adding class labels for the generated data during training.

2.2 Fully Automatic Colorization

Fully automatic colorization uses only the line art, with no other form of input(3,4). Isola et al. (3) (Pix2Pix) provided a solution for image-to-image translation using a cGAN structure with the conditional input of (19). To generate realistic images, (3) combined an $L_{1}$ loss with an adversarial loss and produced sharper and more realistic (photorealistic) images than with the $L_{1}$ loss alone. Kang et al. (4) use three models for the colorization task: a "Low-resolution Colorizer" that performs the actual coloring, a "Background Detector" that separates foreground from background, and a "Polishing Network" that takes the colored low-resolution image and the background segment and restores the resolution while distinguishing the background. The method of (4) makes good use of comic-specific features such as speech balloons and has the advantage of coloring line art in batches. However, because colorization is fully automatic, it is hard to color a specific region as intended, and the output size is limited to 256x256 pixels.

2.3 Style-Transfer-Based Colorization

Style-transfer-based automatic colorization(5-7) uses two user inputs: the line art and a reference color image. Furusawa et al. (5) use a reference image from the comic together with interactive color hints (a color palette) so that the user can control the choice of colors in the line art. The colorization step generates color information, which is then composited with contour information extracted from the original comic to produce the comic page, giving an efficient colorization pipeline. However, the user's color information does not intuitively land at the desired location, and blending the color information with contours such as text severely damages image texture. Zhang et al. (7) colorize by adding a style image through a network with the VGG16/19 (20) structure. Two "Guide Decoders" in the middle of the model prevent the vanishing gradient problem during training. However, because it uses VGG16/19, the network is large; because it colors automatically from a reference image, it is hard to color a specific region with a specific color; and the output size is limited to 256x256 pixels.

2.4 User-Hint-Based Colorization

The third approach colors the line art with specific colors using hints provided by the user(8-13). A representative hint-based method is Ci et al. (11). To keep the model from overfitting to synthetic line art (line art produced algorithmically from the original color image), (11) uses an LFN (Local Feature Net). The LFN extracts features of the line art, which are used as an additional input to the generator and the discriminator. However, the loss computation uses the VGG16 (20) network, so the model is large, and the LFN must also be used at inference time. Sangkloy et al. (8) extracted line art with four different algorithms. By augmenting the data with a variety of line art distributions, they prevented overfitting to the line art and colored face images. Frans et al. (10) trained separate generators for color and for shading. This dual generator structure, which splits the generator's role, colors effectively, but the output quality is low and the resolution is 512x512 pixels. Liu et al. (9) use color points as hints and split the generator loss into terms whose coefficients are tuned during training. Their results effectively prevent color bleeding and are better than those of the Pix2Pix (3) model. Hati et al. (13) build their generator on the model of (11). They pretrain a model that generates line art and raise colorization performance by taking into account the loss between the real line art and the line art recovered from the colorization output. Zhang et al. (12) reduce the dependence of the second-stage model on the draft (first stage) in a dual generator structure by applying simulated artifacts such as color bleeding to the generated draft image, which improves colorization performance.

3. Structure of the Proposed Method

Figure 1 shows the overall structure of the proposed system. The system consists of the models that perform colorization (Draft Model, Colorization Model) and a frequency separation compositing module that combines the input line art with the low-frequency component of the colorization result.

Fig. 1. System Architecture Diagram

Algorithm 1. Dilate abs sub

Fig. 2. Preprocessing Input Image Pair (In hint, gray has an alpha value of 0)

3.1 Data Preprocessing

Training requires image pairs of line art and color. To obtain them, we extracted line art from color illustrations, choosing at random between two algorithms: Extended Difference of Gaussians (21) (XDoG) and Dilate abs sub (Algorithm 1). Each training pair was cropped to 512x512, and resized to 256x256 for the draft model. For data augmentation, we varied line thickness during extraction by randomly setting the XDoG α value to 0.3, 0.4, or 0.5 and the Dilate abs sub kernel size to 4x4 or 5x5.
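Algorithm 1 is only available as a figure, so the following is a minimal sketch of one plausible reading of "Dilate abs sub", not the authors' exact procedure. It assumes a grayscale dilation with a k x k window, an absolute difference against the original image, and an inversion so that lines come out dark on a white background. The function names `dilate` and `dilate_abs_sub` are hypothetical.

```python
import numpy as np

def dilate(gray: np.ndarray, k: int) -> np.ndarray:
    """Grayscale dilation: each pixel becomes the maximum over a k x k window."""
    before = k // 2
    after = k - 1 - before
    padded = np.pad(gray, ((before, after), (before, after)), mode="edge")
    h, w = gray.shape
    out = np.zeros_like(gray)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def dilate_abs_sub(gray: np.ndarray, k: int = 5) -> np.ndarray:
    """Assumed reading of Algorithm 1: 255 - |dilate(gray) - gray|."""
    g = gray.astype(np.int16)              # avoid uint8 wrap-around
    diff = np.abs(dilate(g, k) - g)        # large where a dark pixel borders a bright one
    return (255 - np.clip(diff, 0, 255)).astype(np.uint8)

# Thickness augmentation as described in the text: pick the kernel size at random.
rng = np.random.default_rng(0)
kernel_size = int(rng.choice([4, 5]))
```

A larger kernel widens the band where the dilated image differs from the original, which gives thicker extracted lines.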

To color with the desired colors, we use a color hint as the conditional input to the generator. The color hint is produced by generating a binary mask and leaking pixels from the color image through it. The binary mask is then added as the alpha channel of the hint, with leaked regions set to 0 and non-leaked regions set to 1. The channels other than alpha are normalized to the range -1 to 1 to produce the training data. As Figure 2 shows, the line extraction methods (XDoG, Dilate abs sub) produce realistic sketch images.

3.2 Draft Colorization Stage

The generator proposed in this paper works in three steps: it creates a draft, colors using the draft, and composites the result with the line art. The draft model (Figure 3) takes the input line art and user hints and produces a low-resolution (256x256) color draft. The draft model does not need to produce a high-quality result, but it predicts rich colors for the colorization model to refer to. The draft model uses the U-Net (22) structure, which is common in image-to-image translation and has been used in many earlier line art colorization studies.

Fig. 3. Draft Model Architecture (c: number of output filters, u: number of output units, k: kernel size, s: stride). For example, c32k3s1 denotes a convolution layer with 32 output filters, kernel size 3, and stride 1.

Fig. 4. Colorization Model Architecture (c: number of output filters, u: number of output units, k: kernel size, s: stride)

To avoid the checkerboard artifacts(23) of transpose convolution, the up-sampling blocks raise the resolution with the sub-pixel convolution of Shi et al. (24) (also called pixel shuffle or depth to space). The up-sampling blocks use ResNeXt blocks (25) to increase network capacity without greatly increasing computational complexity, with 10 ResNeXt blocks per up-sampling layer. Following (11,25,26), the generator uses no normalization layers (18,27), in order to improve colorization accuracy and keep the output range flexible. Finally, every block uses a Leaky ReLU activation with slope 0.2, except the last layer, which uses tanh.
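The rearrangement behind sub-pixel convolution can be illustrated without a deep learning framework. The sketch below is a NumPy illustration under stated assumptions, not the authors' code: it uses channel-first layout and the same channel ordering as the common pixel shuffle convention, where output pixel (h·r + i, w·r + j) of channel c is read from input channel c·r² + i·r + j. The names `depth_to_space` and `leaky_relu` are chosen here for illustration.

```python
import numpy as np

def depth_to_space(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r*r"
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)        # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)

def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    """Leaky ReLU with the slope of 0.2 stated in the text."""
    return np.where(x >= 0, x, slope * x)

# Four 2x2 feature maps become one 4x4 map: resolution doubles, channels shrink by 4.
features = np.arange(16).reshape(4, 2, 2)
upsampled = depth_to_space(features, r=2)
```

Because every output pixel is copied from exactly one input value, there is no uneven kernel overlap, which is the usual source of checkerboard patterns in transpose convolution.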

To meet the draft model's goal of predicting rich colors, we train it adversarially with a GAN (1) structure. The discriminator used to train the draft model has a structure similar to that of (17). It uses the strided convolution proposed in (17), reducing dimensions with convolutions of stride 2 and kernel size 4, and then a fully connected layer that outputs a single probability (u1 of the Discriminator in Figure 3). Every block of the discriminator except the last layer uses a Leaky ReLU activation with slope 0.2. Figure 3 shows the structure of the proposed draft model and discriminator.
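As a small worked example of how such strided convolutions shrink the feature map, the sketch below applies the standard convolution output-size formula. The kernel size 4 and stride 2 come from the text; the padding of 1 and the number of layers are assumptions made only for illustration, since the text does not state them.

```python
def conv_out(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of one convolution (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1

def downsample_sizes(size: int, layers: int) -> list:
    """Feature map sizes after each of `layers` strided convolutions."""
    sizes = []
    for _ in range(layers):
        size = conv_out(size)
        sizes.append(size)
    return sizes

# With these assumptions each layer halves a 256x256 draft input.
sizes = downsample_sizes(256, layers=4)
```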

3.3 Colorization Stage

The colorization stage colors the line art using the draft image produced by the draft model. The draft may contain color errors and unwanted artifacts. To reduce the colorization stage's dependence on the draft image, we follow (12) and add a step that synthesizes artifacts such as color spray, color bleeding, and distortion (Artifact Simulation). The draft image with synthesized artifacts is resized to the higher resolution (512x512) and used as input to the colorization model. Like the draft model, the colorization model has a U-Net (22) structure, and its last layer uses a tanh activation. Figure 4 shows the structure of the colorization model. Unlike the draft stage, the colorization stage does not need to invent rich colors, because the draft already provides enough information for coloring, so no GAN is used here.

3.4 Image Compositing Module

The compositing module increases the resolution using the original input line art and the colorization result. To do so it uses a technique called frequency separation. The high-frequency component needed for high resolution can be obtained by transforming the original input line art, and the low-frequency component is generated from the 512x512 colored image produced in the colorization stage. First, the high-frequency component is generated from the original line art by converting every brightness value at or above 50% gray (brightness 127) to 50% gray. Next, the colorization result is resized to the resolution of the original line art. A Gaussian filter is applied to the resized result to generate the low-frequency component. The high- and low-frequency components are then blended with the Linear light blend mode to produce the final colored image. Linear light burns (darkens) or dodges (lightens) colors by decreasing or increasing brightness according to the blend color. Where the blend source is brighter than 50% gray (brightness 127), brightness is increased and the image becomes lighter; where it is darker, brightness is decreased and the image becomes darker. Here the line regions have brightness close to 0, so the line art is composited naturally onto the colored image. Algorithm 2 gives the proposed frequency separation procedure, and Figure 5 shows the image at each step.

Algorithm 2. Frequency separation
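Since Algorithm 2 is only given as a figure, the compositing steps described in Section 3.4 can be sketched as follows. This is a minimal NumPy illustration under several assumptions that are not in the original: nearest-neighbor resizing stands in for whatever interpolation the authors use, the Gaussian `sigma` is an arbitrary illustrative value, and Linear light is taken in its usual 8-bit form, result = base + 2 × blend − 255. All function names are hypothetical.

```python
import numpy as np

def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the first two axes, with edge padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    out = img.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        n = out.shape[axis]
        acc = np.zeros_like(out)
        for i, weight in enumerate(kernel):
            acc += weight * np.take(padded, np.arange(i, i + n), axis=axis)
        out = acc
    return out

def resize_nearest(img: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbor resize (a stand-in for the real interpolation)."""
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows][:, cols]

def frequency_separation(line_art: np.ndarray, colored: np.ndarray,
                         sigma: float = 2.0) -> np.ndarray:
    """line_art: H x W uint8 original lines; colored: h x w x 3 uint8 model output."""
    # High frequency: clamp everything at or above 50% gray down to 127.
    high = np.minimum(line_art, 127).astype(np.float64)
    # Low frequency: resize the colored result to the line art size, then blur.
    low = gaussian_blur(resize_nearest(colored, *line_art.shape), sigma)
    # Linear light blend: 50% gray is (almost) neutral, dark lines burn to black.
    blended = low + 2.0 * high[..., None] - 255.0
    return np.rint(np.clip(blended, 0, 255)).astype(np.uint8)
```

The neural networks only ever see 512x512 inputs in this scheme; the final resolution is set by the line art passed to `frequency_separation`, which is why the output size is not bounded by GPU memory.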

3.5 Loss Functions

This section describes each loss function used in training and its components. The generator loss of the draft stage, Eq. (1), is defined as a combination of three terms: adversarial ($L_{GAN}$), reconstruction ($L_{recon}$), and content ($L_{cont}$). The influence of each term is adjusted with the coefficients $w_{a}, w_{r}, w_{c}$. In Eqs. (1) to (5), l is the extracted line art (256x256), h is the color hint (256x256), c is the original color image (256x256), and D and G are the discriminator and the generator (draft model) respectively.

(1)
$$ \begin{array}{r} \mathcal{L}_{draft}=w_{a} \min _{G} \max _{D} \mathcal{L}_{GAN}(G, D)+ \\ w_{r} \mathcal{L}_{\text {recon}}(G)+ \\ w_{c} \mathcal{L}_{\text {cont}}(G, \mathcal{F}) \end{array} $$

(2)
$$ \begin{aligned} \mathcal{L}_{G A N}(G, D)=& E_{c}[\log (D(c))]+\\ & E_{l, h}[\log (1-D(G(l, h)))] \end{aligned} $$

The adversarial loss ($L_{GAN}$), Eq. (2), follows (1,16). The discriminator D() estimates the probability that its input is a real color image c. Because the last layer uses a sigmoid activation, the output of D() lies between 0 and 1. $E_{c}$ and $E_{l,h}$ denote the expected values of log(D(c)) and log(1−D(G(l, h))) respectively. G(l,h) generates a color image from the extracted line art (l) together with the user-provided hint (h), and tries to produce images whose distribution is similar to that of the original images. D(G(l, h)) estimates the probability that the generated data G(l,h) is real.
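As a concrete reading of Eq. (2), the sketch below estimates the value function from batches of discriminator outputs. It is an illustration only: the function name `gan_value`, the batch arrays, and the small `eps` used to keep the logarithm finite are assumptions made here.

```python
import numpy as np

def gan_value(d_real, d_fake, eps: float = 1e-12) -> float:
    """Batch estimate of E_c[log D(c)] + E_{l,h}[log(1 - D(G(l, h)))]."""
    d_real = np.clip(np.asarray(d_real, dtype=np.float64), eps, 1 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=np.float64), eps, 1 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])
# A confident, correct discriminator pushes the value toward its maximum of 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator is trained to raise this value and the generator to lower it, which is the min-max relationship written in Eq. (1).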

Fig. 5. Blending Result

In theory, the interaction between D() and G() is sufficient to improve quality. In practice, however, it is hard to balance the two models in a GAN, so additional loss functions are used to address this. We add Eqs. (3) and (4) to stabilize training.

(3)
$$ \mathcal{L}_{\text {recon}}(G)=E_{l, h, c}\left[\|c-G(l, h)\|_{1}\right] $$

The reconstruction loss ($L_{recon}$) is defined in Eq. (3). It measures the $L_{1}$ loss between the real color image c and the generated image G(l,h). With it, G() can adjust the colors of the generated image to match the color distribution of the original color image c. As a result, G() can produce high-quality images capable of fooling D().

(4)
$$ \mathcal{L}_{\text {cont}}(G, \mathcal{F})=\operatorname{MSE}(\mathcal{F}(c)-\mathcal{F}(G(l, h))) $$

The content loss ($L_{cont}$) is defined in Eq. (4). It is the $L_{2}$ loss (mean squared error, MSE) between the feature maps F of the generated image G(l,h) and of the real color image c. F denotes the feature map produced by the fourth convolution layer of a VGG16 (20) model trained on ImageNet (28). By comparing feature maps, the content loss captures perceptual similarity that cannot be expressed in pixel space.
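The three terms of Eq. (1) can be combined as in the sketch below, seen from the generator's side. This is an illustration under stated assumptions rather than the training code: images are plain NumPy arrays, the L1 term is averaged over pixels (a normalization choice made here), `features` is a hypothetical stand-in for the VGG16 feature extractor F, and the default weights are the values reported in Section 4.2.

```python
import numpy as np

def recon_loss(c: np.ndarray, g: np.ndarray) -> float:
    """Eq. (3): L1 distance between real and generated image (pixel mean)."""
    return float(np.mean(np.abs(c - g)))

def content_loss(c: np.ndarray, g: np.ndarray, features) -> float:
    """Eq. (4): MSE between feature maps F(c) and F(G(l, h))."""
    return float(np.mean((features(c) - features(g)) ** 2))

def generator_adv_loss(d_fake, eps: float = 1e-12) -> float:
    """Generator part of Eq. (2): E[log(1 - D(G(l, h)))]."""
    d_fake = np.clip(np.asarray(d_fake, dtype=np.float64), eps, 1 - eps)
    return float(np.mean(np.log(1.0 - d_fake)))

def draft_generator_loss(c, g, d_fake, features,
                         w_a: float = 0.05, w_r: float = 1.0, w_c: float = 0.1) -> float:
    """Weighted sum of the adversarial, reconstruction, and content terms."""
    return (w_a * generator_adv_loss(d_fake)
            + w_r * recon_loss(c, g)
            + w_c * content_loss(c, g, features))

# Toy example: identity "features" in place of VGG16, constant images.
real = np.zeros((4, 4, 3))
fake = np.ones((4, 4, 3))
total = draft_generator_loss(real, fake, d_fake=[0.5], features=lambda x: x)
```

With these weights the reconstruction term dominates, and the adversarial term acts as a small correction that encourages richer color.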

(5)
$$ \begin{array}{l} \mathcal{L}_{\text {color}}\left(G^{\prime}, G\right)= \\ E_{l, l^{\prime}, h, c^{\prime}}\left[\| c^{\prime}-G^{\prime}\left(l^{\prime}, \text {resize}(G(l, h))\right) \|_{1}\right] \end{array} $$

The loss $L_{color}$ used to train the colorization model is applied after G has finished training: the draft image G(l, h) is resized to 512x512, passed through the colorization model, and the $L_{1}$ loss against the real color image is computed. No adversarial loss is used at this stage; the model refines and regenerates the rich colors produced by G(l,h). In Eq. (5), G′ is the colorization model, l is the line art (256x256), l′ is the line art (512x512), h is the hint, and c′ is the color image (512x512).

4. Experiments and Analysis

4.1 Dataset

The Danbooru (29) dataset, a well-known dataset of animation-style illustrations, contains many elements that interfere with this task, such as padding added to preserve aspect ratio and low resolution. We therefore collected a large-scale illustration dataset ourselves for training. The data was collected from shuushuu-imageboard (30). After filtering out unsuitable data that could harm training, we obtained about 700,000 color animation illustrations and 500 pairs of real line art and color illustration. The filtered-out data consists of black-and-white images, high-key and low-key images, small images whose shorter side is 512 pixels or less, images whose overall tone or color is biased toward one side, doodles that are not illustrations, and images mixed with real-world objects.

4.2 Experimental Environment

We implemented the proposed neural network models with the PyTorch framework (31). Training used a single NVIDIA RTX 2080Ti. For the draft model, the optimizer was Adam (32) with β1 = 0.5, β2 = 0.9, and a learning rate of 0.0001. The learning rate was multiplied by 0.1 at step 112K, and training ran for a total of 280K steps with a batch size of 64. The weights of the draft model loss were $w_{a}$ = 0.05, $w_{r}$ = 1.0, and $w_{c}$ = 0.1. The colorization model was also optimized with Adam, with the same hyperparameters as the draft model.
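The training schedule above can be written down compactly. The sketch below only encodes the numbers stated in the text; the function name `learning_rate`, the constant names, and the choice that the decayed rate applies from step 112,000 inclusive are assumptions made for illustration.

```python
BASE_LR = 1e-4            # initial learning rate
DECAY_STEP = 112_000      # step at which the rate is multiplied by 0.1
TOTAL_STEPS = 280_000     # total training steps
BATCH_SIZE = 64
ADAM_BETAS = (0.5, 0.9)   # (beta1, beta2) for Adam

def learning_rate(step: int) -> float:
    """Step schedule: base rate before DECAY_STEP, one tenth of it afterwards."""
    return BASE_LR * 0.1 if step >= DECAY_STEP else BASE_LR

# Number of training samples drawn over the whole run.
samples_seen = TOTAL_STEPS * BATCH_SIZE
```

Under this schedule the model spends the first 40% of training at the higher rate and the remaining 60% at the lower one.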

For the experiments, we developed the proposed network as a plugin for GIMP, an open-source image editor, as shown in Figure 6. GIMP supports a layer system and image blending similar to Photoshop, so the proposed image processing can be applied easily, and it offers a capable front end for a good user experience. The GIMP plugin uses the proposed model, pretrained in PyTorch and converted to ONNX (Open Neural Network Exchange) (33). Table 1 lists the detailed specifications and framework versions used in the experiments.

Fig. 6. GIMP Plugin Front End

Table 1. Test Environment

| HW | Specification | SW | Version |
|---|---|---|---|
| CPU | Intel i7-7800X | Python | 3.8.5 |
| RAM | 64GB | PyTorch | 1.6 |
| OS | Arch Linux | ONNX | 1.5 |

4.3 Visual Analysis of Colorization Performance

To remove the resolution constraint, the proposed method separates the colorization stage from the line art compositing stage. Figure 7 shows the results of the colorization stage. Figure 7(a), produced without the proposed method, has low saturation and a tendency toward color bleeding. Figure 7(b), also produced without the proposed method, shows the checkerboard artifacts of transpose convolution and unstable coloring for some line art distributions. The proposed method, Figure 7(c), has high saturation and also prevents color bleeding within the colored regions.

Figure 8 shows the results of the line art compositing stage. To make the difference in resolution visible, we colored line art whose shorter side is at least 1,500 pixels and then composited the line art through frequency separation. Compared with Figure 8(a) and (b), which do not use the proposed method, (c), which composites the input line art as the high-frequency component, restores textures such as hair and the lines of the pupils well.

4.4 Quantitative Analysis of Colorization Performance

Fig. 7. Visual comparison. (a):Base [11], (b):Pix2Pix [3], (c):Ours

For a quantitative comparison of images produced by the proposed method and by previously proposed methods, we used the Fréchet Inception Distance (FID) (34). FID measures the similarity of two image datasets. It has been shown to correlate well with human judgment of visual quality and is widely used to evaluate the quality of GAN-generated samples. FID is computed as the Fréchet distance between two normal distributions fitted to the feature maps of real and generated images, obtained from an Inception model pretrained on ImageNet (28). A lower FID indicates higher quality. For the evaluation we compared results on 140,000 synthetic XDoG line art images and on 530 pairs of real line art and illustration. For fairness, the FID evaluation follows (11,13) and compares results generated without hints.
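For reference, the Fréchet distance between two Gaussians has a standard closed form, shown below. The symbols are notation introduced here rather than taken from the original text: $\mu_{r}, \Sigma_{r}$ are the mean and covariance of the Inception features of the real images, and $\mu_{g}, \Sigma_{g}$ are those of the generated images.

$$ \mathrm{FID}=\left\|\mu_{r}-\mu_{g}\right\|_{2}^{2}+\operatorname{Tr}\left(\Sigma_{r}+\Sigma_{g}-2\left(\Sigma_{r} \Sigma_{g}\right)^{1 / 2}\right) $$

The value is 0 when the two feature distributions have identical mean and covariance, which is why lower scores indicate generated images that are statistically closer to the real ones.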

Fig. 8. Visual Comparison (Frequency Separation). (a):Base [11], (b):Pix2Pix [3], (c):Ours

To evaluate the frequency separation technique, we measured the information lost in the image with two image quality metrics, PSNR (Peak Signal to Noise Ratio) and SSIM (Structural SIMilarity). PSNR is the ratio between the maximum signal power and the power of the noise that degrades its quality. Because it is computed with a logarithm, it is expressed in dB, and it is higher when less is lost. SSIM takes luminance and contrast into account together with structural differences between images, and a value closer to 1 indicates higher quality (an image more similar to the original). For the evaluation, we colored high-resolution images whose shorter side is at least 1,500 pixels, resized the result to the original size, and compared it with the original image.
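The PSNR definition above can be made concrete with a short sketch. It assumes 8-bit images, so the peak value is 255, and uses the usual formula 10·log10(peak² / MSE); the function name `psnr` is chosen here for illustration.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")          # identical images: no noise at all
    return float(10.0 * np.log10(max_value ** 2 / mse))

# An error of one gray level at every pixel gives about 48.13 dB.
original = np.full((8, 8, 3), 100, dtype=np.uint8)
slightly_off = original + 1
score = psnr(original, slightly_off)
```

Because the scale is logarithmic, the gap between 13.01 dB and 20.77 dB in Table 2 corresponds to roughly a sixfold reduction in mean squared error.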

Table 2. Quantitative comparison of FID (lower is better), PSNR (higher is better) and SSIM (higher is better)

| Model | FID mean | FID std | PSNR mean | PSNR std | SSIM mean | SSIM std |
|---|---|---|---|---|---|---|
| Base [11] | 51.64 | 1.36 | 13.01 | 5.14 | 0.73 | 0.21 |
| Pix2Pix [3] | 57.47 | 3.93 | 13.55 | 2.71 | 0.79 | 0.11 |
| Ours | 47.87 | 2.71 | 20.77 | 3.62 | 0.86 | 0.09 |

In the FID results of Table 2, the proposed method (Ours) has the lowest FID, while the Pix2Pix-based model, which shows checkerboard artifacts (23), records a higher FID (lower quality) than Base (11). With frequency separation, the proposed method scores 20.77 (PSNR) and 0.86 (SSIM), higher than both other methods (Base, Pix2Pix). The higher PSNR is attributed to more accurate coloring and fewer artifacts such as color bleeding than in the existing models. The higher SSIM is consistent with the visual results, in which the texture of the line art is preserved.

5. Conclusion

Existing automatic line art colorization methods are limited to at most 512x512 pixels, which is below industrial requirements. To address this, we proposed a dual generator and a frequency separation technique. The dual generator splits colorization into a draft role and a colorization role and trains a generator for each. The draft model produces rich colors, and the colorization model removes the various artifacts that arise during coloring to give a clean result. Frequency separation, used to preserve the texture of the original line art and to color at the resolution of the input line art, combines a high-frequency component (the input line art) with a low-frequency component (the colored image produced by the colorization model) to raise the resolution, which makes it possible to color high-resolution line art of 1,500 pixels or more.

Figures 7 and 8 show the visual comparison. Without the proposed dual generator, the results show abnormal coloring such as color bleeding, checkerboard artifacts, and wrong colors, whereas the proposed method shows little color bleeding and colors accurately. Existing results, which do not use the frequency-separation-based line art compositing introduced to solve the low-resolution problem, lose texture information that indicates image quality, such as eyes and hair. Because no CNN is involved in the step that raises the resolution, the method could also color very high-resolution images of 2,000 pixels or more, which existing methods do not support. For quantitative evaluation we compared the colored results using FID, PSNR, and SSIM. In FID, which reflects the quality of generated images, the proposed method scored 47.87, lower (higher quality) than the 51.64 of the existing method(11). In PSNR and in SSIM, which reflects structural similarity, it scored 20.77 and 0.86, higher (higher quality) than 13.01 and 0.72. From the visual and quantitative evaluations, we conclude that colorization with a dual generator and frequency separation improves the quality of high-resolution image colorization.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1G1A1100455).

References

1
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, 2014, Generative adversarial nets, Advances in neural information processing systems, pp. 2672-2680
2
pixiv inc., 2019, Petalica paint, https://petalica-paint.pixiv.dev/index_en.html, [Online; accessed 2020.11.23]
3
P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros, 2017, Image-to-image translation with conditional adversarial networks, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134
4
S. Kang, J. Choo, J. Chang, 2017, Consistent comic colorization with pixel-wise background classification, NIPS'17 Workshop on Machine Learning for Creativity and Design
5
C. Furusawa, K. Hiroshiba, K. Ogaki, Y. Odagiri, 2017, Comicolorization: semi-automatic manga colorization, SIGGRAPH Asia 2017 Technical Briefs, pp. 1-4
6
P. Hensman, K. Aizawa, 2017, cgan-based manga colorization using a single training image, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), IEEE, Vol. 3, pp. 72-77
7
L. Zhang, Y. Ji, X. Lin, C. Liu, 2017, Style transfer for anime sketches with enhanced residual u-net and auxiliary classifier gan, 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), IEEE, pp. 506-511
8
P. Sangkloy, J. Lu, C. Fang, F. Yu, J. Hays, 2017, Scribbler: Controlling deep image synthesis with sketch and color, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5400-5409
9
Y. Liu, Z. Qin, Z. Luo, H. Wang, 2017, Auto-painter: Cartoon image generation from sketch by using conditional generative adversarial networks, arXiv preprint arXiv:1705.01908
10
K. Frans, 2017, Outline colorization through tandem adversarial networks, arXiv preprint arXiv:1704.08834
11
Y. Ci, X. Ma, Z. Wang, H. Li, Z. Luo, 2018, User-guided deep anime line art colorization with conditional adversarial networks, Proceedings of the 26th ACM international conference on Multimedia, pp. 1536-1544
12
L. Zhang, C. Li, T.-T. Wong, Y. Ji, C. Liu, 2018, Two-stage sketch colorization, ACM Transactions on Graphics (TOG), Vol. 37, No. 6, pp. 1-14
13
Y. Hati, G. Jouet, F. Rousseaux, C. Duhart, 2019, Paintstorch: a user-guided anime line art colorization tool with double generator conditional adversarial network, European Conference on Visual Media Production, pp. 1-10
14
C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, 2017, Photorealistic single image super-resolution using a generative adversarial network, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690
15
M. Bińkowski, J. Donahue, S. Dieleman, A. Clark, E. Elsen, N. Casagrande, L. C. Cobo, K. Simonyan, 2019, High fidelity speech synthesis with adversarial networks, arXiv preprint arXiv:1909.11646
16
M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, H. Greenspan, 2018, Synthetic data augmentation using gan for improved liver lesion classification, 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), IEEE, pp. 289-293
17
A. Radford, L. Metz, S. Chintala, 2015, Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv preprint arXiv:1511.06434
18
S. Ioffe, C. Szegedy, 2015, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167
19
B. Dai, S. Fidler, R. Urtasun, D. Lin, Oct 2017, Towards diverse and natural image descriptions via a conditional gan, Proceedings of the IEEE International Conference on Computer Vision (ICCV)
20
K. Simonyan, A. Zisserman, 2014, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556
21
H. Winnemöller, J. E. Kyprianidis, S. C. Olsen, 2012, Xdog: an extended difference-of-gaussians compendium including advanced image stylization, Computers & Graphics, Vol. 36, No. 6, pp. 740-753
22
O. Ronneberger, P. Fischer, T. Brox, 2015, U-net: Convolutional networks for biomedical image segmentation, International Conference on Medical image computing and computer-assisted intervention, Springer, pp. 234-241
23
A. Odena, V. Dumoulin, C. Olah, 2016, Deconvolution and checkerboard artifacts, Distill, Vol. 1, No. 10, pp. e3
24
W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, Z. Wang, 2016, Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1874-1883
25
S. Xie, R. Girshick, P. Dollar, Z. Tu, K. He, July 2017, Aggregated residual transformations for deep neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
26
S. Nah, T. Hyun Kim, K. Mu Lee, 2017, Deep multi-scale convolutional neural network for dynamic scene deblurring, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3883-3891
27
Y. Wu, K. He, 2018, Group normalization, Proceedings of the European conference on computer vision (ECCV), pp. 3-19
28
J. Deng, W. Dong, R. Socher, L. Li, Kai Li, Li Fei-Fei, June 2009, Imagenet: A large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255
29
G. B. Danbooru community, A. Gokaslan, 2019, Danbooru 2017: A large-scale crowdsourced and tagged anime illustration dataset, https://www.gwern.net/Danbooru2017, [Online; accessed 2020.11.23]
30
some, 2018, E-Shuushuu-Kawaii Image Board, https://e-shuushuu.net/, [Online; accessed 19-July-2018]
31
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, 2017, Automatic differentiation in pytorch
32
D. P. Kingma, J. Ba, 2014, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980
33
J. Bai, F. Lu, K. Zhang, 2019, Onnx: Open neural network exchange, https://github.com/onnx/onnx
34
M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, 2017, Gans trained by a two time-scale update rule converge to a local nash equilibrium, Advances in neural information processing systems, pp. 6626-6637

์ €์ž์†Œ๊ฐœ

Yeongseop Lee

Yeongseop Lee graduated from Gyeongsang National University in 2020.

He is pursuing a master's degree at the Dept. of Information Science, Gyeongsang National University.

His research interests include Machine Learning, Neural Networks, Image Generation, and Image Processing.

Seongjin Lee

Seongjin Lee graduated from Hanyang University in 2006.

He received his Master's and Ph.D. degrees from the same university in 2008 and 2015, respectively.

He worked as a postdoc at the Storage Center, Hanyang University, until 2017 and became an assistant research professor there.

He joined Gyeongsang National University in 2017 as an assistant professor.

His research interests include Operating Systems, Storage Systems, System Optimization, Avionics, and Machine Learning.