The Transactions of the Korean Institute of Electrical Engineers

  1. (Dept. of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea.)



Triple inverted pendulum, Reinforcement learning, Sim-to-Real Learning, Transition control

1. Introduction

The inverted pendulum is a representative underactuated system with non-minimum-phase behavior, nonlinear model equations, and unstable dynamics. Because of these properties, it has long been used as an experimental model for validating new control theories and algorithms. Research has focused in particular on swing-up control, which brings the pendulum to the upright position, and balance control, which keeps it there, as ways of handling the instability and nonlinearity of the system[1,2,3]. As more demanding plants are needed to evaluate controller performance, multi-link inverted pendulum systems with additional pendulums are being actively studied, including controller design and performance verification on the triple inverted pendulum[4,5]. Each added link increases the number of state variables, which not only raises the control difficulty considerably but also poses new control problems that existing strategies handle poorly. In multi-link systems the main problem extends beyond simply raising the pendulums or keeping them balanced to transition control, which requires moving between multiple equilibrium points (EPs).

Transition control in a multi-link inverted pendulum resembles swing-up control but covers a broader concept. The equilibrium points of an inverted pendulum system are generally divided, according to the state of each pendulum, into unstable equilibrium points where a pendulum points upward and stable equilibrium points where it points downward. A single pendulum has only one unstable equilibrium point, whereas in a multi-link structure the number of equilibrium-point combinations grows with the number of pendulums. Swing-up control mainly aims at moving from the stable equilibrium point to an unstable one. Transition control also includes transitions between different unstable equilibrium points and therefore requires a more complex control strategy. A transition consists of balance control at the current equilibrium point, transition control toward the target equilibrium point, and balance control at the target equilibrium point, carried out as one continuous control action. Successful transition control therefore needs a strategy in which these stages work together seamlessly.

Recent studies on transition control of inverted pendulums have designed transition trajectories with optimal-control methods such as direct collocation[6,7]. However, a precomputed optimal trajectory is highly sensitive to disturbances and model uncertainty, which makes it difficult to obtain stable control performance on a real system. This sensitivity is even more pronounced in transition control because the system must move between several unstable equilibrium points. Since optimal-control-based transition control must track the designed trajectory accurately, stable convergence to the target equilibrium point can become difficult once disturbances exceed a certain level[8]. To overcome these problems, this paper performs transition control of a triple inverted pendulum using Sim-to-Real, one of the techniques of reinforcement learning. Sim-to-Real applies what has been learned in a simulation environment to the physical system[9]. Because the training environment has no physical constraints, arbitrary initial states can be set, and training over a wide variety of states yields a control policy that is robust to disturbances.

However, Sim-to-Real comes with the reality gap problem, which arises from differences between the simulation model and the real hardware[10]. If this gap is not closed, a control policy trained in simulation may fail to deliver the desired performance on the physical system. This study addresses the problem by using a triple inverted pendulum system built in the authors' laboratory. The system is designed so that the physical hardware agrees closely with the model equations used in the simulation environment, which minimizes the reality gap. On this basis, the goal is to realize all 56 transitions of a linear (cart-type) triple inverted pendulum using Sim-to-Real reinforcement learning.

The paper is organized as follows. Section 2 describes the Sim-to-Real approach and the reinforcement learning algorithm. Section 3 describes the mechanical design and mathematical model of the triple inverted pendulum system used in this study. Section 4 designs the reinforcement-learning-based controller and analyzes the control results in the real environment. Finally, Section 5 presents the conclusions.

2. Sim-to-Real Learning-Based Controller and Algorithm

2.1 Sim-to-Real Learning-Based Controller

A reinforcement-learning-based controller can be defined as a structure in which the component that performs the control computation in a conventional control scheme is replaced by a reinforcement learning agent. Here, the agent is a system that learns an optimal control policy through repeated interaction with the environment. At every timestep the agent receives the state observed from the environment, selects an action according to its current policy, and receives feedback in the form of a reward. As this process repeats, the agent accumulates experience and gradually improves its policy. The trained controller takes the given state information as input and outputs the control signal according to the learned policy.

The environments with which the agent interacts during training and evaluation fall into two broad classes: a physical environment based on the real system, and a virtual environment based on simulation.

Training on the physical system has the advantage that learning is possible even when a mathematical model or accurate dynamics of the system is not available in advance. The agent can autonomously learn an optimal policy through interaction with the environment, without the parameter identification or complex nonlinear modeling that model-based controller design requires. In addition, data collected in the real environment naturally contain non-idealities such as sensor noise, friction, backlash, hardware nonlinearity, and disturbances. A policy trained in such an environment therefore tends to fit reality better and to be more robust than one obtained from simulation-based training.

Training a controller on the physical system also involves various constraints and risks. When an inverted pendulum is trained in the real environment, every episode starts with all pendulums hanging downward under gravity, and it is difficult to set the initial state to the angles and angular velocities the researcher wants. Training speed is also limited by physical time. For these reasons, Sim-to-Real learning, in which sufficient training is first carried out in a simulated environment and the result is then transferred to the real environment, has recently been widely used[11,12].

Training in simulation overcomes the constraints of the physical environment described above and greatly improves training efficiency, because repeated experiments can be run more safely and freely. This matters especially for methods such as reinforcement learning that improve a policy through a large amount of trial and error, since training can be repeated with no risk of damaging the system. In simulation the initial state can also be set arbitrarily, and accelerated rather than real-time simulation makes it possible to collect large amounts of data in a shorter time. This speeds up training and also allows training under diverse initial conditions that are hard to realize on the real system, which helps produce a control policy that remains robust in the presence of disturbances.

As noted in the introduction, however, Sim-to-Real has a fundamental limitation in the reality gap. Because a simulation cannot perfectly reproduce every physical property of the real system, a policy trained in simulation may not carry over to the real environment as is, or may cause unexpected behavior. Among the many approaches proposed to reduce the reality gap, this study adopts two practical ones to improve Sim-to-Real control performance.

First, as a software measure, domain randomization (DR), which can be applied within the simulation, is used[13,14]. DR trains the agent with the initial conditions of the simulation environment chosen at random. This lets the reinforcement learning agent train under a wide range of conditions and arrive at a more general control policy.
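As an illustration of randomizing the initial condition, the following minimal sketch draws a random initial state at the start of each episode. The 9-element state layout follows the state vector defined in Section 3.1. The sampling ranges (`max_offset`, `max_rate`) and the function name are assumptions made for this example and are not values reported in the paper.

```python
import numpy as np

def sample_initial_state(rng, max_offset=0.1, max_rate=1.0):
    """Draw a random initial state [y, th1, th2, th3, dy, dth1, dth2, dth3, int_y]."""
    y0 = rng.uniform(-max_offset, max_offset)         # cart offset [m] (assumed range)
    angles = rng.uniform(-np.pi, np.pi, size=3)        # any link configuration [rad]
    rates = rng.uniform(-max_rate, max_rate, size=3)   # angular rates [rad/s] (assumed range)
    # Cart velocity and the position integral start from zero in this sketch.
    return np.concatenate(([y0], angles, [0.0], rates, [0.0]))

rng = np.random.default_rng(0)
x0 = sample_initial_state(rng)
```

Sampling the angles over the full circle means that an episode can start near any of the equilibrium configurations, which is hard to arrange on the physical system.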

Second, as a hardware measure, a triple inverted pendulum system built in our laboratory is used so that the simulation environment can be constructed with high physical fidelity. This effectively reduces the reality gap caused by model differences between the simulation and the real environment. Such a Sim-to-Real learning strategy is well suited to transition control problems like the triple inverted pendulum, where initial conditions are heavily constrained and the dynamics are highly nonlinear.

2.2 Reinforcement Learning Algorithm

To handle the control of a high-order nonlinear system that requires transitions between unstable equilibrium points, this study implements the reinforcement-learning-based controller with the Truncated Quantile Critics (TQC) algorithm. General reinforcement learning algorithms can suffer from policy instability or slow convergence caused by extreme reward predictions, and this occurs even more often in high-order nonlinear systems such as the inverted pendulum.

TQC addresses this by combining the strengths of Quantile Regression Deep Q-Network (QR-DQN) and Soft Actor-Critic (SAC). It is a recent distributional reinforcement learning algorithm suited to high-performance policy learning in continuous action spaces. Its core strategy is to remove the upper quantiles of the predicted return distribution, which suppresses overestimation of the Q-value and guides the policy to converge on a more realistic expected return. This suppresses the overly optimistic exploration that often occurs early in training, improves training stability, and is also advantageous for safety when the policy is applied to the real environment.
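The truncation step can be sketched as follows: the atoms predicted by all critics for the next state-action pair are pooled, sorted, and the largest ones are discarded before the target is formed. The number of critics and atoms matches Table 1 (N = 3, M = 25). The number of atoms dropped per critic (2 here) is an assumption for illustration, since the paper does not list it. The reward, discount, and entropy terms of the full TQC target are omitted.

```python
import numpy as np

def truncated_target_atoms(quantiles, drop_per_critic):
    """Pool the atoms of all critics, sort them, and drop the largest ones.

    quantiles: array of shape (n_critics, n_atoms) for one next state-action pair.
    """
    n_critics, n_atoms = quantiles.shape
    pooled = np.sort(quantiles.reshape(-1))          # ascending order
    keep = n_critics * (n_atoms - drop_per_critic)   # number of atoms retained
    return pooled[:keep]

rng = np.random.default_rng(0)
z = rng.normal(size=(3, 25))                         # stand-in critic outputs
kept = truncated_target_atoms(z, drop_per_critic=2)  # 2 is an assumed value
```

Because only the largest atoms are removed, the mean of the retained atoms can never exceed the mean of the full pooled set, which is the mechanism that limits overestimation.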

Table 1 Hyperparameters used to implement the reinforcement learning agent

| Hyperparameter | Value |
|---|---|
| Optimizer | ADAM |
| Learning rate | 0.0003 |
| Discount factor ($\gamma$) | 0.99 |
| Replay buffer size | 1e6 |
| Number of critics ($N$) | 3 |
| Number of hidden layers in critic networks | 3 |
| Size of hidden layers in critic networks | 512 |
| Number of hidden layers in policy network | 2 |
| Size of 1st hidden layer in policy network | 400 |
| Size of 2nd hidden layer in policy network | 300 |
| Minibatch size | 256 |
| Nonlinearity | ReLU |
| Target smoothing coefficient ($\beta$) | 0.005 |
| Target update interval | 1 |
| Gradient steps per iteration | 1 |
| Environment steps per iteration | 1 |
| Number of atoms ($M$) | 25 |

The triple inverted pendulum treated in this study has a high-dimensional state space, and even small changes in the initial conditions can quickly make its motion unstable. In a control environment like this, where the variance of the reward is large and the probability of failure is high, a distributional approach that also takes the tails of the return distribution into account is effective, and TQC shows more robust convergence than existing methods when learning policies for such environments. Its structure, which learns several return distributions with multiple critic networks and merges them, can secure both generalization and stability of the policy in a real environment with disturbances and model uncertainty, and therefore fits the purpose of this study well.

To obtain precise transitions between equilibrium points and robustness to initial conditions, the network structure and hyperparameters were adjusted to the characteristics of the system. Specifically, the number of critic networks was reduced in consideration of training speed and computational efficiency, and the size of the policy network was adjusted to reflect the high-order model equations of the inverted pendulum. The main hyperparameters are listed in Table 1. The remaining settings, such as the replay buffer size, follow the hyperparameters proposed by Kuznetsov[15].

3. Triple Inverted Pendulum System and Transition Control

Fig. 1 shows the conceptual mechanical diagram of the triple inverted pendulum. SI units are used throughout this paper, and the variables are defined as follows. $M$ is the mass of the cart, and $m_{1}$, $m_{2}$, $m_{3}$ are the masses of the pendulums. $l_{1}$, $l_{2}$, $l_{3}$ are the distances from the rotation axis of each pendulum to its center of mass, $L_{1}$ is the distance from the rotation axis of the first pendulum to that of the second, and $L_{2}$ is the distance from the rotation axis of the second pendulum to that of the third. $u$ is the acceleration of the cart, $y$ is the displacement of the cart from its initial position, and $c_{1}$, $c_{2}$, $c_{3}$ are the friction coefficients at the rotation axes of the pendulums. $\theta_{1}$ is the angular displacement of the first pendulum, measured from the normal to the ground, $\theta_{2}$ is the angular displacement of the second pendulum relative to the first, and $\theta_{3}$ is the angular displacement of the third pendulum relative to the second. $i$, $j$, $k$ denote the axes of a Cartesian coordinate system whose origin is the center of the rail.

Fig. 1. Conceptual diagram of the triple inverted pendulum

../../Resources/kiee/KIEE.2025.74.8.1363/fig1.png

3.1 Model Equations of the Triple Inverted Pendulum

Deriving the mathematical model of the triple inverted pendulum with the Euler-Lagrange equation gives Eq. (1).

(1)
$\begin{bmatrix}n_{1}\\n_{2}\\n_{3}\end{bmatrix}\ddot{y}+\begin{bmatrix}m_{11}&m_{12}&m_{13}\\\star &m_{22}&m_{23}\\\star &\star&m_{33}\end{bmatrix}\begin{bmatrix}\ddot{\theta_{1}}\\\ddot{\theta_{2}}\\\ddot{\theta_{3}}\end{bmatrix}-\begin{bmatrix}r_{1}\\r_{2}\\r_{3}\end{bmatrix}= 0.$

Here $\star$ denotes the symmetric entries of a symmetric matrix. The terms $n_{i}$, $m_{ij}$, and $r_{i}$ are given in Eq. (2).

(2)
$\begin{aligned}
n_{1}&= h_{6}\cos(\theta_{1})+h_{7}\cos(\theta_{1}+\theta_{2})+h_{8}\cos(\theta_{1}+\theta_{2}+\theta_{3}),\\
n_{2}&= h_{7}\cos(\theta_{1}+\theta_{2})+h_{8}\cos(\theta_{1}+\theta_{2}+\theta_{3}),\\
n_{3}&= h_{8}\cos(\theta_{1}+\theta_{2}+\theta_{3}),\\
m_{11}&= h_{1}+h_{2}+h_{3}+h_{4}+h_{5}+2h_{9}\cos(\theta_{2})+2h_{11}\cos(\theta_{3})+2h_{10}\cos(\theta_{2}+\theta_{3}),\\
m_{12}&= h_{2}+h_{3}+h_{5}+h_{9}\cos(\theta_{2})+2h_{11}\cos(\theta_{3})+h_{10}\cos(\theta_{2}+\theta_{3}),\\
m_{13}&= h_{3}+h_{11}\cos(\theta_{3})+h_{10}\cos(\theta_{2}+\theta_{3}),\\
m_{22}&= h_{2}+h_{3}+h_{5}+2h_{11}\cos(\theta_{3}),\\
m_{23}&= h_{3}+h_{11}\cos(\theta_{3}),\\
m_{33}&= h_{3},\\
r_{1}&= gh_{8}\sin(\theta_{1}+\theta_{2}+\theta_{3})+gh_{7}\sin(\theta_{1}+\theta_{2})+gh_{6}\sin(\theta_{1})\\
&\quad +h_{10}d_{1}\sin(\theta_{2}+\theta_{3})+h_{9}d_{2}\sin(\theta_{2})+h_{11}d_{3}\sin(\theta_{3})-c_{1}\dot{\theta}_{1},\\
r_{2}&= gh_{8}\sin(\theta_{1}+\theta_{2}+\theta_{3})+gh_{7}\sin(\theta_{1}+\theta_{2})\\
&\quad -h_{10}d_{4}\sin(\theta_{2}+\theta_{3})-h_{9}d_{4}\sin(\theta_{2})+h_{11}d_{3}\sin(\theta_{3})-c_{2}\dot{\theta}_{2},\\
r_{3}&= gh_{8}\sin(\theta_{1}+\theta_{2}+\theta_{3})-h_{10}d_{4}\sin(\theta_{2}+\theta_{3})-h_{11}d_{5}\sin(\theta_{3})-c_{3}\dot{\theta}_{3}.
\end{aligned}$

Here $g$ is the gravitational acceleration, and $h_{i}$ and $d_{i}$ are given in Eq. (3).

(3)
$h_{1}=m_{1}l_{1}^{2}+I_{1},\: h_{2}=m_{2}l_{2}^{2}+I_{2},\: \\h_{3}=m_{3}l_{3}^{2}+I_{3},\: h_{4}=L_{1}^{2}m_{2}+L_{1}^{2}m_{3},\: \\h_{5}=L_{2}^{2}m_{3},\: h_{6}=l_{1}m_{1}+L_{1}m_{2}+L_{1}m_{3},\: \\h_{7}=l_{2}m_{2}+L_{2}m_{3},\: h_{8}=l_{3}m_{3},\: \\h_{9}=L_{1}l_{2}m_{2}+L_{1}L_{2}m_{3},\: h_{10}=L_{1}l_{3}m_{3},\: \\h_{11}=L_{2}l_{3}m_{3},\: \\d_{1}=(2\dot{\theta}_{1}+\dot{\theta}_{2}+\dot{\theta}_{3})(\dot{\theta}_{2}+\dot{\theta}_{3}),\: \\d_{2}=(2\dot{\theta}_{1}+\dot{\theta}_{2})\dot{\theta}_{2},\: d_{3}=(2\dot{\theta}_{1}+2\dot{\theta}_{2}+\dot{\theta}_{3})\dot{\theta}_{3},\: \\d_{4}=\dot{\theta}_{1}^{2},\: d_{5}=(\dot{\theta}_{1}+\dot{\theta}_{2})^{2}.$

To derive the state equation, Eq. (1) can be rearranged as Eq. (4):

(4)
$\begin{bmatrix}\ddot{\theta}_{1}\\\ddot{\theta}_{2}\\\ddot{\theta}_{3}\end{bmatrix}=\begin{bmatrix}m_{11}&m_{12}&m_{13}\\\star &m_{22}&m_{23}\\\star &\star&m_{33}\end{bmatrix}^{-1}\left\{\begin{bmatrix}r_{1}\\r_{2}\\r_{3}\end{bmatrix}-\begin{bmatrix}n_{1}\\n_{2}\\n_{3}\end{bmatrix}\ddot{y}\right\}.$

From Eq. (4), $\ddot{\theta}_{1}$, $\ddot{\theta}_{2}$, $\ddot{\theta}_{3}$ can be written as in Eq. (5).

(5)
\begin{align*}\ddot{\theta}_{1}=b_{11}r_{1}+b_{12}r_{2}+b_{13}r_{3}-(b_{11}n_{1}+b_{12}n_{2}+b_{13}n_{3})\ddot{y}\\\ddot{\theta}_{2}=b_{12}r_{1}+b_{22}r_{2}+b_{23}r_{3}-(b_{12}n_{1}+b_{22}n_{2}+b_{23}n_{3})\ddot{y}\\\ddot{\theta}_{3}=b_{13}r_{1}+b_{23}r_{2}+b_{33}r_{3}-(b_{13}n_{1}+b_{23}n_{2}+b_{33}n_{3})\ddot{y}\end{align*}

Here $b_{ij}$ and $\Phi$ are given in Eq. (6).

(6)
$b_{11}=\dfrac{m_{22}m_{33}-m_{23}^{2}}{\Phi},\: \\b_{12}=\dfrac{m_{13}m_{23}-m_{12}m_{33}}{\Phi},\: \\b_{13}=\dfrac{m_{12}m_{23}-m_{13}m_{22}}{\Phi},\: \\b_{22}=\dfrac{m_{11}m_{33}-m_{13}^{2}}{\Phi},\: \\b_{23}=\dfrac{m_{12}m_{13}-m_{11}m_{23}}{\Phi},\: \\b_{33}=\dfrac{m_{11}m_{22}-m_{12}^{2}}{\Phi},\: \\\Phi =m_{11}m_{22}m_{33}+2m_{12}m_{13}m_{23}\\-m_{11}m_{23}^{2}-m_{12}^{2}m_{33}-m_{13}^{2}m_{22}. $
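Eq. (6) is the closed-form inverse of a symmetric 3×3 matrix, with $\Phi$ its determinant. The short check below compares the closed form against a general-purpose numerical inverse. The test matrix is an arbitrary symmetric positive-definite example chosen for illustration, not the mass matrix of the actual system, and the function name is hypothetical.

```python
import numpy as np

def inverse_symmetric_3x3(m11, m12, m13, m22, m23, m33):
    """Closed-form inverse of a symmetric 3x3 matrix, following Eq. (6)."""
    phi = (m11 * m22 * m33 + 2 * m12 * m13 * m23
           - m11 * m23**2 - m12**2 * m33 - m13**2 * m22)   # determinant
    b11 = (m22 * m33 - m23**2) / phi
    b12 = (m13 * m23 - m12 * m33) / phi
    b13 = (m12 * m23 - m13 * m22) / phi
    b22 = (m11 * m33 - m13**2) / phi
    b23 = (m12 * m13 - m11 * m23) / phi
    b33 = (m11 * m22 - m12**2) / phi
    return np.array([[b11, b12, b13],
                     [b12, b22, b23],
                     [b13, b23, b33]])

# Arbitrary symmetric positive-definite matrix used only for the check.
M = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
B = inverse_symmetric_3x3(M[0, 0], M[0, 1], M[0, 2], M[1, 1], M[1, 2], M[2, 2])
```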

Defining the state vector as $x_{1}=y$, $x_{2}=\theta_{1}$, $x_{3}=\theta_{2}$, $x_{4}=\theta_{3}$, $x_{5}=\dot{y}$, $x_{6}=\dot{\theta}_{1}$, $x_{7}=\dot{\theta}_{2}$, $x_{8}=\dot{\theta}_{3}$, $x_{9}=\int_{0}^{t}y(\tau)d\tau$ and writing $\ddot{y}$ as the acceleration $u$, the model equations of the triple inverted pendulum can be expressed as the following nonlinear state equation.

(7)
../../Resources/kiee/KIEE.2025.74.8.1363/equ7.png

The last state variable, $\int_{0}^{t}y(\tau)d\tau$, is added to eliminate the steady-state error of the cart position. The model assumes that the cart undergoes no horizontal motion other than translation along the j axis and no rotation. It also assumes that the first and second pendulums rotate at their hinges only about an axis in the i direction. Only friction that is linear in velocity is considered. Static friction and Coulomb friction, which are nonlinear, are not. To apply Sim-to-Real learning with the derived model equations, the mechanism must be designed to satisfy these assumptions as closely as possible.
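The model above can be sketched as a state-derivative function. The code follows Eqs. (1) to (3) with the parameter values of Table 2, and solves the linear system numerically instead of using the closed-form inverse of Eq. (6). Because Eq. (7) is only available as an image, the ordering of the derivative vector is reconstructed from the state-vector definition and is an assumption. The value of $g$ is also assumed. The gravity terms multiplying $\sin(\theta_{1}+\theta_{2})$ use $h_{7}$, the coefficient that is dimensionally consistent with the other gravity terms.

```python
import numpy as np

G = 9.81                                   # gravitational acceleration [m/s^2] (assumed)
m = (0.2297, 0.1345, 0.1644)               # link masses [kg], Table 2
L = (0.1645, 0.210)                        # joint-to-joint lengths L1, L2 [m]
l = (0.0819, 0.1239, 0.1532)               # joint-to-center-of-mass lengths [m]
I = (1.269e-3, 9.371e-4, 1.744e-3)         # moments of inertia [kg m^2]
c = (1.293e-3, 1.626e-6, 3.305e-4)         # viscous friction [N m s/rad]

# Constants of Eq. (3)
h1 = m[0] * l[0]**2 + I[0]
h2 = m[1] * l[1]**2 + I[1]
h3 = m[2] * l[2]**2 + I[2]
h4 = L[0]**2 * (m[1] + m[2])
h5 = L[1]**2 * m[2]
h6 = l[0] * m[0] + L[0] * (m[1] + m[2])
h7 = l[1] * m[1] + L[1] * m[2]
h8 = l[2] * m[2]
h9 = L[0] * (l[1] * m[1] + L[1] * m[2])
h10 = L[0] * l[2] * m[2]
h11 = L[1] * l[2] * m[2]

def dynamics(x, u):
    """State derivative for x = [y, th1, th2, th3, dy, dth1, dth2, dth3, int_y]."""
    _, t1, t2, t3, dy, w1, w2, w3, _ = x
    s12, s123, s23 = t1 + t2, t1 + t2 + t3, t2 + t3
    n = np.array([h6 * np.cos(t1) + h7 * np.cos(s12) + h8 * np.cos(s123),
                  h7 * np.cos(s12) + h8 * np.cos(s123),
                  h8 * np.cos(s123)])
    m11 = (h1 + h2 + h3 + h4 + h5 + 2 * h9 * np.cos(t2)
           + 2 * h11 * np.cos(t3) + 2 * h10 * np.cos(s23))
    m12 = h2 + h3 + h5 + h9 * np.cos(t2) + 2 * h11 * np.cos(t3) + h10 * np.cos(s23)
    m13 = h3 + h11 * np.cos(t3) + h10 * np.cos(s23)
    m22 = h2 + h3 + h5 + 2 * h11 * np.cos(t3)
    m23 = h3 + h11 * np.cos(t3)
    mass = np.array([[m11, m12, m13], [m12, m22, m23], [m13, m23, h3]])
    d1 = (2 * w1 + w2 + w3) * (w2 + w3)
    d2 = (2 * w1 + w2) * w2
    d3 = (2 * w1 + 2 * w2 + w3) * w3
    d4 = w1**2
    d5 = (w1 + w2)**2
    r = np.array([
        G * (h8 * np.sin(s123) + h7 * np.sin(s12) + h6 * np.sin(t1))
        + h10 * d1 * np.sin(s23) + h9 * d2 * np.sin(t2)
        + h11 * d3 * np.sin(t3) - c[0] * w1,
        G * (h8 * np.sin(s123) + h7 * np.sin(s12))
        - h10 * d4 * np.sin(s23) - h9 * d4 * np.sin(t2)
        + h11 * d3 * np.sin(t3) - c[1] * w2,
        G * h8 * np.sin(s123) - h10 * d4 * np.sin(s23)
        - h11 * d5 * np.sin(t3) - c[2] * w3])
    acc = np.linalg.solve(mass, r - n * u)   # Eq. (1) solved for the angular accelerations
    return np.array([dy, w1, w2, w3, u, acc[0], acc[1], acc[2], x[0]])
```

With zero input, both the fully upright state (all angles zero) and the fully hanging state ($\theta_{1}=-\pi$, $\theta_{2}=\theta_{3}=0$) give a zero derivative, as expected of equilibrium points.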

3.2 Mechanical Structure and Drive Unit of the Triple Inverted Pendulum

For the real system to minimize the reality gap, it must agree closely with the theoretically derived model equations. It is therefore essential to design the system so that it performs only the motions consistent with the assumptions of the derivation. If the system moves in ways that violate these assumptions, the dynamic responses of the simulation and the physical system diverge and the reliability of the model degrades. The mechanical and drive design of the triple inverted pendulum proposed in this study therefore aims to maximize this agreement so that both the theoretical and the experimental criteria are met. The mechanical structure of the proposed triple inverted pendulum is shown in Fig. 2.

Fig. 2. Mechanical structure of the triple inverted pendulum system

../../Resources/kiee/KIEE.2025.74.8.1363/fig2.png

The proposed triple inverted pendulum system was designed with attention to the precision of the connections between pendulums. As shown in Fig. 3, each revolute joint connecting the pendulums uses a double-row bearing arrangement rather than a single bearing, so that rotation takes place stably about a single axis. This minimizes motion in unwanted directions and secures precise rotation.

Small magnetic encoders were mounted to measure $\theta_{3}$, the rotation angle of the third pendulum relative to the second, and $\theta_{2}$, the rotation angle of the second pendulum relative to the first. To connect the encoder that measures $\theta_{3}$ to the slip ring, its wiring has to pass through the joint between the first and second pendulums. For this purpose a hollow-shaft revolute joint is used, which reduces interference between the pendulums and allows the rotation information to be read reliably.

Fig. 4 shows the rail and cart structure of the triple inverted pendulum system previously built in our laboratory[5]. In that structure, twisting of the cart ($\alpha$) was observed as the pendulums moved. This effect is not considered in the model equations and is one of the causes of reduced agreement with the simulation environment. To resolve it, a dual-shaft guide rail structure was adopted, as shown in Fig. 5. Compared with the previous structure, the proposed one holds the cart more rigidly, which reduces twisting caused by pendulum motion, and it ensures that the belt tension is transmitted only to the shaft that rotates the pulley.

Fig. 3. Cross-sectional view and encoder wiring of the proposed triple inverted pendulum

../../Resources/kiee/KIEE.2025.74.8.1363/fig3.png

Fig. 4. Rail and cart structure built from 2040 aluminum profile

../../Resources/kiee/KIEE.2025.74.8.1363/fig4.png

Fig. 5. Proposed drive structure

../../Resources/kiee/KIEE.2025.74.8.1363/fig5.png

As shown in Fig. 5, a BLDC motor without a reduction gear was adopted so that the pulley is driven directly. This removes backlash and is expected to minimize the occurrence of limit cycles. In addition, a coupling is used where the BLDC motor transmits power, which prevents unnecessary loads from affecting the output.

In the proposed triple inverted pendulum system, bearings are used in the moving part of the cart, in the drive unit, and at each pendulum, and the model considers only viscous friction proportional to linear and angular velocity. Static and Coulomb friction are not included, so the physical system must also be built in a way that reflects this modeling choice.

Bearings shipped from the factory are packed with high-viscosity grease intended for long-term use. If such bearings are installed in the triple inverted pendulum without further treatment, unnecessary friction arises in the cart motion and pendulum rotation, and the increased viscous friction is likely to hinder smooth operation.

In particular, if static friction arises in the bearings of the revolute joints, the pendulum has difficulty moving from its initial state and unexpected deviations in the initial state may occur. For example, a small deviation at the stable equilibrium point can shift the system to a state different from the intended initial setting, which can become one of the main causes of limit cycles. To prevent this, the grease was removed from the bearings with a solvent and the bearing interior was then treated with oil to minimize friction.

3.3 Transition Control

Because transition control deals with transitions between different equilibrium points, the equilibrium points of the system must first be defined systematically and set clearly as the target states of the controller. The equilibrium points of the triple inverted pendulum fall into eight cases according to the angle of each pendulum. In this study the state of each pendulum is written as Down or Up. Assigning 0 to Down and 1 to Up gives a binary representation that makes the ordering of the equilibrium points easy to identify. An equilibrium point is denoted EP, and according to the combination of pendulum states the eight points are EP0(Down, Down, Down), EP1(Down, Down, Up), EP2(Down, Up, Down), EP3(Down, Up, Up), EP4(Up, Down, Down), EP5(Up, Down, Up), EP6(Up, Up, Down), and EP7(Up, Up, Up). These combinations are illustrated in Fig. 6.
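The binary naming can be sketched in a few lines. The example maps an equilibrium-point index to the (first, second, third) pendulum states and enumerates the ordered (start, goal) pairs, which yields the 8 × 7 = 56 transitions mentioned in this paper. The function name is chosen for illustration.

```python
from itertools import permutations

def ep_label(index):
    """Return the (link 1, link 2, link 3) states of equilibrium point EP<index>."""
    bits = format(index, "03b")                  # e.g. 5 -> "101"
    return tuple("Up" if b == "1" else "Down" for b in bits)

labels = {k: ep_label(k) for k in range(8)}
transitions = list(permutations(range(8), 2))    # ordered (start, goal) pairs
```

For example, `labels[5]` is `("Up", "Down", "Up")`, matching EP5 in the list above.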

Fig. 6. Equilibrium points of the triple inverted pendulum

../../Resources/kiee/KIEE.2025.74.8.1363/fig6.png

Previous work on transition control computed the transition trajectory between each pair of equilibrium points in advance and then tracked it[6,7]. This is effective when the trajectory can be tracked accurately, but in the presence of disturbances the precomputed trajectory becomes hard to follow and performance can degrade. The Sim-to-Real approach, in contrast, does not compute the transition trajectory directly. Instead it trains with the target equilibrium point set as the maximum of the reward function. Rather than defining a specific trajectory beforehand, the equilibrium point itself is set as the final target state, so the policy learns to converge to the given target equilibrium point from whatever initial state the pendulum starts in. With this approach there is no need to compute the 56 transition trajectories of the triple inverted pendulum individually, because training for the eight equilibrium points is enough to perform transition control. Sim-to-Real learning also provides control performance that is robust to varied initial conditions and environmental changes, and it copes more effectively with the disturbances and model uncertainty that can arise during a transition.

4. Experiments and Results

4.1 Simulation Environment Setup

The environment with which the reinforcement learning agent interacts directly during training was implemented in Python as a simulation based on the mathematical model derived in Section 3. The environment reflects the physical parameters of the triple inverted pendulum system, which are listed in Table 2. The fixed-step fourth-order Runge-Kutta method (ode4) was adopted as the solver for the nonlinear ordinary differential equations.

Table 2 Physical parameters of the triple inverted pendulum

| Parameter | Link $i=1$ | Link $i=2$ | Link $i=3$ |
|---|---|---|---|
| $m_{i}$ [kg] | 0.2297 | 0.1345 | 0.1644 |
| $L_{i}$ [m] | 0.1645 | 0.210 | 0.245 |
| $l_{i}$ [m] | 0.0819 | 0.1239 | 0.1532 |
| $I_{i}$ [kg·m²] | 1.269e-03 | 9.371e-04 | 1.744e-03 |
| $c_{i}$ [N·m·s/rad] | 1.293e-03 | 1.626e-06 | 3.305e-04 |

In the simulated training environment each episode lasts 10 s, the ODE solver runs with a 1 ms step, and the agent observes the state every 10 ms. With these settings the agent interacts with the environment up to 1000 times per episode while gradually learning an optimal policy. Besides the case where the timestep count exceeds 1000, an episode is terminated early if the cart displacement $y$ exceeds 0.48 [m] or the cart acceleration $a$ exceeds 2.5 [m/s²]. This is a precautionary safety measure to prevent the cart from leaving the rail or the system from being damaged when the trained controller is applied to the physical system.

4.2 Reward Function Design

The reinforcement learning agent interacts continuously with the environment and gradually optimizes its policy based on the reward obtained at each step. The reward function used to compute this reward depends on which of the eight equilibrium points of the triple inverted pendulum is the target of the transition. For the equilibrium points defined in Fig. 6, the target angles at which the reward is maximal are listed in Table 3.

Table 3 Target angle of each pendulum according to the equilibrium point

| Equilibrium Point | $\theta_{1}^{*}$ | $\theta_{2}^{*}$ | $\theta_{3}^{*}$ |
|---|---|---|---|
| 0 | -π | -π | -π |
| 1 | -π | -π | 0 |
| 2 | -π | 0 | -π |
| 3 | -π | 0 | 0 |
| 4 | 0 | -π | -π |
| 5 | 0 | -π | 0 |
| 6 | 0 | 0 | -π |
| 7 | 0 | 0 | 0 |

The reward functions designed to be maximal at each equilibrium point are given in Eq. (8) and plotted in Fig. 7.

(8)
$\begin{aligned}
R_{u}&=\exp(-0.001\cdot u^{2}),\\
R_{y}&=\exp(-0.3\cdot |y|),\\
R_{\theta_{1}}&=0.5+0.5\cdot\cos(\theta_{1}-\theta_{1}^{*}),\\
R_{\theta_{2}}&=0.5+0.5\cdot\cos(\theta_{1}+\theta_{2}-\theta_{2}^{*}),\\
R_{\theta_{3}}&=0.5+0.5\cdot\cos(\theta_{1}+\theta_{2}+\theta_{3}-\theta_{3}^{*}),\\
R_{\dot{\theta}_{1}}&=\exp(-0.015\cdot |\dot{\theta}_{1}|),\\
R_{\dot{\theta}_{2}}&=\exp(-0.009\cdot |\dot{\theta}_{1}+\dot{\theta}_{2}|),\\
R_{\dot{\theta}_{3}}&=\exp(-0.005\cdot |\dot{\theta}_{1}+\dot{\theta}_{2}+\dot{\theta}_{3}|).
\end{aligned}$

Fig. 7. Graphs of the reward functions

../../Resources/kiee/KIEE.2025.74.8.1363/fig7.png

The final reward function is the product of all reward terms, so that its maximum value is 1, as expressed in Eq. (9).

(9)
$Reward=R_{u}\cdot R_{y}\cdot R_{\theta_{1}}\cdot R_{\theta_{2}}\cdot R_{\theta_{3}}\cdot R_{\dot{\theta}_{1}}\cdot R_{\dot{\theta}_{2}}\cdot R_{\dot{\theta}_{3}}.$

Every reward term defined above is normalized to the range [0, 1], and each episode consists of at most 1000 timesteps. If the agent were to receive a reward of 1 at every timestep, the maximum return obtainable in a single episode would therefore be 1000.
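As an illustration, Eqs. (8) and (9) can be evaluated with a few lines of Python. This is a minimal sketch and not the authors' implementation. The function name and argument layout are assumptions, and angles are taken to be in radians.

```python
import math


def reward(u, y, theta, theta_dot, target):
    """Product reward of Eq. (9) built from the terms of Eq. (8).

    theta, theta_dot: (theta1, theta2, theta3) and their time derivatives.
    target: (theta1*, theta2*, theta3*) for the chosen equilibrium point.
    """
    # Terms that do not depend on the target: control input and cart position.
    r = math.exp(-0.001 * u ** 2) * math.exp(-0.3 * abs(y))

    rate_gains = (0.015, 0.009, 0.005)
    angle_sum = 0.0  # theta1, then theta1+theta2, then theta1+theta2+theta3
    rate_sum = 0.0   # same cumulative sum for the angular rates
    for th, th_dot, th_star, gain in zip(theta, theta_dot, target, rate_gains):
        angle_sum += th
        rate_sum += th_dot
        r *= 0.5 + 0.5 * math.cos(angle_sum - th_star)  # angle term
        r *= math.exp(-gain * abs(rate_sum))            # angular-rate term
    return r
```

Each factor lies in [0, 1], so the product does as well. The product equals 1 only when the input, the cart position, and the angular rates are zero and every cumulative angle matches its target.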

The reward terms used in this study fall into two types according to whether they depend on the equilibrium point. The first type depends on the target angle and consists of $R_{\theta_{1}}$, $R_{\theta_{2}}$, and $R_{\theta_{3}}$. These terms increase as the error with respect to the target equilibrium point decreases, which guides the agent toward a policy that drives each pendulum to its equilibrium point.

The second type consists of $R_{u}$, $R_{y}$, $R_{\dot{\theta}_{1}}$, $R_{\dot{\theta}_{2}}$, and $R_{\dot{\theta}_{3}}$. These terms increase as the corresponding quantities $u$, $y$, $\dot{\theta}_{1}$, $\dot{\theta}_{2}$, and $\dot{\theta}_{3}$ approach zero. They help the agent learn to minimize the control input, keep the cart near the origin, and suppress unnecessary pendulum motion.

4.3 Training Strategy

After the simulation environment was configured, training was carried out eight times, once per equilibrium point, with the target angles changed accordingly. The results are shown in Fig. 8: the episode reward reaches roughly 700 to 800 and then tends to stay at that level. The point at which training completes differs between equilibrium points, which is attributed to differences in control difficulty among them. Besides the stability and control difficulty of each equilibrium point, these differences may also be affected by the structure of the reward function and by variation in the exploration process. In addition, the reward does not fully converge to a constant value even after training completes. This is a consequence of training conditions designed so that the agent learns a control policy that remains robust in the presence of disturbances.

Fig. 8. Training results for each equilibrium point

../../Resources/kiee/KIEE.2025.74.8.1363/fig8.png

To let the reinforcement learning agent experience a wider variety of states and learn a generalized control policy, the initial state of each episode was randomized. Specifically, the state variables of the nonlinear state equations in the simulation environment were initialized with random numbers so that the agent could explore a broad region of the state space. The random ranges of the initial state variables were chosen with the physical limits of the real system in mind and are defined in Eq. (10).

(10)
\begin{align*}y\sim U(-0.3,\: 0.3),\: \dot{y}\sim U(-1.2,\: 1.2),\: \\\theta_{1}\sim U(-\pi ,\: \pi),\: \dot{\theta}_{1}\sim U(-10,\: 10),\: \\\theta_{2}\sim U(-\pi ,\: \pi),\: \dot{\theta}_{2}\sim U(-20,\: 20),\: \\\theta_{3}\sim U(-\pi ,\: \pi),\: \dot{\theta}_{3}\sim U(-30,\: 30). \end{align*}
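A minimal Python sketch of the initialization in Eq. (10) follows. The dictionary keys and the function name are hypothetical. Each state variable is drawn independently from a symmetric uniform range, which is how Eq. (10) reads. Units are not stated in the equation and are therefore left out here.

```python
import math
import random

# Half-widths of the symmetric uniform ranges in Eq. (10).
INIT_RANGES = {
    "y": 0.3, "y_dot": 1.2,
    "theta1": math.pi, "theta1_dot": 10.0,
    "theta2": math.pi, "theta2_dot": 20.0,
    "theta3": math.pi, "theta3_dot": 30.0,
}


def sample_initial_state(rng: random.Random) -> dict[str, float]:
    """Draw one initial state, each variable sampled from U(-bound, bound)."""
    return {name: rng.uniform(-bound, bound) for name, bound in INIT_RANGES.items()}
```

Passing an explicit `random.Random` instance keeps sampling reproducible when a seed is fixed, for example `sample_initial_state(random.Random(0))`.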

However, this random initialization can produce combinations of state variables that violate physical laws. When training starts from a physically impossible initial state, the model equations can in turn produce unrealistic values. From the agent's point of view, it receives irregular states that it has never encountered during training, which makes it more likely to output unpredictable actions unrelated to the learned policy. The control system then behaves abnormally and is more likely to meet the termination conditions early, ending the episode prematurely. When such physically meaningless initial states occur in particular episodes, the variability of the average reward also tends to increase.

๋ฐ˜๋ฉด ํ•™์Šต๋œ ์ œ์–ด๊ธฐ๋ฅผ ์‹ค๋ฌผ ์‹œ์Šคํ…œ์— ์ ์šฉํ•  ๊ฒฝ์šฐ์—๋Š” ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š”๋‹ค. ์‹ค์ œ ํ™˜๊ฒฝ์—์„œ๋Š” ๋ฌผ๋ฆฌ ๋ฒ•์น™์— ์œ„๋ฐฐ๋˜๋Š” ์ƒํƒœ๊ฐ€ ์ž์—ฐ์ ์œผ๋กœ ๋ฐœ์ƒํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— ์—์ด์ „ํŠธ๊ฐ€ ๋น„ํ˜„์‹ค์ ์ธ ์ƒํƒœ ์ •๋ณด๋ฅผ ๊ด€์ธกํ•  ๊ฐ€๋Šฅ์„ฑ์ด ์‚ฌ๋ผ์ง„๋‹ค. ๋”ฐ๋ผ์„œ ๊ฐ•ํ™”ํ•™์Šต ๊ณผ์ •์—์„œ ํ•™์Šต๋œ ํ–‰๋™ ์ •์ฑ…์ด ์ •์ƒ์ ์ธ ์ƒํƒœ ์ •๋ณด์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ ์•ˆ์ •์ ์œผ๋กœ ๋™์ž‘ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ ๋ณด๋‹ค ์‹ ๋ขฐ์„ฑ ๋†’์€ ์ œ์–ด ์„ฑ๋Šฅ์„ ๊ธฐ๋Œ€ํ•  ์ˆ˜ ์žˆ๋‹ค.

4.4 Experimental Results

Fig. 9 is a frame captured from a YouTube video showing the results of the transition control experiment on the triple inverted pendulum. The video is available at https://youtu.be/vVx3ffGo2mk (title: World's first reinforcement learning-based transition control of a triple inverted pendulum; channel: Embedded Control Lab.).

Fig. 9. Video frame of the transition control experiment

../../Resources/kiee/KIEE.2025.74.8.1363/fig9.png

In the experiments, control succeeded at every equilibrium point, and Fig. 10 presents some of the transition control results. The graph shows transitions that start from the stable equilibrium point EP0 and move through each of the other equilibrium points. The sequence starts at EP0 and proceeds through EP4, EP1, EP6, EP2, EP5, and EP7 before ending at EP3. As the graph shows, the main controlled variables $\theta_{1}$, $\theta_{2}$, and $\theta_{3}$ converge stably to their target values at every equilibrium point. This indicates that training succeeded for each equilibrium point and demonstrates experimentally that all 56 transition controls of the triple inverted pendulum can be performed successfully.
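The figure of 56 is consistent with counting every ordered pair of distinct equilibrium points among the eight, that is 8 × 7. The short sketch below enumerates these pairs and extracts the transitions exercised by the demonstration sequence. The variable names are hypothetical, and the ordered-pair interpretation is an assumption drawn from the arithmetic.

```python
from itertools import permutations

EQUILIBRIA = range(8)  # EP0 .. EP7

# Every ordered (start, goal) pair with start != goal.
transitions = list(permutations(EQUILIBRIA, 2))

# Sequence reported for Fig. 10 and the consecutive transitions it covers.
demo_sequence = [0, 4, 1, 6, 2, 5, 7, 3]
demo_transitions = list(zip(demo_sequence, demo_sequence[1:]))
```

The demonstration sequence visits all eight equilibrium points once, so it covers seven of the possible transitions.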

Fig. 10. Transition control results

../../Resources/kiee/KIEE.2025.74.8.1363/fig10.png

However, slight oscillations are observed in some angles at certain equilibrium points: $\theta_{1}$ oscillates at EP3 and EP7, $\theta_{2}$ at EP1, EP3, EP5, and EP7, and $\theta_{3}$ at EP1 and EP5. This behavior can be explained by quantization error arising from the sensor resolution, which directly degrades measurement accuracy. Although the physical system used in this study was designed to match the model equations, the resolution of the encoders that measure the pendulum angles is limited. The first and second pendulum links carry 8192 CPR (counts per revolution) encoders and the third link carries a 4096 CPR encoder, so quantization error can arise when the angular velocities are computed. This error has little effect during the high-speed phases of a transition. Once the system reaches an equilibrium point and slows down, however, the error has a larger effect on the observed state. As a result, the control input fluctuates slightly and repeatedly, which can produce ripple.
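A rough estimate shows why the quantization matters most at low speed. The sketch below assumes that the angular velocity is obtained by differencing consecutive encoder readings and that the differencing interval equals the 10 ms observation period. The text states neither, and the function names are hypothetical.

```python
import math


def angle_resolution(cpr: int) -> float:
    """Smallest angle step [rad] resolvable by an encoder with cpr counts per revolution."""
    return 2.0 * math.pi / cpr


def velocity_resolution(cpr: int, dt: float) -> float:
    """Smallest nonzero velocity [rad/s] from a one-count change over dt seconds."""
    return angle_resolution(cpr) / dt


DT = 0.010  # assumed differencing interval [s]
res_links_1_2 = velocity_resolution(8192, DT)  # first and second links
res_link_3 = velocity_resolution(4096, DT)     # third link
```

Under these assumptions a single count corresponds to a velocity step of roughly 0.077 rad/s for the 8192 CPR encoders and 0.153 rad/s for the 4096 CPR encoder. Near an equilibrium point the true angular velocities are small, so steps of this size are no longer negligible.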

An analysis of what the oscillating equilibrium points have in common shows that $\theta_{1}$ tends to oscillate when both the second and third links are inverted, $\theta_{2}$ when the third link is inverted, and $\theta_{3}$ when the second link points downward and the third link is inverted. In all three cases the third link is inverted, so the control sensitivity of the third link, which has the most complex model characteristics, can be interpreted as the main cause of the oscillation. In other words, the third link responds sensitively even to small variations in the control input caused by quantization error, and repeated ripple is observed as a result. To reduce this ripple after stabilization, it is expected to be effective either to use higher-resolution encoders for more precise angle sensing or to correct the velocity information in software with a model-based filtering technique.

5. Conclusion

This paper implemented all 56 transition controls of a linear triple inverted pendulum using Sim-to-Real reinforcement learning. To this end, the mechanical structure and the control environment were designed for close physical agreement with the model, which minimized the reality gap. The reward function was set so that the reward is maximal at the target equilibrium point, and a separate policy was trained for each equilibrium point, so that the proposed controller operates robustly under varied initial conditions and disturbances. Experiments in simulation and on the physical system confirmed stable convergence and close agreement between the two for every transition control of the triple inverted pendulum. This study demonstrates that Sim-to-Real reinforcement learning can effectively solve complex transition control problems that were difficult to implement with conventional methods, and it suggests that the approach can be extended to practical controllers for a variety of nonlinear systems.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (RS-2024-00347193).

References

[1] Y. Otani, T. Kurokami, A. Inoue and Y. Hirashima, “A swingup control of an inverted pendulum with cart position control,” IFAC Proceedings Volumes, vol. 34, no. 22, pp. 13-22, 2001. DOI: 10.1016/S1474-6670(17)32971-3
[2] H. Li, Z. Nie, E. Zhu, W. He and Y. Zheng, “Double Loop DR-PID Control of A Rotary Inverted Pendulum,” 2021 IEEE International Conference on Networking, Sensing and Control (ICNSC), pp. 1-5, 2021. DOI: 10.1109/ICNSC52481.2021.9702192
[3] A. Dev, K. R. Chowdhury and M. P. Schoen, “Q-Learning Based Control for Swing-Up and Balancing of Inverted Pendulum,” 2024 Intermountain Engineering, Technology and Computing (IETC), pp. 209-214, 2024. DOI: 10.1109/IETC61393.2024.10564347
[4] T. Glück, A. Eder and A. Kugi, “Swing-up control of a triple pendulum on a cart with experimental validation,” Automatica, vol. 49, no. 3, pp. 801-808, 2013. DOI: 10.1016/j.automatica.2012.12.006
[5] C. Choi, D. Ju, J. Jeong and Y. S. Lee, “Structural Proposition for a Triple Inverted Pendulum and Implementation of Swing-up Control Using an LW-RCP02,” Journal of Institute of Control, Robotics and Systems (in Korean), vol. 28, no. 10, pp. 916-925, 2022. DOI: 10.5302/J.ICROS.2022.22.0176
[6] J. Jeong, D. Ju, Y. Fujiyama and Y. S. Lee, “Transition Control of a Double Inverted Pendulum Using an LW-RCP,” Journal of Institute of Control, Robotics and Systems (in Korean), vol. 29, no. 9, pp. 694-703, 2023. DOI: 10.5302/J.ICROS.2023.23.0100
[7] D. Ju, T. Lee and Y. S. Lee, “Transition Control of a Rotary Double Inverted Pendulum Using Direct Collocation,” Mathematics, vol. 13, no. 4, Art. no. 640, 2025. DOI: 10.3390/math13040640
[8] L. R. E. Shead, K. R. Muske and J. A. Rossiter, “Conditions for which MPC fails to converge to the correct target,” IFAC Proceedings Volumes, vol. 41, no. 2, pp. 6968-6973, 2008. DOI: 10.3182/20080706-5-KR-1001.01181
[9] W. Zhu, X. Guo, D. Owaki, K. Kutsuzawa and M. Hayashibe, “A Survey of Sim-to-Real Transfer Techniques Applied to Reinforcement Learning for Bioinspired Robots,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 7, pp. 3444-3459, 2023. DOI: 10.1109/TNNLS.2021.3112718
[10] E. Salvato, G. Fenu, E. Medvet and F. A. Pellegrino, “Crossing the Reality Gap: A Survey on Sim-to-Real Transferability of Robot Controllers in Reinforcement Learning,” IEEE Access, vol. 9, pp. 153171-153187, 2021. DOI: 10.1109/ACCESS.2021.3126658
[11] B. Qin, Y. Gao and Y. Bai, “Sim-to-real: Six-legged Robot Control with Deep Reinforcement Learning and Curriculum Learning,” 2019 4th International Conference on Robotics and Automation Engineering (ICRAE), pp. 1-5, 2019. DOI: 10.1109/ICRAE48301.2019.9043822
[12] G. Fang, Y. Tian, Z. Yang, J. M. P. Geraedts and C. C. L. Wang, “Efficient Jacobian-Based Inverse Kinematics With Sim-to-Real Transfer of Soft Robots by Learning,” IEEE/ASME Transactions on Mechatronics, vol. 27, no. 6, pp. 5296-5306, 2022. DOI: 10.1109/TMECH.2022.3178303
[13] M. Ranaweera and Q. H. Mahmoud, “Bridging the Reality Gap Between Virtual and Physical Environments Through Reinforcement Learning,” IEEE Access, vol. 11, pp. 19914-19927, 2023. DOI: 10.1109/ACCESS.2023.3249572
[14] A. Pitkevich and I. Makarov, “A Survey on Sim-to-Real Transfer Methods for Robotic Manipulation,” 2024 IEEE 22nd Jubilee International Symposium on Intelligent Systems and Informatics (SISY), pp. 259-266, 2024. DOI: 10.1109/SISY62279.2024.10737545
[15] A. Kuznetsov, P. Shvechikov, A. Grishin and D. Vetrov, “Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics,” arXiv preprint arXiv:2005.04269, 2020. DOI: 10.48550/arXiv.2005.04269

About the Authors

Changseok Lim
../../Resources/kiee/KIEE.2025.74.8.1363/au1.png

He received the B.S. degree in electrical engineering from Inha University in 2024. He is now an M.S. candidate in electrical and computer engineering at Inha University. His research interests include optimal control, reinforcement learning, and embedded systems.

Doyoon Ju
../../Resources/kiee/KIEE.2025.74.8.1363/au2.png

He received the M.S. degree in electrical and computer engineering from Inha University in 2023. He is now a Ph.D. candidate in electrical and computer engineering at Inha University. His research interests include optimal control, embedded systems, and reinforcement learning.

Young Sam Lee
../../Resources/kiee/KIEE.2025.74.8.1363/au3.png

He received the B.S. and M.S. degrees in electrical engineering from Inha University, Incheon, South Korea, in 1999, and the Ph.D. degree in electrical engineering from Seoul National University, South Korea, in 2003. From 2003 to 2004, he was a Senior Researcher with Samsung Electronics Co. Since 2004, he has been with the Department of Electrical and Computer Engineering, Inha University. He is the author of four books and more than 60 articles. His research interests include computer-aided control system design, rapid control prototyping, control and instrumentation, robot engineering, and embedded systems.