728x90
๋ฐ˜์‘ํ˜•

์ „์ฒด ๊ธ€ 81

[ ๊ฐ•ํ™”ํ•™์Šต ] 0. Introduction

๊ฐ•ํ™”ํ•™์Šต ( Reinforcement Learning ) ํ™˜๊ฒฝ(Environment)์„ ํƒ์ƒ‰ํ•˜๋Š” ํ•™์Šต์ฃผ์ฒด(Agent)๋Š” ํ˜„์žฌ ์ƒํƒœ(State)๋ฅผ ์ธ์‹ํ•˜์—ฌ ์–ด๋–ค ํ–‰๋™(Action)์„ ์ทจํ•˜๊ณ , ํ™˜๊ฒฝ์œผ๋กœ๋ถ€ํ„ฐ ๋ณด์ƒ(Reward)๋ฅผ ์–ป๋Š”๋‹ค. ๊ฐ•ํ™”ํ•™์Šต์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ Agent๊ฐ€ ์•ž์œผ๋กœ ๋ˆ„์ ๋  Reward๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ์ผ๋ จ์˜ Actions๋กœ ์ •์˜๋˜๋Š” Policy๋ฅผ ์ฐพ๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ํ˜„์žฌ ์„ ํƒํ•œ Action์ด ๋ฏธ๋ž˜์˜ ์ˆœ์ฐจ์  Reward์— ์˜ํ–ฅ์„ ๋ฏธ์นœ๋‹ค๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค. (Delayed Reward) ์œ„ ์„œ์ ๊ณผ ๋ฐ•์œ ์„ฑ ๊ต์ˆ˜๋‹˜์˜ ์„œ์ ์„ ์ฐธ๊ณ ํ•˜์—ฌ ๊ฐ•ํ™”ํ•™์Šต์— ๋Œ€ํ•œ ์ด๋ก ์ ์ธ ์ดํ•ด๋ฅผ, Python OpenAI Gym ๋ผ์ด๋ธŒ๋Ÿฌ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๊ทธ ๊ตฌํ˜„์„ ๋ชฉํ‘œ๋กœ ๊ณต๋ถ€ํ•˜๊ณ  ํ•ด๋‹น ๋‚ด์šฉ์„ ์ •..

728x90
๋ฐ˜์‘ํ˜•