
Markov Decision Process 1

[ Reinforcement Learning ] 3. Finite Markov Decision Processes

This chapter matters because it introduces the problem that the rest of the book keeps returning to. We regard the methods that solve this problem as reinforcement learning. The chapter gives an overview of what the reinforcement learning problem is and discusses its applications. It also presents the mathematically idealized form of the problem and introduces the key elements of its mathematical structure, such as the Bellman equation and value functions.

3.1. The Agent-Environment Interface

As mentioned repeatedly before, in reinforcement learning the agent selects actions. The environment responds to those actions by presenting new situations to the agent and by generating rewards. Over time, the agent … that rew..
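The interaction loop described above can be sketched in a few lines of Python. This is a minimal sketch, not the book's code. `ToyEnvironment` (a hypothetical five-state corridor), `RandomAgent`, and `run_episode` are illustrative names, and the `reset`/`step` method shape is an assumption chosen for brevity. At each step the agent picks an action from the current state, and the environment answers with a reward and the next state.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor: states 0..n_states-1, episode ends at the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        # Start every episode at the leftmost state.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # Assumed reward scheme: 1.0 for reaching the goal, 0.0 otherwise.
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class RandomAgent:
    """Hypothetical agent that chooses actions uniformly at random."""

    def __init__(self, actions=(-1, 1), seed=0):
        self.actions = actions
        self.rng = random.Random(seed)

    def select_action(self, state):
        # A random policy ignores the state; a learning agent would use it.
        return self.rng.choice(self.actions)


def run_episode(env, agent, max_steps=100):
    """Run one episode and return (state, action, reward) tuples, one per time step."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = agent.select_action(state)          # agent acts on the current situation
        next_state, reward, done = env.step(action)  # environment returns reward and new situation
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


episode = run_episode(ToyEnvironment(), RandomAgent(seed=0))
print(len(episode), sum(r for _, _, r in episode))
```

A learning agent would replace `RandomAgent` and use the recorded `(state, action, reward)` tuples to improve its choices. The random policy here only exercises the interface.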
