
label smoothing 2

[ POSTECH AI Research Institute Research Intern ] Attention Is All You Need: Paper Review and Explanation

This paper came out in winter 2017 and was published at NIPS; anyone who studies machine translation has probably studied it. I tried to study it back when I was an undergraduate intern but let it slide, and I have finally sat down, worked through it properly, and written it up. Without compressing the input sentence into a single vector, and without using RNN or CNN structures at all, the model stacks Encoders and Decoders built only on Attention. This is the paper on the Transformer, which reduced the amount of computation this way and improved performance as well. The model architecture is shown in the figure above. The first thing that stands out is the encoder structure on the left and the decoder structure on the right, each repeated N times. As the figure also shows, the core of the Transformer structure can be summarized as follows. Positional Encoding Encoder Self-Att..

[Boostcamp AI Tech 3rd Cohort Pre-Course] A Deep Dive into AI: Optimization, Gradient Descent, Overfitting, Generalization, Cross-Validation, SGD

All posts are written based on macOS Monterey 12.0.1. This post is based on the Pre-Course for Boostcamp AI Tech 3rd Cohort. Introduction 1. Generalization As iterations repeat, the Training Error will naturally go down. After some time, however, performance on the Test Error starts to get worse. In other words, the gap between the Training Error and the Test Error grows. This phenomenon is called Overfitting, as in the image on the right. Failing to fit even the training data is Underfitting. 2. Cross-Validation In general, the data is often split into a training part and a test part before training...
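To make the generalization point above concrete, here is a minimal sketch of how the gap between Training Error and Test Error could be tracked per iteration. The helper names `generalization_gap` and `best_iteration` are hypothetical, and the two error curves are made-up illustrative numbers, not measurements from the course or from any model.

```python
def generalization_gap(train_errors, test_errors):
    """Return the per-iteration gap: test error minus training error."""
    return [test - train for train, test in zip(train_errors, test_errors)]


def best_iteration(test_errors):
    """Return the index of the iteration with the lowest test error."""
    return min(range(len(test_errors)), key=lambda i: test_errors[i])


# Illustrative, made-up error curves (assumption, not real results):
# training error keeps falling while test error turns back up.
train_errors = [0.50, 0.30, 0.20, 0.10, 0.05]
test_errors = [0.55, 0.40, 0.35, 0.40, 0.50]

gaps = generalization_gap(train_errors, test_errors)
stop_at = best_iteration(test_errors)

print("gap per iteration:", [round(g, 2) for g in gaps])
print("lowest test error at iteration index:", stop_at)
```

With curves shaped like these, the gap widens at every iteration, and the iterations after the test-error minimum are where the excerpt's description of Overfitting applies.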
