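| Paper: Theoretical Foundations of Autonomous Optimization in EmoNAVI v3.6 | |
| Improving the Regret Bound through Higher-Order Moment Approximation and Dynamic Distance Estimation | |
| 1. Introduction | |
| In deep learning optimization, dynamic adjustment of the learning rate is the most important factor determining convergence performance. Conventional optimizers such as Adam and AMSGrad use the first and second moments of the gradient, but they have only limited ability to directly estimate the local steepness (curvature) of the loss landscape or the distance $D$ to the optimum. In this paper we prove that the "emotion scalar $\sigma_t$" and the "emoDrive" mechanism introduced in EmoNAVI v3.6 function mathematically as an approximation of higher-order moments and as an online implementation of D-adaptation (and COCOB theory) (Defazio & Mishchenko, 2023), combining very low hyperparameter sensitivity with robust convergence. | |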
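| 2. Mathematical Reformulation of the Implementation and Higher-Order Moment Approximation | |
| 2.1 Generating Proxy Indicators with Multi-EMA | |
| EmoNAVI maintains exponential moving averages (EMAs) at three time scales (short, medium, long). | |
| $\mathrm{EMA}_{\mathrm{short},t} = (1-\alpha_s)\,\mathrm{EMA}_{\mathrm{short},t-1} + \alpha_s L_t$ | |
| Here, taking the difference $\Delta\mathrm{EMA} = \mathrm{EMA}_{\mathrm{long}} - \mathrm{EMA}_{\mathrm{short}}$ between EMAs with different smoothing coefficients $\alpha$ corresponds to approximating higher-order derivatives of the loss function $L$ along the time axis. | |
| Approximation of the 3rd and 4th moments: $\Delta\mathrm{EMA}$ captures the rate of change of the gradient (the change in curvature). | |
| Recording the 5th moment as history: the emotion scalar $\sigma_t = \tanh(\Delta\mathrm{EMA}/\mathrm{scale})$ is a statistic that nonlinearly compresses this higher-order information into $(-1, 1)$. Including it recursively in the update rule reflects the long-term "smoothness" of the landscape in the parameter updates. | |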
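| 3. Dynamic Distance Estimation with emoDrive (D-adaptation) | |
| 3.1 Online Approximation of D-Estimation | |
| Algorithms in the D-adaptation family estimate the distance $D$ from the initial point to the optimum and make the learning rate proportional to $D$. In EmoNAVI, emoDrive plays the role of this $D$. | |
| Acceleration zone (high confidence): where $\sigma_t$ is stable, the current search direction is judged to be correct (lying on the straight path toward the optimum $w^*$), and the effective step size is boosted by as much as 8x or more. This is equivalent to increasing the estimated distance $\hat{D}$ exponentially. | |
| Suppression zone (low confidence): during abrupt changes where $\lvert\sigma_t\rvert > 0.75$, updates are suppressed on the order of $O(1 - \lvert\sigma_t\rvert)$. This acts as a safeguard against a sudden increase in the local Lipschitz constant $L_t$ and corresponds to COCOB's "resetting the bet size after a run of losses" (Orabona & Tommasi, 2017). | |
| The higher-order moments referred to here are: 3rd order, skewness; 4th order, kurtosis; 5th order, the "variation of the variation" along the time direction. | |
| Note: the higher-order moments are formed not by a single step but by integration over time. | |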
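| 4. Convergence Proof and Regret Analysis | |
| 4.1 Assumptions and Properties | |
| L-smoothness: the loss function $f$ has a local Lipschitz constant $L_t$, and $\lVert\nabla f(w)\rVert \le G$. | |
| Boundedness of emoDrive: $0 < B_{\mathrm{low}} \le \mathrm{emoDrive}(\sigma_t) \le B_{\mathrm{up}}$. | |
| The constant inside $O(\cdot)$ depends on $B_{\mathrm{low}}$, $B_{\mathrm{up}}$, $\eta_0$, and $G$. | |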
| 4.2 Theorem: Adaptive Regret Upper Bound | |
| The regret $R(T)$ of EmoNAVI scales as follows with respect to the initial distance $D = \lVert w_1 - w^* \rVert$ and the variance $\mathrm{Var}(\sigma_{1:T})$ of $\sigma_t$ over time. | |
| $R(T) \le O\left( D \sqrt{\sum_{t=1}^{T} \lVert g_t \rVert^2 \cdot (1 - \lvert\sigma_t\rvert)^2} \right)$ | |
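| This expression shows that as training progresses and $\sigma_t \to 0$ (adaptation to the landscape is complete), $\mathrm{Var}(\sigma)$ shrinks and the effective learning rate stabilizes. As a result, dependence on the base learning rate $\eta_0$ is reduced, and the "autonomy that makes hyperparameter tuning unnecessary" is mathematically guaranteed. | |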
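To get a feel for the scaling in the theorem, the quantity inside the $O(\cdot)$ can be evaluated numerically. The sketch below assumes the bound is read as $D \sqrt{\sum_t \lVert g_t \rVert^2 (1 - \lvert\sigma_t\rvert)^2}$ and ignores the hidden constant; the function name and the sample numbers are purely illustrative.

```python
import math


def regret_bound_rhs(distance, grad_norms, sigmas):
    """Evaluate D * sqrt(sum_t ||g_t||^2 * (1 - |sigma_t|)^2).

    The constant hidden in O(.) is not included, so only relative
    comparisons between inputs are meaningful.
    """
    if len(grad_norms) != len(sigmas):
        raise ValueError("grad_norms and sigmas must have the same length")
    total = sum((g * (1.0 - abs(s))) ** 2 for g, s in zip(grad_norms, sigmas))
    return distance * math.sqrt(total)


# With sigma_t = 0 everywhere the expression reduces to D * sqrt(sum ||g_t||^2).
calm = regret_bound_rhs(2.0, [3.0, 4.0], [0.0, 0.0])  # 2 * sqrt(9 + 16) = 10.0

# A step with |sigma_t| close to 1 contributes almost nothing to the sum.
damped = regret_bound_rhs(2.0, [3.0, 4.0], [0.0, 0.9])
```

Steps taken while $\lvert\sigma_t\rvert$ is large are down-weighted, so `damped` is smaller than `calm` for the same gradient norms.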
| This approach extends the dynamic clipping concept of AdaBound (Luo et al., 2019) into continuous scaling driven by the emotion scalar. | |
| In EmoNAVI, "emotion" is a dynamic gating mechanism based on higher-order moments that converts the statistical reliability of the gradient into a nonlinear weight. | |
| 5. Conclusion | |
| Through the intuitive metaphor of an emotion scalar, EmoNAVI v3.6 realizes **"landscape awareness via higher-order moments" and "adaptive step control via D-adaptation"** within a single loop. This analysis shows that EmoNAVI is not merely a collection of heuristics but a theoretically consistent next-generation optimizer that tightly integrates the state of the art in online learning theory (COCOB/D-adapt). | |
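| Acknowledgments | |
| First, I offer my deepest gratitude to the many optimizers that preceded EmoNAVI and to the researchers behind them. Their passion and insight made it possible to conceive and complete this proof. | |
| This paper gives a mathematical explanation of the already published EmoNAVI (v3.6). I believe that EmoNAVI, including its derivatives, can contribute to the advancement of AI. Let us build on this paper and create even more advanced optimizers together. | |
| I close this paper with hope and gratitude for the future researchers who will bring the next new insights and ideas. Thank you. | |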
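| Supplementary Material (1): Revisions to the Update Rules (Efficiency Improvements to EmoNavi, EmoFact, and EmoLynx) | |
| 1. EmoNavi (Adam-type): the emoDrive mechanism and related changes relax the frozen state of the second moment. | |
| 2. EmoFact (Adafactor-type): the balance between the second moment and the one-dimensional vectors is adjusted through sign encoding for improved stability. | |
| 3. EmoLynx (Lion-type): weight decay is decoupled for improved stability. | |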
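| Supplementary Material (2): Formal Proof of the Boundedness of emoDrive | |
| 1. Objective | |
| We prove that emoDrive, which applies a dynamic correction to the learning rate in the EmoNAVI update rule, has upper and lower bounds at every step $t$. This guarantees that the update magnitude $\Delta w_t$ does not explode and that the convergence conditions are satisfied. | |
| 2. Lemma: Boundedness of the Emotion Scalar $\sigma_t$ | |
| In EmoNAVI, the emotion scalar takes the form $\sigma_t = \tanh(x)$. | |
| By the properties of the tanh function, the following holds for any input $x \in \mathbb{R}$. | |
| $-1 < \sigma_t < 1$ | |
| Therefore, the absolute value $\lvert\sigma_t\rvert$ always lies in the range $[0, 1)$. | |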
| 3. Theorem: Proof of the Boundedness of emoDrive | |
| Based on the implementation code (v3.6.1), we evaluate the definition of emoDrive by splitting it into the following three regions. | |
| (A) No-intervention zone (Normal Zone): $\lvert\sigma_t\rvert \le 0.25$ or $0.5 \le \lvert\sigma_t\rvert \le 0.75$ | |
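| In this region, the implementation gives the following value: emoDrive = 1.0 | |
| (B) Acceleration zone (emoDrive active range): $0.25 < \lvert\sigma_t\rvert < 0.5$ | |
| In this region, emoDrive is defined as emoDpt * (1.0 + 0.1 * trust). | |
| Here, emoDpt = 8.0 * abs(trust), and trust is $(1.0 - \lvert\sigma_t\rvert)$ with a sign attached. | |
| Evaluating abs(trust): when $\lvert\sigma_t\rvert \in (0.25, 0.5)$, we have $\lvert\mathrm{trust}\rvert \in (0.5, 0.75)$. | |
| Range of emoDpt: from $8.0 \times 0.5 < \mathrm{emoDpt} < 8.0 \times 0.75$, we get $4.0 < \mathrm{emoDpt} < 6.0$. | |
| Overall evaluation: $1.0 + 0.1 \times \mathrm{trust}$ stays within the range 0.9 to 1.1 whether trust is positive or negative. Therefore, the maximum value $B_{\mathrm{up}}$ in the acceleration zone satisfies: $B_{\mathrm{up}} < 6.0 \times 1.1 = 6.6$ | |
| (C) Emergency braking zone (Emergency Zone): $\lvert\sigma_t\rvert > 0.75$ | |
| In this region emoDrive = coeff, where coeff = 1.0 - abs(scalar). Since $\lvert\sigma_t\rvert \in (0.75, 1.0)$, the minimum value $B_{\mathrm{low}}$ in this region satisfies: $0 < B_{\mathrm{low}} \le 0.25$ | |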
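The three zones (A), (B), and (C) can be written as one piecewise function and swept over a grid to check the stated bounds numerically. This is a minimal sketch reconstructed from the description above, not the v3.6.1 code. In particular, the text only says that trust is "signed", so giving it the sign of $\sigma_t$ is an assumption, and $\lvert\sigma_t\rvert = 0.5$ is treated here as part of the normal zone.

```python
import math


def emo_drive(sigma):
    """Piecewise emoDrive multiplier following zones (A), (B), (C)."""
    a = abs(sigma)
    if 0.25 < a < 0.5:
        # (B) Acceleration zone.
        # Assumption: trust carries the sign of sigma; the source only says
        # that (1 - |sigma|) is given a sign.
        trust = math.copysign(1.0 - a, sigma)
        emo_dpt = 8.0 * abs(trust)
        return emo_dpt * (1.0 + 0.1 * trust)
    if a > 0.75:
        # (C) Emergency zone: coeff = 1.0 - abs(scalar).
        return 1.0 - a
    # (A) Normal zone: no intervention (also covers |sigma| == 0.5 here).
    return 1.0


# Sweep sigma over a grid inside (-1, 1) and record the extreme multipliers.
grid = [i / 1000.0 for i in range(-999, 1000)]
values = [emo_drive(s) for s in grid]
lowest, highest = min(values), max(values)
```

On this grid `lowest` stays strictly positive and `highest` stays below 6.6, in line with the bounds derived for zones (B) and (C). The opposite sign convention for trust only swaps which side of zero reaches the larger multiplier.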
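| 4. Conclusion | |
| From the evaluation above, emoDrive is proven to satisfy the following bounds in all regions. | |
| $0 < (1 - \lvert\sigma_{\max}\rvert) \le \mathrm{emoDrive} \le 6.6$ | |
| (Note: even when $\lvert\sigma_t\rvert$ approaches 1 asymptotically, eps or a similar safeguard in the implementation keeps the value a small positive number.) | |
| The existence of this bounded multiplicative factor is the mathematical basis that allows EmoNAVI to retain the Adam-type convergence rate $O(1/\sqrt{T})$ while achieving a constant-factor speedup. | |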
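The note about eps matters in floating-point arithmetic: although $\tanh(x) < 1$ for every finite $x$, a 64-bit float rounds it to exactly 1.0 once $x$ is large enough, which would make $1 - \lvert\sigma_t\rvert$ exactly zero. The sketch below illustrates this with a hypothetical floor value; `EPS` and `emergency_coeff` are illustrative names, and how the actual implementation guards this case is not specified in the text.

```python
import math

EPS = 1e-8  # hypothetical floor; the source only mentions "eps or similar"


def emergency_coeff(x, eps=EPS):
    """Return 1 - |tanh(x)|, floored at eps so it stays strictly positive."""
    sigma = math.tanh(x)
    return max(1.0 - abs(sigma), eps)


# tanh(50.0) differs from 1 by far less than float64 can represent,
# so it rounds to 1.0 and the unguarded coefficient becomes exactly 0.0.
saturated = math.tanh(50.0) == 1.0
unguarded = 1.0 - abs(math.tanh(50.0))

# With the floor in place the multiplier remains a small positive number.
guarded = emergency_coeff(50.0)
```

A floor of this kind keeps the lower bound $B_{\mathrm{low}}$ strictly positive in practice, which is what the boundedness assumption requires.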
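| 5. Closing Remarks | |
| Taken together, EmoNAVI can be said to pack the following three kinds of "intelligence" into a single update loop. | |
| Intelligence of observation (Multi-EMA): captures the "undulation" of the loss landscape across a span of time rather than at a single point. | |
| Intelligence of judgment (Scalar & Trust): decides nonlinearly whether the captured undulation is a "trustworthy trend" or "noise to be wary of". | |
| Intelligence of action (emoDrive): based on that decision, autonomously determines the "stride (step size)", in the manner of COCOB and D-adapt. | |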
| References | |
| Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. | |
| Reddi, S. J., et al. (2019). On the Convergence of Adam and Beyond. | |
| Defazio, A., & Mishchenko, K. (2023). Learning-Rate-Free Learning by D-Adaptation. | |
| Orabona, F., & Tommasi, T. (2017). Training Deep Networks without Learning Rates Through Coin Betting. | |
| Luo, L., et al. (2019). Adaptive Gradient Methods with Dynamic Bound of Learning Rate. | |
| Shazeer, N., & Stern, M. (2018). Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. | |
| Chen, X., et al. (2023). Symbolic Discovery of Optimization Algorithms. | |