Paper: Improving Time-Series SNR Estimation and the Regret Bound in the Autonomous Optimization Algorithm emoPulse, and Exploring Second-Moment-Free Updates via "Geometric Orthogonality of Weights and Gradients": And Beyond Flow-Matching

― Establishing "Emotion-Driven" Learning Rate Control through Dynamic Inspection of Loss Landscapes, and Proposing Next-Generation Optimization through Interaction with Loss Landscapes ―


Abstract

Adjusting the learning rate and ensuring generalization performance are central challenges in deep learning optimization. Existing methods rely on precise gradient estimation and are vulnerable to noise in extremely low-precision environments.

This paper proposes the autonomous algorithm emoPulse (v3.7 and later), which centers on a multi-faceted, time-series analysis of the loss function.

The method captures the "undulations" of the loss landscape from a three-stage exponential moving average (Multi-EMA) and, using an emotion scalar and a confidence indicator (Trust), autonomously generates an optimal learning rate based on the signal-to-noise ratio.

Next, we propose the W-Ref Geometry update rule, which focuses on the geometric relationship between weights and gradients.

This achieves a "second-moment-free" update that retains no second moment and responds immediately to changes in the terrain by dynamically controlling inertia based on the orthogonality between weights and gradients.

This simultaneously reduces VRAM usage, providing a democratic foundation for multilingual learning in research environments with limited computational resources and for multicultural coexistence.

We then discuss the analysis of emoPulse and how it relates to current challenges. This could contribute to the application of Flow Matching (the FM method) to large language models (LLMs).

We propose a solution to some of the challenges that arise when applying the deterministic learning process of the FM method to LLMs, and present a new optimization approach that bridges the two.

We anticipate that the FM method will become one of the optimization techniques that naturally bridges the gap to architectures such as RNN/SSM variants, LNN (LiquidAI/MIT), Mamba (CMU × Princeton), and Titans (Google).

Furthermore, by synthesizing the learning results of optimizers belonging to this family (Sens / Airy / Cats / Tion / Void), each with distinct update characteristics, we present a method that integrates local solutions in a "multiple positioning" manner to artificially create flat minima.

This achieves robust convergence independent of hyperparameter settings, providing a democratic foundation for research environments in developing countries with limited computational resources and for multilingual learning aimed at preserving diverse cultural heritage.

Finally, I append my thoughts and predictions regarding Grokking.

※ Version 3.7 excludes EmoTion and EmoVoid (both are newly developed in version 3.8). The only difference between versions 3.7 and 3.8 lies in the dNR_hist of the emoPulse mechanism described later; all other aspects are identical.

※ Starting with version 3.8.6, this method is referred to as the "resonant contraction method" (resonant projection field); it is not a stochastic gradient descent method. This will be discussed in detail at the end of this paper, in the section on 8th-order moments.

1. Introduction

This paper presents a unified theory for the optimizers EmoSens / EmoAiry / EmoCats / EmoTion / EmoVoid (v3.7 and later).

The method centers on the emoPulse mechanism, which autonomously generates learning rates by layering exponential moving averages (EMAs) of the loss value and extracting "Trust" from the time-series statistics of the loss function.

This represents an advanced fusion of theory and time-series signal processing (SNR estimation), achieving robust convergence independent of hyperparameter settings.

The starting point of this research lies in rethinking the "excessive reliance on precise gradient estimation" inherent in existing adaptive gradient methods.

In extremely low-precision, ultra-quantized environments (e.g., 1-bit/2-bit), gradients are extremely noisy, which significantly reduces their reliability.

The loss value, on the other hand, continues to function as an accurate scalar indicating the model's "distance from the correct answer," even under the influence of quantization.

This method treats the gradient as a reference value for direction (intent) and delegates the initiative of learning to the multifaceted analysis of the loss, which is an accurate observation.

This approach replaces higher-order moment calculations with scalar control and, through sign-encoded updates, optimizes for low-precision and quantized environments.

Its most significant feature lies in integrating the local solutions of multiple emo-family optimizers with distinct characteristics as "multiple positioning." This enables reaching the flat minimum, which previously required lengthy iterative training, through short-term training and synthesis.

This approach achieved the following three outcomes:

Dramatic improvement in computational efficiency: complex higher-order moment calculations are replaced with scalar control via temporal accumulation of the loss, reducing computational load through a temporal-accumulation approximation.

Optimization for low precision and quantization: matrix decomposition in EmoAiry, complete elimination of second moments in EmoCats, and the original (proprietary) "geometric orthogonal update" with complete second-moment elimination in EmoTion and EmoVoid enable large-scale training in low-resource environments through update encoding.

Autonomous convergence: by introspecting the S/N ratio of the loss landscape, the method eliminates the need for manual schedulers and minimizes the user's trial cost.

※ Higher-order moment approximation: aggregation into higher-order statistics over the time series.

Mathematically, this represents an advanced fusion of D-Adaptation theory and time-series signal processing, forming the foundation for "democratic AI learning" that preserves research environments in developing countries and diverse cultures.

※ EmoTion and EmoVoid achieve a lightweight structure that requires no second moments, not only by replacing higher-order moment calculations with scalar control but also by using the geometric information inherent in the weights themselves as a guideline for updates (detailed in Chapter 6).

2. Theoretical Framework: Emotional Circulation

This system forms a feedback loop with the loss function L at its origin.

2.1 Approximation of Higher-Order Moments Using Multi-EMA

By utilizing the differences between three tiers of EMAs (short, medium, long), we capture the "changes in curvature," "uncertainty in fluctuations," and "variability in changes" within the loss landscape.

EMA_t = (1 - α) * EMA_{t-1} + α * L_t

The "higher-order temporal difference" generated from these differences is defined as the "emotion scalar." This emotion scalar sigma_t is a nonlinear statistic that compresses information about higher-order moments (skewness, kurtosis, and variance) into the range [-1, 1].

Multiple EMAs with different time constants accumulate vast numbers of historical steps as layered "history."

By taking the relative time-delay differences among them, we observe the "dynamic higher-order rate of change of the terrain as learning progresses," a phenomenon impossible to detect through static terrain analysis.

By recursively incorporating this into the update formula, the long-term "smoothness" of the terrain is reflected in the parameter updates.
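The three-tier EMA structure can be sketched as follows. The paper specifies only the generic EMA recurrence and the [-1, 1] range of sigma_t here; the alpha values and the tanh compression of the layered differences into sigma_t are illustrative assumptions of this sketch, not the authors' exact formula.

```python
import math

# Minimal sketch of the Multi-EMA over the loss. The alphas (short, medium,
# long time constants) and the tanh-based emotion scalar are assumptions.
class MultiEMA:
    def __init__(self, alphas=(0.3, 0.1, 0.02)):  # short, medium, long
        self.alphas = alphas
        self.emas = [None, None, None]

    def update(self, loss):
        for i, a in enumerate(self.alphas):
            prev = loss if self.emas[i] is None else self.emas[i]
            self.emas[i] = (1 - a) * prev + a * loss
        short, medium, long_ = self.emas
        # Hypothetical emotion scalar: a second difference of the tiers,
        # squashed into [-1, 1] with tanh.
        return math.tanh((short - medium) - (medium - long_))
```

On a falling loss, the short EMA drops fastest, so the second difference goes negative and sigma_t reflects the "bend" in the loss trajectory rather than its absolute level.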

※ Note on the time-series formation of higher-order moments:

The higher-order moment approximation in this method is not calculated from single-step gradient information but is formed through temporal accumulation.

This means it observes not the static curvature of the terrain but the "dynamic rate of change of the terrain as learning progresses."

※ Hierarchical structure of the higher-order moment approximation:

This method effectively approximates higher-order moments from the third (skewness) up to the seventh (confidence amplification) order by accumulating the loss over time.

This is not a static terrain analysis, but rather an attempt to extract the "system's confidence" as a physical quantity within the dynamic process of learning.

The Multi-EMA structure in this method functions as a dynamic temporal approximation of higher-order moments in statistics.

3rd- to 5th-order approximation: the differences between the short, medium, and long EMAs extract the temporal evolution of higher-order information such as skewness, kurtosis, and fluctuations in the loss distribution.

6th-order approximation: the integrated emotion scalar sigma_t and the confidence metric trust_t become 6th-order meta-statistics that indicate "learning-phase stability" beyond mere gradient variance.

7th-order approximation (dNR): in deriving dNR, squaring the ratio of these 6th-order components, (d_base / noise_base)^2, exponentially amplifies subtle differences in confidence, yielding an extremely sensitive control signal equivalent to a 7th-order moment.

2.2 Definition of the Trust-Level Metric trust_t

The core metric trust_t, which determines the "quality" of updates, is defined as follows.

trust_t = sgn(sigma_t) * (1.0 - abs(sigma_t))

This trust is bounded, never reaching ±1.0 (complete certainty) or 0 (complete despair), ensuring the system always maintains a moderate balance of "room for exploration" and "caution."

This forms the following feedback loop (emotional circulation system) with the loss function L as its origin.

Loss → Multi-EMA → Scalar/Trust → emoPulse → Loss
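The trust definition above transcribes directly; sgn is the standard signum:

```python
# Direct transcription of trust_t = sgn(sigma) * (1 - |sigma|).
def sgn(x):
    return (x > 0) - (x < 0)

def trust(sigma):
    """Signed confidence: turbulent loss (large |sigma|) -> low |trust|."""
    return sgn(sigma) * (1.0 - abs(sigma))
```

A small |sigma| (smooth loss transition) yields |trust| near 1, feeding the acceleration side of the loop; a large |sigma| pushes trust toward 0 and throttles the pulse.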
3. emoPulse: Learning Rate Generation via Autonomous Pulsation

In v3.7 and later, the former emoDrive (acceleration mechanism) has been integrated into emoPulse. This is an evolution based on approximating dynamic distance estimation (D-Adaptation) with the time-series signal-to-noise ratio (S/N ratio).

3.1 Dynamic Estimation of Noise and Distance

The system's "wandering" and "progress" are tracked by the following two internal variables, N_t and d_t. Here, N_t represents "oscillation" (instability), and d_t represents "progress" (distance).

Noise estimate (N_t):    N_t = (1 - α) * N_{t-1} + α * abs(sigma_t)
Distance estimate (d_t): d_t = (1 - α) * d_{t-1} + α * abs(trust_t)

3.2 Definition of emoPulse and Autonomous Control / Instantaneous SNR and History Management (dNR_hist)

The generation of emoPulse is determined by the "tug-of-war" (dynamic equilibrium) between the instantaneous SNR and the temporal SNR. First, the respective bases for the instantaneous and temporal SNR are calculated.

noise_base = abs(sigma_t - trust_t) + ε_s
d_base = abs(N_t - d_t) + ε_t

Using these, the current SNR intensity is defined as follows.

dNR_now_val = ( d_base / noise_base )^2

Update rules for dNR_hist:

Acceleration condition:
if dNR_now_val >= dNR_hist and trust_t >= threshold_high:
    dNR_hist = min( dNR_now_val, dNR_hist * factor_grow )

Deceleration condition:
if threshold_low <= trust_t <= threshold_high:
    dNR_hist = dNR_now_val * factor_decay

The final learning rate emoPulse is determined as follows.

emoPulse_t = clamp( dNR_hist * (emoScope * η_base), η_min, η_max )

This design guarantees the following autonomous behaviors:

Confidence region (|trust| > 0.5): the SNR improves and the learning rate accelerates maximally, rapidly aiming for flat minima.
Hesitation region (|trust| < 0.5): as uncertainty increases, suppressing the learning rate prevents divergence in sharp valleys.

※ emoPulse is a scaling factor determined by the user-defined initial learning rate (emoScope) and the system's default sensitivity (η_base).
4. emoPulse: Regret Bound and Boundedness Analysis

4.1 Convergence and Regret Analysis

The cumulative regret R(T) under emoPulse is bounded above as follows, incorporating the dynamically varying learning rate η_t.

R(T) <= O( Σ_{t=1}^T [ η_t * ||g_t||^2 * (1 - |σ_t|)^2 ] )

Here, the coefficient (1 - |σ_t|) quantifies the "trust" in the update, derived from the consistency of the short-, medium-, and long-term EMAs of the loss function.
A large |σ_t| indicates that the loss is fluctuating significantly, so the gradient information at that step is judged unreliable.
In contrast, a small |σ_t| indicates that the loss transition is smooth and the reliability of the update direction is high.
Therefore, the signal strength trust_t = 1 - |σ_t| adaptively weights the "effective update amount" in the regret bound, thereby suppressing the accumulation of regret due to uncertain gradients.

The emoPulse method presented here is a generalization that approximates the learning-rate structure of D-Adaptation by Defazio & Mishchenko (2023) using the loss's time-series statistics (d_t, N_t).

η_t ∝ D^2 / noise

Definition of emoPulse:

η_t = ( d_t / (N_t + ε) )^2 * η_base

This is a direct time-series reconstruction of SNR control based on the distance/noise ratio of D-Adaptation.

With this structure, when the noise component N_t increases, the denominator dominates and the learning rate η_t immediately decreases.
This self-adjustment automatically suppresses excessive updates in unstable regions of the loss terrain.
This theoretically guarantees a "learning-rate-free" property in which the algorithm autonomously achieves dynamic stability without external learning-rate scheduling.

4.2 Proof of Positive Definiteness and Boundedness

We show below that this algorithm prevents learning-rate explosion and vanishing at any step t, i.e., that the learning rate is bounded.

1. Non-zero boundedness of the denominator (momentary doubt: noise_base)

The noise_base used as the denominator during emoPulse generation is defined as the deviation between the current emotion scalar sigma_t and the confidence level trust_t:

noise_base = abs(sigma_t - trust_t) + ε_s

In the implementation, since |sigma_t| < 1.0 and trust_t is a signed function of sigma_t, this difference is bounded.
Furthermore, the trailing safety term (+0.1) physically prevents the learning rate from exploding (NaN) due to the denominator approaching zero.

2. Lower boundedness of the numerator (temporal certainty: d_base)

The numerator d_base in the generation of emoPulse is defined as the difference between the noise estimate N_t (noise_est) and the distance estimate d_t (d_est), both of which are historical quantities:

d_base = abs(N_t - d_t) + ε_t

N_t is guaranteed to be positive definite by max(noise_est, μ_r), and d_t is updated by the cumulative sum of abs(trust_t), regardless of improvement or deterioration.
By adding a safety term (+0.1) to these temporal statistical differences, it is mathematically guaranteed that "even when the history is unstable in an extremely low-precision environment, a minimum step size (a lower bound on the numerator) is always ensured."

3. Conclusions on boundedness and the constraints on emoPulse:

The effective learning rate emoPulse_t, generated from the ratio of the "instantaneous basis (denominator)" and the "temporal basis (numerator)," is strictly constrained within the following range by the safety-margin setting max(min(..., 3e-3), 1e-6) in the final implementation.

0 < η_min <= emoPulse_t <= η_upper_bound

Here, the lower bound (η_min) represents the minimum "metabolic rate" (heartbeat) that the system maintains even under the most uncertain conditions. This prevents learning from stopping (deadlock) and allows autonomous recovery.
The upper bound (η_upper_bound) functions as a limiter that prevents the model from diverging even when the dNR coefficient spikes.

Implementation considerations:

Stabilization through initial-value settings:
※ In environments with very small datasets or high initial noise, it is recommended to reset the initial values of d_t and N_t until the Multi-EMA stabilizes the "history" (e.g., d_est: 0.2, noise_est: 0.2).
This suppresses divergence caused by initial stochastic noise. Specifically, by initializing N_0 to be equal to d_0, the system essentially starts in a "cautious mode."
This functions as an organic warm-up phase during the critical initial steps, avoiding overly aggressive updates and prioritizing observation of the terrain.

Maintaining "update pressure" through initial-value settings while ensuring safety:
※ In this method, the d_base term forming the numerator of emoPulse determines the system's "potential update force." Setting the initial values to N_0 = 1.0 and d_0 = 0.02 intentionally ensures high acceleration potential from the start of training.
Due to the nature of exponential moving averages, the effect of these initial values persists as "history" for roughly 100 steps. During this period, the system maintains high acceleration pressure while granting convergence power only to "truly reliable signals" that have passed the strict screening of the emotion mechanism.
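A quick calculation illustrates the claimed ~100-step persistence of the initial values. The decay rate alpha = 0.02 is an assumed value (the paper does not state it); with it, the initial value still carries roughly 13% of the EMA's weight after 100 steps.

```python
# Fraction of an EMA still attributable to its initial value after `steps`
# updates with decay rate `alpha`. alpha = 0.02 is an illustrative assumption.
def ema_initial_weight(alpha, steps):
    return (1.0 - alpha) ** steps

w = ema_initial_weight(0.02, 100)  # roughly 0.13
```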
5. Polarized Normalization: Adaptation to Low-Precision Environments

This chapter describes the sign-based normalization used to apply the theoretical framework of emoPulse to low-precision environments.

To eliminate reliance on precise floating-point calculations and to support ultra-low-precision (ultra-quantized) environments, the following update rule is adopted (EmoAiry, EmoCats, EmoTion).

delta_w_t = -emoPulse_t * sign( m_t / ( sqrt(v_t) + ε ) )

This enables EmoAiry to resolve the imbalance in accuracy between one-dimensional vectors and two-dimensional moments, achieving a "unification of will" that extracts only the consensus on direction.
※ EmoCats supports encoding based on Lion with weight-decay (WD) separation.
※ EmoTion and EmoVoid encode a proprietary update method called the "geometric orthogonal update."
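A minimal sketch of the polarized update for a single scalar parameter. The Adam-style moment recurrences and the beta values are shown only for context and are assumptions of this sketch:

```python
import math

# Sign-based (polarized) update: only the sign of the normalized step
# survives; the step magnitude is carried entirely by the scalar emoPulse.
def sign_update(w, g, m, v, emo_pulse, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # first moment (direction)
    v = beta2 * v + (1 - beta2) * g * g      # second moment (scale)
    step = 1.0 if m / (math.sqrt(v) + eps) >= 0 else -1.0
    return w - emo_pulse * step, m, v
```

Because the update is ±emoPulse regardless of gradient scale, it survives aggressive quantization of the gradient: only the sign bit must be correct.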
6. EmoTion and EmoVoid: Explanation of the "New Optimization" Update Formula and a Bridge to the Future

Respect for existing methods and the position of EmoTion and EmoVoid:
The EmoTion update algorithm stems from deep respect for Adam and its relatives, a pinnacle of modern deep learning. The concept of the "adaptive learning rate" demonstrated by Adam and others established the conditions for effective optimization and significantly lowered the barriers to its adoption.

EmoTion / EmoVoid inherits this spirit while taking a different approach: using geometry (W-Ref Geometry) and emotion (emoPulse) instead of statistics.

A new form of precision:
While Adam and its relatives meticulously carve a path from past statistics, EmoTion / EmoVoid navigates the terrain more flexibly through dialogue with the current weights (geometric interaction with the current weights) and the pulse of the loss. This approach aims for natural convergence that suppresses overfitting while maintaining accuracy on par with Adam-family methods. (Orthogonality as freshness.)

Resource-friendly design (reduced VRAM):
Computational resources are finite, and not everyone has access to abundant high-performance hardware. By entrusting the precise mechanism of second moments, which Adam-family methods carefully maintain, to "scalar control," EmoTion reduces VRAM load by approximately half. EmoVoid achieves minimal VRAM load by eliminating both first and second moments and directly reflecting the orthogonality of W and G. We believe this forms the foundation of a "democratic learning environment" in which more people can conduct AI training.

Geometric inertia control using W-Ref Geometry:
The core of both algorithms lies in a geometric update rule based on the orthogonality between the weight vector W and the gradient vector G.
Whereas conventional statistical methods rely on the accumulated gradient history (the shadow), W-Ref Geometry uses the current weight W as the "substance" and derives the freshness of the gradient G from the following cosine similarity ρ (rho).

ρ = | <W, G> | / ( ||W|| * ||G|| + eps )

The smaller ρ is (the closer to orthogonal), the more the current gradient is judged to contain "unknown information" not present in the existing weight structure. Such a gradient is incorporated strongly, overcoming inertia. This geometric "information selection" simultaneously achieves high-precision directional changes without statistical delay and a regularization effect that suppresses redundant updates. (Dynamic inertia calibration.)

Why the method holds with only the first moment:
The absence of second moments (variance estimation) is not merely for weight reduction. W-Ref Geometry updates based on the "freshness of direction" rather than the "magnitude" of gradients, rendering much of the role traditionally fulfilled by second moments unnecessary. (Departure from second moments.)
Direction selection via W-Ref Geometry judges that a gradient G containing unknown information is one that is most orthogonal to the weight W, and so reduces inertia and steers toward the new direction. Conversely, a gradient parallel to W is deemed redundant, and inertia is prioritized. This selection based on "directional purity" is more direct than variance estimation, robust against noise, and suppresses overfitting.
※ EmoVoid has neither first nor second moments.


Below is a detailed explanation of the W-Ref Geometry method.

1. Definition of the Geometric Index ρ (Orthogonality Index)

While conventional optimizers adjust the learning rate based on the "magnitude of the gradient" (L2 norm) or "statistical variance" (the second moment), EmoTion defines the "relative orientation of the gradient vector G with respect to the current weight vector W" as the freshness of the information.

ρ_t = | <W_t, G_t> | / ( ||W_t|| * ||G_t|| + eps )

Orthogonal state (ρ → 0): the gradient is orthogonal to the current weight structure. This suggests a "completely new direction of knowledge that the current model does not yet possess."
Parallel state (ρ → 1): the gradient points in the same direction as the current weight (or exactly opposite). This suggests it may be merely redundant information, equivalent to scaling the current weight.
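The index transcribes directly for a single flattened tensor, here with plain Python lists:

```python
import math

# Orthogonality index rho: 0 = orthogonal (fresh), 1 = parallel (redundant).
def rho(w, g, eps=1e-8):
    dot = sum(wi * gi for wi, gi in zip(w, g))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    norm_g = math.sqrt(sum(gi * gi for gi in g))
    return abs(dot) / (norm_w * norm_g + eps)
```

The absolute value makes anti-parallel gradients (pure scaling of W in the opposite direction) count as redundant too, matching the "or exactly opposite" reading above.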
2. Adaptive Inertia Control (Geometric Momentum Blending)

This update formula dynamically adjusts inertia based on the "freshness" of the gradient. It replaces the conventional second-moment variance estimation with a structure that exploits the degree of redundancy in the geometric information.

m_t = beta1 * m_{t-1} + (1 - beta1) * Freshness_t * G_t
where Freshness_t = 1.0 - EMA(rho_t)

Theoretical interpretation: when the gradient is "orthogonal" (fresh), inertia (the shadow of the past) is temporarily weakened and the optimizer reacts immediately to the new information (it steers). Conversely, when the gradient is "parallel" (redundant), inertia is maintained and stability is prioritized. This can be interpreted as replacing "statistical uncertainty" (variance) with "geometric redundancy of information."

※ Simplification in EmoVoid: EmoVoid eliminates even this inertia control, directly multiplying Freshness into the update vector. This achieves geometric information selection while completely freeing the memory slot for m_t.
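The blending recurrence above can be sketched as follows; the EMA coefficient for rho and beta1 are illustrative assumptions:

```python
# Geometric momentum blending: inertia gated by the freshness of the gradient.
# beta1 and beta_rho are illustrative assumptions.
def blend_momentum(m, g, rho_ema, rho_t, beta1=0.9, beta_rho=0.9):
    rho_ema = beta_rho * rho_ema + (1 - beta_rho) * rho_t  # smoothed redundancy
    freshness = 1.0 - rho_ema                              # 1 = orthogonal/fresh
    m = beta1 * m + (1 - beta1) * freshness * g            # gated inertia update
    return m, rho_ema
```

With rho_ema near 1 (persistently parallel gradients) the new term vanishes and m simply decays, which is the "maintain inertia, suppress redundant updates" behavior described above.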
3. Update Encoding and an Alternative to L2 Regularization

The final key to EmoTion and EmoVoid remaining second-moment-free lies in separating sign extraction (Sign) and weight decay. By determining the update direction solely from sign(m_t), the magnitude of the weight update is no longer influenced by the "size" of the gradient. This enables stable updates that are resilient to fluctuations and noise in the gradient scale.

EmoTion update rule:
W_{t+1} = W_t * (1 - emoPulse_t * lambda) - emoPulse_t * sign(m_t)
(emoPulse is the learning rate derived from dNR, and lambda is the weight-decay coefficient.)

EmoVoid update rule:
W_{t+1} = W_t - emoPulse_t * sign(G_t) * (1 - ρ_t)
(EmoVoid enables stable convergence without an explicit lambda through this self-suppression mechanism.)

※ Proposal of "entity-reference optimization": while conventional optimization methods track "past gradients" (history), this approach establishes the Weight-Reference (W-Ref) paradigm, which uses correlation with the "current weights" (the entity) as the trigger for updates.
※ Geometric interpretation of the curse of dimensionality: by leveraging the concentration phenomenon of vectors in high-dimensional space (their tendency to be mutually orthogonal), even slight "deviations" from orthogonality are detected as redundant information. This enables higher-precision, low-latency inertia control without relying on statistical variance estimation. In high-dimensional spaces (e.g., layers with hundreds of millions of parameters), the probability of two vectors coincidentally being parallel is extremely low. Since nearly all vectors are orthogonal, any deviation of ρ from zero (approaching parallelism) statistically signifies "extremely strong correlation" (duplication). This means that, without consulting vast historical statistics (second moments), it becomes possible to judge instantly whether an update is valuable based solely on its relationship to the current weights.
※ Resonance with emoPulse: emoPulse controls the "temporal-axis pulse" (when and how much to move), while W-Ref Geometry determines the "spatial-axis direction" (where to steer). This integrated autonomous control of time and space is the core mechanism enabling both VRAM reduction and high-precision convergence, thereby enhancing learning robustness.
4. Implementation Lightweighting via Approximation of W-Ref Geometry

Theoretically, W-Ref Geometry rigorously measures the orthogonality between weights and gradients as follows.

rho_t = | <W_t, G_t> | / ( ||W_t|| * ||G_t|| + eps )

However, in large models, the sequential computation of the inner product across all layers, the norms across all layers, and the cosine similarity becomes a bottleneck in terms of VRAM and computational load. Therefore, the implementation introduces an approximation of W-Ref Geometry. This achieves near-zero VRAM usage while preserving the "essence" of W-Ref Geometry.
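For reference, the exact per-layer measurement reads as follows in plain Python (lists stand in for tensors). The three separate reductions, the inner product and the two norms, are exactly the intermediate results whose buffers the text identifies as the VRAM/compute bottleneck at scale.

```python
import math

def rho_exact(w, g, eps=1e-12):
    # rho_t = | <W_t, G_t> | / ( ||W_t|| * ||G_t|| + eps )
    dot = sum(wi * gi for wi, gi in zip(w, g))       # <W_t, G_t>
    norm_w = math.sqrt(sum(wi * wi for wi in w))     # ||W_t||
    norm_g = math.sqrt(sum(gi * gi for gi in g))     # ||G_t||
    return abs(dot) / (norm_w * norm_g + eps)
```

An orthogonal pair such as [1, 0] vs [0, 1] yields rho = 0 (maximally fresh), while a parallel pair such as [1, 0] vs [2, 0] yields rho close to 1 (fully redundant).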
4-1. EmoTion: Estimating "Directional Novelty" Based on L1 Norm Change

EmoTion estimates "how much the model is trying to move in a new direction" from the change in the L1 norm of the overall weights.

g_ratio_t = | L1_t - L1_{t-1} | / ( L1_{t-1} + eps )

Freshness_t = min( g_ratio_t / freshness_scale , freshness_cap )

This Freshness_t is used as the mixing ratio for the first moment (exp_avg), enabling a lightweight implementation of the precise W-Ref Geometry measurement, which "reacts strongly to orthogonal directions while retaining inertia in parallel directions."
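A minimal sketch of this approximation, with freshness_scale and freshness_cap as assumed hyperparameters (the released optimizer's actual values may differ):

```python
def freshness_from_l1(l1_now, l1_prev, freshness_scale=0.01,
                      freshness_cap=1.0, eps=1e-12):
    # g_ratio_t = | L1_t - L1_{t-1} | / ( L1_{t-1} + eps )
    g_ratio = abs(l1_now - l1_prev) / (l1_prev + eps)
    # Freshness_t = min( g_ratio_t / freshness_scale , freshness_cap )
    return min(g_ratio / freshness_scale, freshness_cap)
```

A jump in the total |W| reads as motion in a new direction (freshness saturates at the cap), while an unchanged norm reads as redundant motion (freshness 0), so the first moment keeps its inertia.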
4-2. EmoVoid: Approximation via "Direct Scaling" of Weight Energy

EmoVoid holds neither first- nor second-order moments, so it performs no inertial control such as Freshness.

g_ratio_t = L1_{t-1} / ( L1_t + eps )

W_t ← W_t * g_ratio_t

Instead, the "directional purity" of W-Ref Geometry is approximated by directly scaling the L1 norm of the entire weight tensor. This scaling is performed only during the "warm-up period and final stabilization phase"; outside these windows, no scaling is applied and updates are made solely from sign(G_t).

This establishes EmoVoid's unique "geometric self-suppression," which prevents the energy of the weights from running wild, suppresses bias in the gradient direction, and enables stable convergence even without momentum.
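The energy-scaling step can be sketched as follows; the window bounds (warmup_steps, stabilize_after) are invented, illustrative parameters standing in for the "warm-up and final stabilization" phases named above.

```python
def emovoid_scale(weights, l1_prev, step, warmup_steps=100,
                  stabilize_after=900, eps=1e-12):
    # Direct L1 "energy" scaling, active only inside the warm-up and final
    # stabilization windows; elsewhere the update is sign(G_t) alone.
    l1_now = sum(abs(w) for w in weights)
    if warmup_steps <= step < stabilize_after:
        return weights, l1_now                   # outside windows: no scaling
    g_ratio = l1_prev / (l1_now + eps)           # g_ratio_t = L1_{t-1} / (L1_t + eps)
    return [w * g_ratio for w in weights], l1_now   # W_t <- W_t * g_ratio_t
```

When the total weight energy grows (L1_t > L1_{t-1}), g_ratio falls below 1 and pulls the weights back, which is the "self-suppression" described above.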
4-3. Significance of the Approximations: the approximations are designed not as "complete versions of the theory" but as "implementation optimizations."

The two differ in how they handle the "time axis" (emoPulse) and the "space axis" (W-Ref Geometry), but ultimately both achieve "geometric optimization independent of statistics."

EmoTion employs inertial control through Freshness, while EmoVoid relies on self-suppression via energy correction; both share the core principle of W-Ref Geometry, "evaluating directional purity."

5. Requirements for Computing Frameworks (PyTorch, etc.)

The W-Ref Geometry and Approx W-Ref proposed in this paper hold the potential to overcome the current memory-efficiency limitations in deep learning frameworks. We strongly request that future tensor-operation libraries, such as PyTorch, implement the following feature.
Request: Native implementation of a geometric correlation function torch.geom_relation(W, G) for weights and gradients

Currently, calculating the orthogonality (rho) between weights W and gradients G requires an inner-product computation, a norm calculation for each tensor, and intermediate tensors to hold these values. This results in non-negligible computational overhead and VRAM pressure.

If W and G were referenced directly at the C++/CUDA level, without generating intermediate tensors, to compute

rho_t = | <W_t, G_t> | / ( ||W_t|| * ||G_t|| + eps )

(orthogonality per individual parameter layer)

then a native function returning this as a scalar would enable updates based on geometric confidence with minimal VRAM, without retaining the second moment (variance statistic).
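The requested primitive does not exist in PyTorch today; the sketch below (plain Python, standing in for a C++/CUDA kernel) only pins down the intended contract: a single fused pass over W and G that returns one scalar per layer while allocating no intermediate tensors.

```python
import math

def geom_relation(w, g, eps=1e-8):
    # One fused traversal accumulates <W,G>, ||W||^2 and ||G||^2 together,
    # which is what a native kernel could do without intermediate buffers.
    dot = nw = ng = 0.0
    for wi, gi in zip(w, g):
        dot += wi * gi
        nw += wi * wi
        ng += gi * gi
    return abs(dot) / (math.sqrt(nw) * math.sqrt(ng) + eps)
```

Compared with the three separate reductions shown in section 4, the fused loop touches each element once and keeps only three running scalars, which is the memory behavior the request asks frameworks to provide natively.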
I am convinced this will be the final piece that not only accelerates optimization but also determines the democratization of large-scale model training on edge devices and in resource-constrained environments.


7. Theoretical Connection to Flow-Matching Systems and Their Structural Limitations

The EmoSens generation (Sens / Airy / Cats / Tion / Void) carries the following two implications for Flow-Matching (FM) methods.
1: This method is, to our knowledge, the world's first optimizer to fully adapt to the update structure of Flow-Matching.
2: Simultaneously, it also points beyond the structural limitations of the Flow-Matching family.

1. The structural constraint of "noise intolerance" inherent in Flow-Matching
Flow-Matching demands high smoothness and consistency in gradient fields to faithfully reproduce continuous-time flow fields. However, this design inherently contains a structural constraint: it cannot tolerate noise.

- Minor disruptions in gradients lead directly to breakdowns in the flow field
- In quantized or low-precision environments, gradient reliability deteriorates rapidly
- Generalizability is compromised by the absence of noise-tolerant buffer structures

In fact, it is known that in FM-based learning, a decrease in SNR leads directly to divergence and failure. This is consistent with the experimental results on SDXL / VAE / vanilla initialization discussed later.
2. Reverse-Engineering "Acceptance and Utilization of Noise" via emoPulse

Because it works primarily from the time-series statistics of the loss, emoPulse treats noise not as "error to be eliminated" but as a signal indicating learning progress.

- The higher-order moment approximation of the Multi-EMA actively utilizes fluctuations, including noise
- trust_t is a definition of "confidence" that assumes the presence of noise
- emoPulse converts noise into a source for learning-rate control through dynamic SNR estimation

This structure lets emo-style models adopt a design philosophy opposite to Flow-Matching: "gaining generalizability while tolerating noise."
3. The paradox that "perfect adaptation" to Flow-Matching highlights its limitations

The emo-style optimizer, precisely by fully adapting to the update structure of Flow-Matching, highlights most clearly the fundamental weaknesses of the FM approach.

- The smooth gradient field required by FM is difficult to achieve in actual training
- Noise intolerance is fatal in low-precision and quantized environments
- Noise-driven update rules like emoPulse are better suited to real-world training

In particular, the experimental result that emoPulse overcomes the noise vulnerability inherent in FM systems and completes SDXL e-pred + ZtSNR training without stagnation strongly supports this paradox.
4. The Limits of the Flow-Matching Approach and the Transition to Next-Generation Optimization

Flow-Matching possesses an ideal theoretical framework for reproducing idealized continuous flows, yet it is vulnerable to the noise, quantization, nonlinearity, and dynamic changes in higher-order moments inherent in real training.

LLMs learn probability distributions through autoregression and thus presuppose an SDE-based worldview, whereas Flow-Matching requires deterministic ODEs; these premises are in fundamental conflict.

emoPulse not only bridges this gap but also introduces a novel optimization technique, the "emotional circulation system," that actively utilizes noise. By dynamically absorbing fluctuations in autoregressive entropy, emoPulse enables FM-like smooth learning even in large language models.
- Full-layer LoRA for SDXL
- Full-layer retraining of the VAE
- Ultra-fast learning from a single image
- Stable learning from vanilla-initialized models

These experimental results (supplementary materials) demonstrate that emoPulse exhibits stability in areas where Flow-Matching struggles. This structure is not a successor to Flow-Matching, but rather a next-generation optimization foundation that overcomes the very premises of Flow-Matching itself.
5. The SDE-DDE-ODE Contraction Hierarchy in emoPulse

The history term in the Multi-EMA model decays exponentially, causing the delay term to effectively vanish within a finite time. Consequently, the solution trajectory of the DDE naturally connects to a smooth approximation of the ODE.

- SDE-like fluctuations: instantaneous variations in sigma_t and trust_t
- DDE-like delays: history dependence in the Multi-EMA, dNR_hist, N_t, and d_t
- ODE-like smoothness: "smooth terrain approximation" via time integration of the loss function

In other words, emoPulse inherently possesses a three-tier contraction hierarchy: it reduces from SDE to DDE, and then to ODE.
- FM's concept of "continuous flow" is absorbed by emoPulse
- FM's "intolerance of noise" is overcome by emoPulse
- FM's "rigor of the SDE" becomes unnecessary

emoPulse integrates "SDE fluctuations → DDE delays → ODE smoothness" into a single update rule. This three-tier hierarchy naturally unifies the probabilistic autoregressive fluctuations inherent in LLMs with the smooth continuous flow of Flow-Matching.

As a result, Flow-Matching has fulfilled its role, and the essence of its smooth continuous flow persists as an "ODE approximation" within emoPulse and future methods.
8. Conclusion

EmoSens Generation v3.7 and later has completed the "emotional cycle" that begins with observing the loss function.

Observation (Multi-EMA): captures the undulations of the terrain.
Judgment (Trust): switches between conviction and hesitation at the ±0.5 threshold.
Action (emoPulse): determines the optimal stride length through autonomous pulsation.

This method is a democratic optimization framework that enables AI to autonomously learn diverse cultures and languages, even within the research environments and limited computational resources of developing countries.
Acknowledgements

First and foremost, I extend my deepest gratitude to EmoNavi, EmoSens, and the various optimizers that preceded them, as well as to the researchers involved. Their passion and insights made the conception and realization of this proof possible.

This paper provides a mathematical explanation of the already-released EmoSens Generation (v3.7 and later) and its variations. I believe the EmoSens Generation I created (including its derivatives) can contribute to the advancement of AI. Let us use this paper as a foundation to jointly create even more evolved optimizers.

I conclude this paper with anticipation of, and gratitude for, the future researchers who will bring us the next new insights and ideas. Thank you.


Conclusion

This algorithm is not intended to replace existing excellent optimization techniques, but rather to offer a new alternative for deepening the "dialogue with the model" during the learning process. We hope it will serve as an aid for users selecting partners suited to their own objectives and sensibilities, and co-cultivating knowledge.
Supplementary Material (1): Analysis of emoPulse Dynamics in v3.7 and Later

1. Purpose

We analyze the physical significance of the interaction (tug-of-war) between the "instantaneous D/N estimation" and the "temporal D/N estimation" newly introduced in v3.7 for dynamic control of the learning rate.

2. Nature: A dynamic equilibrium between momentary doubt and enduring trust

Instantaneous base (noise_base):
noise_base = abs( scalar_t - trust_t ) + ε_s
This measures the deviation between the "current emotion scalar (wave)" and the "current trust level." When the two do not match (the divergence is large), the system develops "strong doubt (momentary noise)" about the current state and increases the denominator.

Temporal base (d_base):
d_base = abs( noise_est_t - d_est_t ) + ε_d
This measures the difference between "noise as history (wave average)" and "confidence as history." It represents the "confidence in the update (temporal distance)" derived from past context.
3. Effect: Creation of a Dynamic Rhythm

Effect A: Immediate braking during sudden changes. When a sudden loss change causes the scalar and trust to diverge, noise_base (the denominator) becomes dominant. The learning rate can thus be reduced instantly as an immediate judgment, even while the temporal history is still stable, preventing divergence before it occurs.

Effect B: Acceleration during the stable phase. When learning progresses smoothly (scalar and trust are stable) and confidence as history (d_base) accumulates, the dNR coefficient maximizes output through a "squared" term:
dNR_now_val = ( d_base / noise_base )^2
This naturally increases the "step size" in stable regions, accelerating convergence.

Effect C: Stability maintenance via history (dNR_hist). Even if the instantaneous dNR_now_val is high, a growth limit of dNR_hist * μ_g suppresses excessive acceleration. Conversely, in unreliable regions, cautious exploration continues by accumulating deceleration pressure at dNR_hist * μ_d.

※ The asymmetry of Effect C operates through selection based on d_base <= dNR_hist and trust >= 0.5. This mathematically models the "thump" of love and the "thump" of caution, accelerating the LR within the scalar range of 0 to ±0.5; LR acceleration in the negative direction, however, is excluded from the LR history growth. (Values beyond ±0.5 are unquestionably treated as crisis levels exceeding caution, causing LR deceleration.) LR acceleration in the scalar's negative direction represents acceleration that trusts the "modified update direction," essentially functioning as "Accelerated Correction." This inherits the emoDrive mechanism from the EmoNavi generation (emo-type 1st generation), which leverages the time difference between the EMA and the loss (EMA delay). (This research belongs to the EmoSens generation, the emo-type 2nd generation.)
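The tug-of-war of sections 2 and 3 can be condensed into a small sketch. The v3.7 limits (growth max 1.05x, decay 0.98x) and the epsilon terms come from this supplementary material; the branch structure is a simplified reading of Effects A-C, not the released code.

```python
def dnr_step(scalar_t, trust_t, noise_est_t, d_est_t, dnr_hist,
             eps_s=1e-8, eps_d=1e-8, mu_g=1.05, mu_d=0.98):
    noise_base = abs(scalar_t - trust_t) + eps_s   # momentary doubt (Effect A)
    d_base = abs(noise_est_t - d_est_t) + eps_d    # historical confidence
    dnr_now = (d_base / noise_base) ** 2           # squared push (Effect B)
    if dnr_now >= dnr_hist:
        return min(dnr_now, dnr_hist * mu_g)       # growth capped at mu_g (Effect C)
    return max(dnr_now, dnr_hist * mu_d)           # decay paced by mu_d (Effect C)
```

A sudden scalar/trust divergence inflates noise_base and brakes immediately, while a calm stretch lets dnr_hist climb by at most the mu_g factor per step.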
|--Danger--|---Wary---|---Fine---|--Danger--| Emotion
Sigma_t [Minus] |---(-)---0.5---(+)---0---(+)---0.5---(-)---| [Plus]
|--Hist(-)-|-Hist(Non)|--Hist(+)-|--Hist(-)-| Reglet

μ_g and μ_d:
v3.7: [Acceleration: LR growth max 1.05x] / [Deceleration: LR decay 0.98x]
v3.8: [Acceleration: LR growth max 1.50x] / [Deceleration: LR decay 0.80x]
4. Conclusions on Numerical Stability

This design, which pits the "time axis (history)" against the "instant axis (present)," is not merely a matter of decay. The system autonomously and constantly recalculates the ratio of "doubt" (Noise) to "certainty" (Distance), enabling dynamic control akin to "a heartbeat responding to the complexity of the terrain", something impossible with manual schedulers.

※ EmoTion and EmoVoid are original models implemented in v3.8.
※ dNR_hist uses different coefficients in v3.7 and v3.8; v3.8 is more aggressive, designed to produce larger fluctuations than v3.7.


The "synthesis of flat minima through multiple positioning" described below is a hypothesis derived from intuition and experimentation.
I hope this intuition will be refined into a rigorous mathematical proof by the next generation of researchers.
Autonomous Flat-Minima Generation via Multiple Positioning of Heterogeneous Optimizers

(Proposal of a New Learning Method: Prediction of "Evolutionary Flat-Minimum Formation" via Local Synthesis Using Emo Systems)


1. Purpose: To resolve the high cost of reaching flat minima.

Under existing learning practice,
・a single optimizer and
・long hours of repetitive training
have become the established path toward improved generalizability and flat minima.
This demands substantial resources, including computational resources, and is not an environment everyone can afford.
This proposal aims to fundamentally alter this high-cost structure by employing emo-style optimizers.
2. Proposal: Don't "search" for flat minima; create them yourself.

The emo-style models (EmoSens, EmoAiry, EmoCats, EmoTion, EmoVoid) share a common learning structure despite differing update mechanisms. When trained under identical conditions, they yield learning results whose differences represent "local solutions from different directions."
Integrating these divergent learning outcomes constitutes a synthesis of local solutions, and we anticipate that this synthesis may broaden and flatten them. In other words, it may bring local solutions closer to flat minima, or turn them into flat minima outright.

Acquiring these local solutions as full-layer LoRA and integrating them with synthesis methods such as TALL-Mask-Merge,

∨∨∨  →  \___/   Composite image of local solutions
(multiple local solutions)  (post-synthesis flattening)

・the "commonly low areas" of local solutions from multiple directions are emphasized;
・the sharp edges of the multiple sharp minima cancel each other out;
・as a result, a shape close to a flat valley bottom (a flat minimum) is reconstructed.
Treating these local solutions as multiple positioning (multiple-axis positioning),

"instead of exploring flat minima,"
this is a new learning method that "creates flat minima" through synthesis.

3. Organization: This integration leads to accelerated learning.

Concretizing the proposal: rather than performing long-duration training with full-layer LoRA, FFT (Full Fine-Tuning), and the like, the goal is achieved by conducting somewhat shallower training across multiple variants and applying synthesis techniques such as TALL-Mask-Merge. This is expected to make high-precision learning results easier to obtain even in resource-constrained settings.
The specific implementation method for this proposal is as follows:

・instead of performing long-duration training with a single optimizer using full-layer LoRA or FFT,
・conduct shallow training separately with multiple emo variants,
・then integrate the results using TALL-Mask-Merge.

As a result,

・without relying on lengthy training sessions,
・even in resource-constrained environments,
・it is possible to obtain high-precision models approaching flat minima.
4. Conclusion: Integration of Heterogeneous Emotion-Driven Models (Emotional Ensemble)

The multiple optimizers proposed in this study (Sens, Airy, Cats, Tion, Void) each inspect the loss landscape from different mathematical foundations. The "Flat-Minima Synthesis via Multiple Positioning" proposed here integrates their learning results, generated under identical conditions, through mask merging (e.g., TALL-Mask-Merge). This approach enables the simultaneous acquisition of "structural stability" and "expressive refinement" that no single optimization algorithm can achieve. It is expected to become a new optimization paradigm, shifting the learning process from a temporal pursuit to a spatial, multi-faceted integration.
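As a toy illustration only, in the spirit of mask merging rather than the actual TALL-Mask-Merge algorithm (tau and min_votes are invented parameters): per-weight deltas from several emo-trained models are kept where enough models agree on a sizeable change, and the survivors are averaged back onto the original weights.

```python
def mask_merge(base, deltas, tau=0.05, min_votes=2):
    # Keep a per-weight change only where >= min_votes models moved that
    # weight by at least tau; isolated sharp spikes are masked away,
    # mirroring how sharp minima are expected to cancel out.
    merged = []
    for i, b in enumerate(base):
        votes = [d[i] for d in deltas if abs(d[i]) >= tau]
        if len(votes) >= min_votes:
            merged.append(b + sum(votes) / len(votes))   # consensus survives
        else:
            merged.append(b)                             # spike is dropped
    return merged
```

With base [0.0, 1.0] and three delta sets, a change backed by two models survives as their average, while a large change seen by only one model is masked out and the base weight is kept.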
5. Supplementary: Trial Method for Full-Layer LoRA Integration

Each model's learning result was first merged into the original model, and the resulting set of models was then merged back into the original model using TM-merge.

Original Model (org) ← TM Integration ← Model S (Sens), Model A (Airy), Model C (Cats), Model T (Tion), Model V (Void)

Rather than integrating the LoRAs directly with one another, we merged each into the base model and then reduced these multiple models back to the base model via TM-merge.
For FFT, we predict that simply merging the multiple post-FFT models back into the original model via TM-merge will yield equivalent results.
6. Background: Diversity of Terrain Exploration via Heterogeneous Optimizers

The multiple positioning proposed by this method actively leverages the differences in exploration characteristics that arise from differences in algorithm lineage.

Statistical inheritance group:
EmoSens (Adam-type): dense gradient estimation via 1st- and 2nd-order moments
EmoAiry (Adafactor-type): low-memory, wide-area curvature approximation via matrix factorization
EmoCats (Lion-type): robust search with high noise tolerance via sign extraction
These inherit the orthodox essence of existing optimization theory while achieving liberation from manual schedulers by incorporating time-series SNR control via emoPulse.

Geometric evolution group:
EmoVoid / EmoTion (W-Ref type): execute updates based on the "freshness" of purely geometric information, the orthogonality between weights and gradients, thereby bypassing traditional statistical accumulation.
The True Nature of Saturation-Free Loss Decline

(Reflections on a Steady Decline with Minimal Stagnation)


With this method, it is commonly observed that the loss value rarely stagnates or saturates, and generally keeps decreasing. In particular, the loss continues to fall to about half the value of the first step, even raising the question of when convergence will occur. Yet the learning results remain unaffected by failures such as overfitting, maintaining thoroughly normal generalization performance. An intuitive reading of this suggests the possibility that "the model is learning by treating the repair of the original model as a differential."

This is merely a hypothesis, and like the creation of flat minima discussed earlier, we hope it will be refined into a rigorous mathematical proof by the next generation of researchers.

Furthermore, the following guarantees that "as long as the loss value has amplitude, the beat (emoPulse) will not stop."
noise_base = abs( sigma_t - trust_t ) + ε_s
d_base = abs( N_t - d_t ) + ε_t

These ε_s and ε_t are precisely what generate the continuous, stagnation-free downward behavior, creating the driving force to explore flat minima. This can also be read as: convergence occurs when the difference in loss values disappears. With this design, learning tests on Simplenet (FashionMNIST) are reproducible, confirming that loss values below 0.30 can be reached within 10,000 steps.
In experimental verification with SDXL, training with e-pred + ZtSNR, which was achievable with the previous-generation EmoNavi and its variants, can also be performed with EmoSens and its variants. This resolves the noise-tolerance and sampler-compatibility issues of Flow-Matching (FM), while simultaneously addressing challenges such as color-gamut limitations, which are considered weaknesses of e-pred. Training for 300 epochs on only about 10 training images completed without stagnation, and we successfully created a full-layer LoRA model showing no overfitting tendencies.

Further extreme testing, 300 steps on a single image, also completed without stagnation, and the learning results remained intact.
Even under extreme learning settings, no breakdown occurs; we believe this is because updates are performed without accumulating noise.
Fundamentally, noise is thought to arise from errors in the weighting of minute data. We consider it crucial to prevent noise generation by updating minute data appropriately, protecting and preserving valuable information.

Furthermore, we performed full-layer training (both encoder and decoder) on the SDXL VAE. Previous VAE retraining efforts compromised consistency with the model, ultimately degrading generation quality. We confirmed, however, that the optimizer proposed in this study maintains this consistency without degradation. We believe this enhances the reusability of the VAE and helps extend a model's operational lifespan.

As an investigation of training on an extreme noise model, we initialized an SDXL vanilla model (weight initialization with random values) and conducted full-layer LoRA training using it as the base model.

Under normal circumstances, training would diverge or produce NaN values within a few steps and fail. However, each of the EmoSens generation optimizers progressed through training and completed 1500 steps.

This LoRA should have failed, yet it defied expectations and applied successfully, without breakdown, to the SDXL vanilla model as it was before initialization.

Surprisingly, since this LoRA was trained against a state prior to the vanilla model, it improved the continuity of horizons and ground lines, areas where the vanilla model struggles, and corrected positional shifts when subjects cross (it is also applicable to derivative SDXL models, with similar effects).

This test confirms that the EmoSens generation possesses excellent robustness in terms of stability and safety.

※ This LoRA exhibited similar effects across multiple seeds, potentially demonstrating "regularizing behavior" that mitigates specific artifacts in SDXL. However, it remains inconclusive whether this effect stems from intentional learning or coincidental alignment. Please take this solely as confirmation that learning progresses stably under extreme conditions.
※ The steady decline in loss can be observed when the learning-rate decay based on the early-stopping criterion (convergence prediction) introduced in v3.8.6 and later is not applied (that is, when this decay is disabled and control is left to emoPulse).
Predictions about Grokking

This study focused on the behavior of continuous loss reduction with minimal stagnation and conducted various tests to identify its underlying factors.
Specifically, as an extreme learning condition, we evaluated "how far safe and stable learning can progress using only a single image."
As a result, we observed none of the typical failures such as overfitting, collapse into a copying state, or interference with unrelated prompts, confirming extremely stable learning results.

Based on these results, we predict that Grokking is a "stagnation phenomenon" arising from the combined effect of the following two factors.

- The accumulation of noise learned during training increases the inaccuracies requiring correction in the latter stages, causing the model's visibility to deteriorate rapidly (a whiteout/blackout phenomenon)
- In the latter stages of training, the phase most in need of correction, the scheduler and gradient statistics suppress the learning rate (LR), causing it to drop drastically

When these two factors occur simultaneously, the model loses its fundamental direction and falls into a prolonged stagnation period. In other words, Grokking is considered an avoidable phenomenon.

The reason the emo-style (EmoSens generation) can avoid Grokking is clear.

This method enables the following updates, thereby maintaining a clear field of view and preserving the driving force for continued learning.
- Maintaining update accuracy and preventing noise accumulation
- Autonomously securing the necessary learning rate even in the latter stages of training

Even if visibility deteriorates, the emotional mechanism as a whole functions like a high-precision GPS, and emoPulse's accurate heartbeat keeps the model moving forward. This allows it to approach flat minima or global optima naturally, without experiencing Grokking.
Grokking is often examined as an "unexplained delayed generalization," but as seen in the SDXL training results above, the essence of the Grokking phenomenon can be regarded as stagnation caused by structural flaws in the algorithm itself.
dNR detects signs of incorrect weighting and unorganized micro-data, identifies inconsistencies with abstract structures, and corrects them. We believe that if micro-data is handled correctly, generalized solutions form more quickly.

Future Challenges: Introducing Adaptive Accuracy Assessment via an 8th-Order Moment Approximation

Looking ahead, we are considering a "higher-order accuracy assessment mechanism" utilizing dNR cubed (equivalent to the 8th-order moment).

This approach does not output the 8th-order information directly as emoPulse output (the emoPulse mechanism remains unchanged). Instead, it attempts to use this information as a meta-indicator for evaluating the "purity" of the current learning process.

We anticipate this will enable earlier detection of overfitting signs on minimal datasets, pushing autonomous control accuracy to its limits. Alternatively, accuracy detection might be possible by analyzing the differences between past and present dNR histories.

However, this is an optional feature to be implemented as needed. Based on current validation results, we judge there is no urgency to proceed.

※ The early-shutdown detection notification (convergence indication) implemented prior to v3.8 presumably corresponds to an approximation of the 8th or 9th moment.

※ The mechanism presumed to approximate the 8th-order moment is described below.

Supplementary Material (2): A Study on Spatio-Temporal Integration and Self-Organization of Higher-Order Moments in Optimization Algorithms

1. Temporal axis: 2nd-order structure of temporal curvature at the 8th order (dNR_hist)

In the analysis of temporal recursive structures, this is defined by applying a squaring operation to dNR_hist, combined with an asymmetric growth limit of 1.50 and a decay limit of 0.80.

This squaring operation generates a signal-to-noise ratio (SNR) equivalent to the 7th order, and performs comparisons (min/max) and coefficient multiplication based on that history.

This recursive process corresponds to computing the "curvature of curvature" (the second derivative) in differential geometry.

The method goes beyond dynamically adjusting the learning rate: it extracts the SNR from the "fluctuations" of the loss function and tracks the "rate of change of confidence" with 8th-order resolution.

This folds the "temporal curvature" of the 7th-order moment into a nonlinear 2nd-order structure, imparting an intuitive rhythm to the optimization process.
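As a rough illustration, the asymmetric history update described above (squaring plus the 1.50 growth and 0.80 decay limits) can be sketched as follows. The function and threshold names are assumptions for illustration; the actual implementation may differ.

```python
# Sketch of the asymmetric dNR history update (illustrative; names are assumed).
FACTOR_GROW = 1.50   # asymmetric growth limit
FACTOR_DECAY = 0.80  # decay limit

def update_dnr_hist(dnr_hist, d_base, noise_base, trust,
                    threshold_high=0.5, threshold_low=-0.5):
    """Squared SNR with history: grow cautiously, decay quickly."""
    dnr_now = (d_base / noise_base) ** 2  # squaring amplifies confidence gaps
    if dnr_now >= dnr_hist and trust >= threshold_high:
        # Growth is capped: the history never jumps by more than 1.50x per step.
        dnr_hist = min(dnr_now, dnr_hist * FACTOR_GROW)
    elif threshold_low <= trust <= threshold_high:
        # In the uncertain band, the history decays toward the current value.
        dnr_hist = dnr_now * FACTOR_DECAY
    return dnr_hist

print(update_dnr_hist(1.0, 0.4, 0.2, 0.9))  # growth capped: prints 1.5
```

The asymmetry (slow growth, fast decay) is what gives the history its "cautiously optimistic" character.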

2. Spatial axis: 2nd-order structure of spatial curvature in the 8th-order (W-Ref Geometry) space

We define this using "W-Ref Geometry," which assumes a transition along a geodesic on a manifold in Riemannian geometry and performs a uniform scaling of the total L1 norm.

Rather than manipulating individual parameters independently, this mechanism treats the "volume of the manifold" formed by hundreds of millions of weights as a single, massive "field" and applies a unified correction.

Instead of directly computing the individual 8th-order correlations, we ensure higher-order consistency by exploiting an energy-conservation law for the entire system.

This is an 8th-order volumetric control method that governs the energy state of the whole space.
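A minimal sketch of the uniform L1-norm "volume" scaling described above, under the assumption that the previous step's total L1 norm serves as the reference; all names here are illustrative, and in practice this would run over the full parameter tensors.

```python
def l1_volume_rescale(weights, l1_prev, eps=1e-12):
    """Treat all weights as one 'field': measure the total L1 norm (the
    'volume') and apply a single uniform scaling factor to every entry."""
    l1_now = sum(abs(x) for row in weights for x in row)
    g_ratio = l1_prev / (l1_now + eps)   # one scalar for the whole system
    scaled = [[x * g_ratio for x in row] for row in weights]
    return scaled, l1_now                # l1_now becomes l1_prev next step

ws, l1 = l1_volume_rescale([[1.0, -1.0], [2.0]], l1_prev=2.0)
print(l1)  # total L1 before scaling: prints 4.0
```

Because a single scalar is applied everywhere, the correction costs almost no extra memory, which matches the "field" framing above.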

3. Emotional axis: meta-statistics at the 8th order (nonlinear compression of sigma/trust)

We define the 2nd-order effect of scalar/trust → dNR^2, arising from the superposition of the scalar and exponential-moving-average (EMA) terms, as a "meta-statistic" playing an 8th-order role.

A tanh function is applied to the differences between the three-layer EMAs (Short/Medium/Long) to ensure boundedness. Here, the discrepancy between the "ideal" (the long-term indicator) and "reality" (the short-term indicator) is quantified as "stress" (a scalar).

This functions as an "early warning" mechanism at the 8th order, enabling the model to detect the system's limits autonomously before it reaches the critical point of divergence.
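A sketch of the bounded "stress" statistic described above: tanh applied to the differences of three EMAs keeps the ideal-vs-reality discrepancy inside (-1, 1). The decay rates and weighting are illustrative assumptions, not the paper's exact constants.

```python
import math

def ema(prev, x, alpha):
    return (1 - alpha) * prev + alpha * x

def stress_scalar(loss_history):
    """Quantify the ideal-vs-reality discrepancy as a bounded stress scalar."""
    s = m = l = loss_history[0]
    for x in loss_history[1:]:
        s = ema(s, x, 0.30)   # short-term EMA: 'reality'
        m = ema(m, x, 0.10)   # medium-term EMA
        l = ema(l, x, 0.01)   # long-term EMA: 'ideal'
    # tanh guarantees boundedness even when the EMAs diverge sharply
    return math.tanh((s - l) + 0.5 * (m - l))

print(stress_scalar([1.0] * 50))  # flat loss, no stress: prints 0.0
```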

4. Spacetime unification: the 2nd-order structure of spacetime phases at the 8th order (SDE → DDE → ODE reduction)

The emoPulse mechanism used in this optimization incorporates the reduced structures of stochastic differential equations (SDEs), delay differential equations (DDEs), and ordinary differential equations (ODEs).

Phase synchronization across these three levels faithfully reproduces the temporal evolution of the higher-order moments.

Since this structure satisfies the conditions for a contraction mapping, convergence is mathematically guaranteed without depending on external scheduling.

5. Reincarnation axis: convergence determination and self-recursion via the 8th-9th orders (composite higher-order moments)

Convergence is determined from the "2nd-order phase structure" that arises when the four axes (time, space, emotion, and physics) are synchronized.

We perform phase-synchronization analysis of the SDE (noise component) and the ODE (deterministic component), and execute self-rewriting via emoScope.

The moment "stochastic fluctuations" and "deterministic convergence" align, the system autonomously updates its hyperparameters and re-enters a finer dimension.

This self-recursive evolutionary process can be described as a form of biological self-organization not found in conventional optimizers.

When the scalar is defined as a 6th-order meta-statistic (d_base - noise_base) and the SNR difference as a 7th-order quantity, the decision rule is expressed as:

Stop = 1{ |sigma| < ε_1 ∧ |d_base - noise_base| < ε_2 }

This detects the region that simultaneously satisfies the stability of the 6th-order moment and the consistency of the 7th-order moment, thereby observing the "intersection region" of higher-order moments.
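The decision rule above can be sketched as a simple indicator function. The thresholds ε_1 and ε_2 are assumptions for illustration; the paper does not state their values.

```python
def should_stop(sigma, d_base, noise_base, eps1=0.05, eps2=0.05):
    """Stop = 1{ |sigma| < eps1 AND |d_base - noise_base| < eps2 }:
    require 6th-order stability and 7th-order consistency at once."""
    return abs(sigma) < eps1 and abs(d_base - noise_base) < eps2

print(should_stop(0.01, 0.21, 0.20))  # both conditions met: prints True
```

Requiring both conditions simultaneously is what makes this an "intersection region" detector rather than a single-statistic threshold.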

The "emotional cycle" described in Section 8 of this paper becomes, here, a "chain" equivalent to an 8th-order approximation; when these elements reach "resonance," time (SDE → DDE → ODE), space (2nd-order correction of volume), and direction (purification of signs) oscillate in phase, generating a "Resonant Projection Field."

At this point, the system undergoes a resonant contraction and transitions to the following new mapping:

w_{t+1} = Contract(w_t, Φ(t))

Perspectives on Mathematical Analysis

A mathematical analysis of this work suggests that, while it employs an SDE approach, it exhibits ODE-like characteristics. The update rule via emoPulse incorporates both stochastic fluctuations and temporal smoothness, and may possess a unique structure positioned at the boundary between SDE and ODE. (Since the loss value is the result of learning, the method is expected to behave in an ODE-like manner, as it derives from the final outcome.)

How the history formation via Multi-EMA and the transitions of the internal variables should be interpreted in continuous time remains a vital challenge for future mathematical research. This paper indicates only the intuitive direction; the detailed formalization is left to future researchers.

※ The SDE-DDE-ODE contraction cascade described in this paper is a hypothesis rooted in physical intuition and experimental facts. The task of formalizing this transition with rigorous equations is an open invitation to the next generation of researchers. I believe the true "beginning of dialogue with the model" lies in filling these gaps: discovering what new mathematical order lies hidden within the rhythmic interstices of emoPulse.

References

Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.

Reddi, S. J., Kale, S., & Kumar, S. (2019). On the Convergence of Adam and Beyond. ICLR.

Defazio, A., & Mishchenko, K. (2023). Learning-Rate-Free Learning by D-Adaptation. ICML.

Orabona, F., & Tommasi, T. (2017). Training Deep Networks without Learning Rates Through Coin Betting. NeurIPS.

Luo, L., Xiong, Y., & Liu, Y. (2019). Adaptive Gradient Methods with Dynamic Bound of Learning Rate. ICLR.

Shazeer, N., & Stern, M. (2018). Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. ICML.

Bernstein, J., Wang, Y. X., Azizzadenesheli, K., & Anandkumar, A. (2018). signSGD: Compressed Optimisation for Non-Convex Problems. ICML.

Chen, X., et al. (2023). Symbolic Discovery of Optimization Algorithms. arXiv.

Allen-Zhu, Z. (2017). Natasha: Faster Non-Convex Optimization Than SGD. arXiv.

emo-v386plus-paper(JPN).txt
ADDED
|
@@ -0,0 +1,588 @@
Paper: Time-Series SNR Estimation and Improved Regret Bounds in the Autonomous Optimization Algorithm emoPulse, an Exploration of 2nd-Moment-Free Updates via the "Geometric Orthogonality of Weights and Gradients," and Beyond Flow-Matching

"Establishing Emotion-Driven Learning-Rate Control through Close Dynamic Observation of the Loss Landscape, and a Proposal for Next-Generation Optimization through Dialogue with the Loss Landscape"

Abstract

In deep-learning optimization, tuning the learning rate and securing generalization performance are central challenges. Existing methods depend on precise gradient estimation and are vulnerable to noise in extreme low-precision environments. This paper proposes the autonomous algorithm emoPulse (v3.7 and later), built around multi-angle time-series analysis of the loss (Loss). The method reads the "fluctuation" of the loss landscape from a three-stage exponential moving average (Multi-EMA) and autonomously generates an optimal learning rate based on the S/N ratio, via an emotion scalar and a trust index (Trust).

Next, we propose the update rule W-Ref Geometry, which focuses on the geometric relationship between weights and gradients. By dynamically controlling momentum based on the orthogonality of weights and gradients, it realizes a "2nd-moment-free" update that responds immediately to changes in the landscape without retaining the 2nd moment. This also reduces VRAM, providing a democratic foundation for research environments with limited compute and for multilingual training in the service of multicultural coexistence.

We then analyze emoPulse and discuss how it bears on current challenges. This can contribute to adapting Flow-Matching (the FM method) to LLMs. For the difficulties that arise when applying FM's deterministic learning process to LLMs, we propose partial complements and indicate a new optimization direction connecting the two. Within FM, we expect natural connections to architectures such as RNN/SSM evolutions, LNN (LiquidAI/MIT), Mamba (CMU × Princeton), and Titans (Google), so this may become one of the applicable optimization methods.

Furthermore, by synthesizing the training results of five optimizers in this family with distinct update characteristics (Sens / Airy / Cats / Tion / Void), we present a method that integrates local solutions as "multi-point observations" to create flat minima artificially. This realizes robust convergence that does not depend on hyperparameter settings, providing a democratic foundation for multilingual training aimed at preserving diverse cultural heritage, including research environments in developing countries with limited compute.

Finally, we append considerations and predictions regarding Grokking.

※ The v3.7 series excludes EmoTion and EmoVoid (both newly developed in v3.8). The only difference between v3.7 and v3.8 lies in the dNR_hist of the emoPulse mechanism described later; everything else is identical.

※ From v3.8.6 onward, this method is called the "resonant contraction method" (resonant projection field), rather than stochastic gradient descent. This is detailed at the end of this paper in the discussion of higher-order moments.

1. Overview

This paper presents a unified theory of the optimizers EmoSens / EmoAiry / EmoCats / EmoTion / EmoVoid. The method is built around the emoPulse mechanism, which layers exponential moving averages (EMA) of the loss value, extracts a "trust" index (Trust) from the time-series statistics of the loss function, and autonomously generates the learning rate. Mathematically, this is an advanced fusion of D-adaptation theory and time-series signal processing (SNR estimation), realizing robust convergence that does not depend on hyperparameter settings.

The starting point of this research is a reconsideration of the excessive reliance of existing adaptive gradient methods on "precise gradient estimation." In extreme low-precision and ultra-quantized (1-bit/2-bit) environments, gradients contain extremely high noise and their reliability drops markedly. By contrast, the loss value (Loss) continues to function, even under the effects of quantization, as an accurate scalar indicating the model's "distance from the correct answer."

The method relegates the gradient (Gradient) to a directional reference (intent) and entrusts the leadership of learning to the multi-angle analysis of the Loss, an accurate observable. Through this approach, it replaces higher-order moment computation with scalar control and achieves optimization for low-precision and quantized environments via sign-based updates. Its greatest feature is that, by integrating multiple emo-family optimizers with distinct characteristics as "multi-point observations" of local solutions, the arrival at flat minima, which conventionally requires long iterative training, can be replaced by short training runs plus synthesis.

This approach realizes the following three things:

Dramatic improvement in computational efficiency: the complex computation of higher-order moments is replaced with scalar control via temporal integration of the Loss, reducing the computational load through time-integral approximation.

Optimization for low precision and quantization: matrix decomposition in EmoAiry, complete removal of the 2nd moment in EmoCats, and, in the original EmoTion and EmoVoid, sign-based updates including the "geometric orthogonal update" and the complete removal of moments, enable large-scale training in low-resource environments.

Autonomous convergence: by closely observing the S/N ratio of the loss landscape, manual schedulers become unnecessary and the user's trial cost is minimized.

※ Higher-order moment approximation: aggregation into time-series higher-order statistics along the time axis.

Mathematically, this is an advanced fusion of D-adaptation theory and time-series signal processing, and it forms the foundation of "democratic AI training" that passes on diverse cultures, including research environments in developing countries.

※ EmoTion and EmoVoid not only replace higher-order moment computation with scalar control, but also make the geometric information held by the weights themselves the fulcrum of the update, realizing a lightweight structure that requires no 2nd moment (detailed in Chapter 6).

2. Theoretical Framework: the Emotional Circulation System

This system forms a feedback loop with the loss function L as its origin (Origin).

2.1 Approximation of higher-order moments via Multi-EMA

Using the differences of three EMAs (short, medium, long), we capture the "change of curvature and the uncertainty of change," the change of change, of the loss landscape.

EMA_t = (1 - α) * EMA_{t-1} + α * L_t

We define the "high-order temporal difference" generated from these differences as the "emotion scalar." This emotion scalar sigma_t is a nonlinear statistic that compresses higher-order moment information (skewness, kurtosis, fluctuation) into [-1, 1]. Multiple EMAs with different time constants accumulate hierarchically as a "history" over an enormous number of past steps. By taking these relative time-delay differentials, we observe the "dynamic higher-order rate of change of the landscape as training progresses," which is impossible with static landscape analysis. Including this recursively in the update rule reflects the "smoothness" of the mid-to-long-term landscape in the parameter updates.

※ Note on the time-series formation of higher-order moments:
The higher-order moment approximation in this method is not computed from single-step gradient information; it is formed by temporal integration. This means we observe not the curvature of a static landscape but the dynamic rate of change of the landscape as training progresses.

※ Hierarchical structure of the higher-order moment approximation:
Through temporal integration of the Loss, the method effectively approximates higher-order moments from the 3rd order (skewness) up to the 7th order (amplification of confidence). This is an attempt to extract the system's "confidence" as a physical quantity from the dynamic process of learning, rather than from static landscape analysis.

The Multi-EMA structure in this method functions as a dynamic temporal approximation of statistical higher-order moments.

3rd/4th-order approximation: the differences of the Short / Medium / Long EMAs extract the temporal transitions of higher-order information such as the skewness, kurtosis, and fluctuations of the loss distribution.

6th-order approximation: the emotion scalar sigma_t and the trust trust_t that integrate them become meta-statistics of roughly 6th order, indicating the "stability of the learning phase" beyond mere gradient variance.

7th-order approximation (dNR): in deriving dNR, squaring the ratio of the 6th-order information, (d_base/noise_base)^2, amplifies subtle differences in confidence exponentially, yielding an extremely sensitive control signal corresponding to the 7th moment.

2.2 Definition of the trust index trust_t

The core index trust_t, which determines the "quality" of an update, is defined as:

trust_t = sgn(sigma_t) * (1.0 - abs(sigma_t))

This trust has the boundedness of never reaching ±1.0 (complete certainty) or 0 (complete despair), keeping the system always with an appropriate "margin for exploration" and "caution."

This forms the following feedback loop (emotional circulation system) with the loss function L as the origin:

Loss → Multi-EMA → Scalar/Trust → emoPulse → Loss

3. emoPulse: Learning-Rate Generation via Autonomous Behavior

From v3.7 onward, the former emoDrive (acceleration mechanism) was integrated into emoPulse. This is an evolution via an approximation of dynamic distance estimation (D-adaptation) based on the time-series S/N ratio (Signal-to-Noise Ratio).

3.1 Dynamic estimation of Noise and Distance

The system's "hesitation" and "progress" are tracked with the two internal variables N_t and d_t, where N_t represents "wavering" (instability) and d_t represents "progress" (distance).

Noise estimate (N_t): N_t = (1 - α) * N_{t-1} + α * abs(sigma_t)
Distance estimate (d_t): d_t = (1 - α) * d_{t-1} + α * abs(trust_t)

3.2 Definition and autonomous control of emoPulse / instantaneous SNR and history management (dNR_hist)

The generation of emoPulse is decided by a "tug of war" between the instantaneous SNR and the long-term SNR. First, the instantaneous and long-term bases are computed:

noise_base = abs(sigma_t - trust_t) + ε_s
d_base = abs(N_t - d_t) + ε_t

Using these, the current SNR strength is defined as:

dNR_now_val = ( d_base / noise_base )^2

Update rules for dNR_hist:

Growth condition:
if dNR_now_val >= dNR_hist and trust_t >= threshold_high:
    dNR_hist = min( dNR_now_val, dNR_hist * factor_grow )

Decay condition:
if threshold_low <= trust_t <= threshold_high:
    dNR_hist = dNR_now_val * factor_decay

The final learning rate emoPulse is determined by:

emoPulse_t = clamp( dNR_hist * (emoScope * η_base), η_min, η_max )

This design guarantees the following autonomous behavior:

Confidence region (|trust| > 0.5): the SNR rises and the learning rate is maximally accelerated, heading rapidly toward flat minima.
Hesitation region (|trust| < 0.5): uncertainty increases and the learning rate is suppressed, preventing divergence in sharp valleys.

※ emoPulse is a scaling coefficient determined by the user-defined initial learning rate (emoScope) and the system's default sensitivity (η_base).
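The flow of this chapter can be sketched end to end as follows. The constants (α, the +0.1 safety terms standing in for ε_s and ε_t, the thresholds, and the η bounds) follow the values quoted elsewhere in the paper where stated and are otherwise illustrative assumptions; only the formulas themselves follow the definitions above.

```python
def emo_pulse_step(state, sigma, emo_scope=1.0, eta_base=1e-3,
                   alpha=0.1, eps=0.1, th_high=0.5, th_low=-0.5,
                   factor_grow=1.50, factor_decay=0.80,
                   eta_min=1e-6, eta_max=3e-3):
    """One emoPulse step: sigma -> trust -> (N_t, d_t) -> dNR -> learning rate."""
    sign = 1.0 if sigma >= 0 else -1.0
    trust = sign * (1.0 - abs(sigma))                           # trust_t
    state["N"] = (1 - alpha) * state["N"] + alpha * abs(sigma)  # noise estimate
    state["d"] = (1 - alpha) * state["d"] + alpha * abs(trust)  # distance estimate
    noise_base = abs(sigma - trust) + eps                       # instantaneous basis
    d_base = abs(state["N"] - state["d"]) + eps                 # long-term basis
    dnr_now = (d_base / noise_base) ** 2
    if dnr_now >= state["dnr_hist"] and trust >= th_high:
        state["dnr_hist"] = min(dnr_now, state["dnr_hist"] * factor_grow)
    elif th_low <= trust <= th_high:
        state["dnr_hist"] = dnr_now * factor_decay
    lr = state["dnr_hist"] * emo_scope * eta_base
    return max(min(lr, eta_max), eta_min)                       # clamp to [η_min, η_max]

state = {"N": 1.0, "d": 0.02, "dnr_hist": 1.0}  # initial values suggested in Ch. 4
lr = emo_pulse_step(state, sigma=0.05)
assert 1e-6 <= lr <= 3e-3
```

Note that the optimizer never looks at the gradient here: the whole learning-rate loop is driven by the loss-derived scalar alone.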

4. emoPulse: Regret Bound and Boundedness Analysis

4.1 Convergence and regret analysis

Under emoPulse, the cumulative regret R(T) is bounded, in a form that includes the dynamically varying learning rate η_t, as follows:

R(T) <= O( Σ_{t=1}^T [ η_t * ||g_t||^2 * (1 - |σ_t|)^2 ] )

Here the coefficient (1 - |σ_t|) quantifies the update "trust" (Trust), derived from the consistency of the short-, medium-, and long-term EMAs of the loss function. A state with large |σ_t| indicates that the loss is changing violently, and the gradient information at that step is judged to have low reliability.

Conversely, a state with small |σ_t| means the loss trajectory is smooth and the update direction is highly reliable. Therefore the signal strength trust_t = 1 - |σ_t| adaptively weights the effective update amount in the regret bound, playing the role of suppressing the accumulation of regret due to uncertain gradients.

The emoPulse of this method approximates and generalizes the learning-rate structure of D-adaptation by Defazio & Mishchenko (2023) via the time-series statistics of the loss (d_t, N_t).

η_t ∝ D^2 / noise

Definition of emoPulse:

η_t = ( d_t / (N_t + ε) )^2 * η_base

This reconstitutes D-adaptation's SNR control based on the distance/noise ratio directly in the time-series domain.

With this structure, when the noise component N_t grows, the denominator dominates and η_t immediately shrinks. This self-adjustment automatically suppresses excessive updates in unstable regions of the loss landscape. It theoretically secures a learning-rate-free property in which the algorithm autonomously acquires dynamic stability where external learning-rate scheduling would otherwise be required.

4.2 Proof of positive-definiteness and boundedness

We prove below that this algorithm is bounded at every step t, preventing both explosion and vanishing of the learning rate.

1. Non-zero boundedness of the denominator (instantaneous doubt: noise_base)

noise_base, the denominator in emoPulse generation, is defined as the divergence between the current emotion scalar sigma_t and the trust trust_t:

noise_base = abs(sigma_t - trust_t) + ε_s

Since in the implementation |sigma_t| < 1.0 and trust_t is a signed function based on sigma_t, their difference is bounded. Furthermore, the trailing safety constant (+0.1) physically prevents the denominator from approaching zero and thus the learning rate from exploding (NaN).

2. Lower-bounded positivity of the numerator (long-term conviction: d_base)

d_base, the numerator in emoPulse generation, is defined as the difference between the historical noise estimate N_t (noise_est) and the distance estimate d_t (d_est):

d_base = abs(N_t - d_t) + ε_t

N_t is guaranteed positive-definite by max(noise_est, ν_r), and d_t is updated by integrating abs(trust_t), which captures both improvement and deterioration. Adding the safety constant (+0.1) to the difference of these long-term statistics mathematically guarantees that a minimum stride (a lower bound on the numerator) is always secured, even when the history is unstable in extreme low-precision environments.

3. Conclusion of boundedness and the bounds of emoPulse

The effective learning rate emoPulse_t, generated from the ratio of the "instantaneous basis" (denominator) and the "long-term basis" (numerator), is strictly confined, based on the safeguard max(min(..., 3e-3), 1e-6) in the implementation, to the range:

0 < η_min <= emoPulse_t <= η_upper_bound

Here the lower bound (η_min) is the minimal "metabolism" (pulse) maintained even in the system's most uncertain state; it avoids training deadlock and allows the system to wait for autonomous recovery. The upper bound (η_upper_bound) functions as a limiter that prevents model divergence even if the dNR coefficient spikes.

Implementation notes:

Stabilization via initial values:
※ In environments where the dataset is very small or initial noise is large, it is recommended to reset the initial values of d_t and N_t until the EMA "history" stabilizes (e.g., d_est = 0.2, noise_est = 0.2). This suppresses divergence due to early stochastic noise. In particular, initializing N_0 equal to d_0 starts the system in an essentially "cautious mode." This functions as an organic warm-up phase that avoids overly aggressive updates in the critical early steps and prioritizes observing the landscape.

Balancing "update pressure" and safety via initial values:
※ In this method, d_base, which forms the numerator of emoPulse, determines the system's "latent update power." Setting the initial values N_0 = 1.0, d_0 = 0.02 deliberately secures high acceleration potential from the start of training. By the nature of exponential moving averages, the influence of these initial values persists as "history" for roughly 100 steps. During this period the system holds high acceleration pressure in the background, yet grants acceleration only to "truly reliable signals" that pass the strict screening of the emotion mechanism.

5. Sign Normalization: Adaptation to Low-Precision Environments

This chapter describes the sign-based normalization used to apply the theoretical framework of emoPulse to low-precision environments.

To suppress dependence on precise floating-point computation and support extreme low-precision (ultra-quantized) environments, the following update rule is adopted (EmoAiry, EmoCats, etc.):

delta_w_t = -emoPulse_t * sign( m_t / ( sqrt(v_t) + ε ) )

With this, EmoAiry resolves the precision imbalance between the 1st-moment vector and the 2nd moment, extracting only the directional consensus and realizing a "unification of intent."

※ EmoCats is Lion-based with decoupled weight decay, and is already sign-based.

※ EmoTion / EmoVoid apply the sign to their own update rule, the "geometric orthogonal update."
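A minimal sketch of the signed update above. The moment buffers and decay rates are illustrative Adam-style assumptions; only delta_w_t = -lr * sign(m_t / (sqrt(v_t) + ε)) follows the text.

```python
import math

def signed_update(w, m, v, g, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style moments reduced to a pure sign step: the update magnitude
    is always exactly lr, regardless of gradient scale (quantization-friendly)."""
    new_w, new_m, new_v = [], [], []
    for wi, mi, vi, gi in zip(w, m, v, g):
        mi = beta1 * mi + (1 - beta1) * gi
        vi = beta2 * vi + (1 - beta2) * gi * gi
        direction = mi / (math.sqrt(vi) + eps)
        step = 0.0 if direction == 0 else math.copysign(lr, direction)
        new_w.append(wi - step)        # delta_w = -lr * sign(...)
        new_m.append(mi); new_v.append(vi)
    return new_w, new_m, new_v

w, m, v = signed_update([0.5, -0.5], [0.0, 0.0], [0.0, 0.0],
                        g=[10.0, -0.01], lr=1e-3)
print(w)  # each coordinate moved by exactly lr = 1e-3
```

Because only the sign survives, a huge gradient (10.0) and a tiny one (-0.01) produce steps of identical magnitude, which is exactly the robustness to gradient scale that the text claims.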

6. The Update Rules of EmoTion and EmoVoid, a "New Optimization": Explanation and a Bridge to the Future

Respect for existing methods, and the position of EmoTion / EmoVoid:
The update algorithms of EmoTion / EmoVoid start from deep respect for the Adam family, the cornerstone of modern deep learning. The concept of adaptive learning rates introduced by the Adam family broadened the conditions under which optimization is feasible and greatly lowered the hurdle to its adoption.

EmoTion / EmoVoid inherit that spirit while taking a different approach: "geometry (W-Ref Geometry) and emotion (emoPulse) instead of statistics."

A new form of correctness:
Whereas the Adam family rigorously interrogates "past statistics," EmoTion / EmoVoid walk the landscape as if dancing, through "dialogue with the current weights" and the pulse of the Loss. In this way they aim for a "natural convergence that suppresses overfitting" while maintaining a correctness that stands alongside the Adam family.

Kindness toward resources (VRAM reduction):
Computational resources are finite, and not everyone has access to high-performance, abundant hardware. EmoTion largely preserves the Adam family but relegates the exact machinery of the 2nd moment to "scalar control," cutting the VRAM load roughly in half. EmoVoid holds neither the 1st nor the 2nd moment, reflecting the orthogonality of W and G directly, and reduces the VRAM load to the limit. We believe this becomes the foundation of a "democratic learning environment" in which many more people can run AI training.

Geometric momentum control via W-Ref Geometry:
The core of both algorithms is a geometric update rule based on the orthogonality (Orthogonality) of the weight vector W and the gradient vector G. Whereas conventional statistical methods depend on the accumulation (the "shadow") of past gradients, W-Ref Geometry takes the current weights W, the "reality," as its reference and derives the freshness (Freshness) of the gradient G from the following cosine similarity ρ (rho):

ρ(rho) = | <W, G> | / ( ||W|| * ||G|| + eps )

The smaller ρ (the closer to orthogonal), the more the current gradient is judged to carry "unknown information not contained in the existing weight structure," so momentum is suppressed and the current gradient is strongly incorporated. This geometric "screening of information" simultaneously achieves high-precision direction changes without statistical delay and a regularization effect through the suppression of redundant updates.

Why EmoTion works with the 1st moment alone:
EmoTion's omission of the 2nd moment (variance estimation) is not mere weight saving. Because W-Ref Geometry updates based on the "freshness of direction rather than the magnitude" of the gradient, much of the role played by the 2nd moment becomes unnecessary. Directional screening by W-Ref Geometry judges that the closer the gradient G is to orthogonal with the weights W, the more unknown information it contains, weakening momentum and steering toward the new direction; conversely, gradients parallel to W are regarded as redundant, and momentum is prioritized. This screening based on the "purity of direction" is more direct than variance estimation, robust to noise, and has the effect of suppressing overfitting.

※ EmoVoid works without the 1st or 2nd moment.

Details of the W-Ref Geometry method, explained below:

1. Definition of the geometric index ρ (Orthogonality Index)
Whereas conventional optimizers adjust the learning rate by the gradient's "magnitude" (L2 norm) or "statistical variance" (2nd moment), EmoTion defines the relative angle of the gradient vector G to the current weight vector W as the freshness of information:

ρ_t(rho_t) = | <W_t, G_t> | / ( ||W_t|| * ||G_t|| + eps )

Orthogonal state (ρ→0): the gradient is orthogonal to the current weight structure. This suggests an "entirely new knowledge direction that the current model does not yet possess."

Parallel state (ρ→1): the gradient points in the same direction as (or opposite to) the current weights. This suggests "redundant information amounting to no more than a rescaling of the current weights."

2. Adaptive momentum control (Geometric Momentum Blending)
This update rule dynamically adjusts momentum according to the gradient's "freshness." It is a structure that replaces variance estimation via the conventional 2nd moment with geometric information redundancy:

m_t = beta1 * m_{t-1} + (1 - beta1) * Freshness_t * G_t
where Freshness_t = 1.0 - EMA(rho_t)

Theoretical interpretation: when the gradient is "orthogonal" (fresh), momentum (the shadow of the past) is temporarily weakened and the system reacts immediately (dances) to the new information. Conversely, when it is "parallel" (redundant), momentum is maintained and stability is prioritized. This can be read as reinterpreting "statistical uncertainty" (variance) as "geometric information redundancy."

※ Simplification in EmoVoid: EmoVoid eliminates even this momentum control and multiplies the Freshness directly into the update vector. This fully releases the m_t slot in memory while still realizing the geometric screening of information.
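The blending above can be sketched as follows. Only ρ and the m_t formula follow the text; the EMA decay for ρ is an illustrative assumption.

```python
import math

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))

def blend_momentum(m, W, G, rho_ema, beta1=0.9, rho_decay=0.9, eps=1e-12):
    """Geometric momentum blending: fresh (near-orthogonal) gradients are
    incorporated strongly; redundant (near-parallel) ones barely at all."""
    rho = abs(dot(W, G)) / (norm(W) * norm(G) + eps)       # orthogonality index
    rho_ema = rho_decay * rho_ema + (1 - rho_decay) * rho  # EMA(rho_t)
    freshness = 1.0 - rho_ema
    m = [beta1 * mi + (1 - beta1) * freshness * gi for mi, gi in zip(m, G)]
    return m, rho_ema

# A gradient orthogonal to the weights keeps rho near zero: full freshness,
# so the momentum absorbs the new direction at the full (1 - beta1) rate.
m, r = blend_momentum(m=[0.0, 0.0], W=[1.0, 0.0], G=[0.0, 2.0], rho_ema=0.0)
assert r < 0.1
```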

3. Signing the update rule and replacing L2 normalization
Although EmoTion and EmoVoid are "2nd-moment-free" (or fully moment-free), the final key lies in sign extraction (Sign) and the decoupling of Weight Decay. By deciding the update direction with sign(m_t) alone, the update magnitude of the weights is no longer affected by the gradient's "magnitude." This makes the updates robust to wobble in gradient scale and to noise, and keeps them stable.

EmoTion's update rule:
W_{t+1} = W_t * (1 - emoPulse_t * lambda) - emoPulse_t * sign(m_t)
( emoPulse is the learning rate derived from dNR; lambda is the WeightDecay coefficient )

EmoVoid's update rule:
W_{t+1} = W_t - emoPulse_t * sign(G_t) * (1 - ρ_t)
( Thanks to its self-restraint mechanism, EmoVoid can converge stably without an explicit lambda. )
|
| 245 |
+
|
| 246 |
+
※ Proposal of "entity-referenced optimization": Whereas conventional optimizers chase past gradients (history), this work establishes the Weight-Reference method (W-Ref method), which uses the correlation with the current weights (the entity itself) as the trigger for updates.

※ A geometric reading of the curse of dimensionality: The method exploits the concentration phenomenon of vectors in high-dimensional space (their tendency to be mutually orthogonal) and detects a slight "deviation" from orthogonality as informational overlap (redundancy). This achieves high-precision, low-latency inertia control without relying on statistical variance estimation. In a high-dimensional space (e.g., a layer with hundreds of millions of parameters), the probability that two vectors are parallel by chance is vanishingly small; almost all vectors are orthogonal. If rho moves even slightly away from 0 (toward parallel), that statistically implies an "extremely strong correlation" (overlap). In other words, without consulting the massive statistics of the past (the second moment), the relation to the current weights alone suffices to judge immediately whether "this update is worth taking."

※ Resonance with emoPulse: emoPulse controls the pulse along the time axis (when, and how much, to move), while W-Ref Geometry decides the direction along the spatial axis (where, and how far, to move). This integrated, autonomous control of time and space is the core that reconciles VRAM reduction with high-precision convergence, and it improves the robustness of learning.
4. Implementation-Level Lightweighting via an Approximation of W-Ref Geometry (Approx W-Ref Geometry)

In theory, W-Ref Geometry measures the orthogonality of weights and gradients exactly, as follows.

rho_t = | <W_t, G_t> | / ( ||W_t|| * ||G_t|| + eps )
In huge models, however, the step-by-step computation of all-layer inner products, all-layer norms, and cosine similarities becomes a bottleneck in both VRAM and compute. The implementation therefore introduces an approximate form of W-Ref Geometry, which preserves the "essence" of W-Ref Geometry while bringing its VRAM usage down to nearly zero.
4-1. EmoTion: Estimating Directional Freshness from the Change in the L1 Norm

EmoTion estimates "how strongly the model is trying to move in a new direction" from the change in the L1 norm of the whole weight tensor.

g_ratio_t = | L1_t - L1_{t-1} | / ( L1_{t-1} + eps )

Freshness_t = min( g_ratio_t / freshness_scale , freshness_cap )

This Freshness_t is used as the mixing ratio into the first moment (exp_avg): react strongly along orthogonal directions, keep inertia along parallel ones. This reproduces, at negligible cost, what the exact W-Ref Geometry measurement would provide.
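The freshness estimate above can be sketched as follows (a minimal sketch; `freshness_scale` and `freshness_cap` are the hyperparameter names used in the text, while the exp_avg blending shown is one plausible form, not the released code):

```python
import numpy as np

def freshness(l1_now, l1_prev, scale=0.05, cap=1.0, eps=1e-12):
    """Relative change of the whole-tensor L1 norm, clipped to [0, cap]."""
    g_ratio = abs(l1_now - l1_prev) / (l1_prev + eps)
    return min(g_ratio / scale, cap)

def blend_first_moment(exp_avg, grad, fresh):
    """Mix the fresh gradient into exp_avg in proportion to freshness:
    high freshness -> follow the new direction, low -> keep inertia."""
    return (1.0 - fresh) * exp_avg + fresh * grad

f = freshness(l1_now=102.0, l1_prev=100.0)
print(f)  # relative change 0.02 divided by scale 0.05 -> approx 0.4
```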
4-2. EmoVoid: Approximation via "Direct Scaling" of Weights and Energy

EmoVoid keeps neither the first nor the second moment, so it performs no freshness-style inertia control.

g_ratio_t = L1_{t-1} / ( L1_t + eps )

W_t ← W_t * g_ratio_t

Instead, it directly scales the whole-tensor L1 norm, which approximately preserves the "directional purity" of W-Ref Geometry. EmoVoid applies this scaling only during the warm-up period and the stable final phase; elsewhere it updates with sign(G_t) alone, without scaling. The result is EmoVoid's own "geometric self-restraint": the energy of the weights cannot run away, bias in the gradient direction is suppressed, and stable convergence becomes possible even without moments.
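The energy-rescaling step above can be sketched numerically (a minimal sketch of the warm-up / final-phase scaling only; the phase gating itself is omitted):

```python
import numpy as np

def emovoid_rescale(w, l1_prev, eps=1e-12):
    """Pull the whole-tensor L1 norm back toward its previous value,
    keeping the weight 'energy' from drifting: W <- W * L1_{t-1} / L1_t."""
    l1_now = np.abs(w).sum()
    return w * (l1_prev / (l1_now + eps))

w = np.array([2.0, -2.0])            # L1 norm = 4
w2 = emovoid_rescale(w, l1_prev=2.0)
print(np.abs(w2).sum())              # restored close to 2.0
```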
4-3. Meaning of the Approximations: Designed as "Implementation Optimizations, Not the Complete Theory"

The two differ in how they treat the "time axis" (emoPulse) and the "space axis" (W-Ref Geometry), but in the end both realize "geometric optimization that does not rely on statistics." EmoTion uses inertia control via Freshness; EmoVoid uses self-restraint via energy correction; yet both share the core of W-Ref Geometry, the "evaluation of directional purity."
5. A Request to Tensor-Computation Frameworks (PyTorch, etc.)

The W-Ref Geometry and Approx W-Ref proposed here have the potential to break through the memory-efficiency limits of current deep-learning frameworks. We therefore make a strong request to future tensor-operation libraries such as PyTorch to implement the following.

Request: a native geometric-correlation function between weights and gradients, torch.geom_relation(W, G)

Today, computing the orthogonality rho between a weight W and a gradient G requires an inner product, the two norms, and intermediate tensors to hold them, which causes non-negligible compute overhead and VRAM pressure.

Instead, at the C++/CUDA level, W and G could be referenced directly, without materializing intermediate tensors:

rho_t = | <W_t, G_t> | / ( ||W_t|| * ||G_t|| + eps )

(the degree of orthogonality per parameter tensor)

If a native function that returns this as a scalar were implemented, updates based on geometric confidence would become possible with minimal VRAM, without keeping a second moment (variance statistics). This is not merely a speed-up of optimization: we are convinced it would be the last piece that decides the "democratization of large-model training" on edge devices and in resource-constrained environments.
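Pending such a native kernel, the requested quantity can be emulated per tensor today; the sketch below is a plain NumPy stand-in for the hypothetical `torch.geom_relation` (the function name and signature are this paper's proposal, not an existing API):

```python
import numpy as np

def geom_relation(w, g, eps=1e-12):
    """Return rho = |<W, G>| / (||W|| * ||G|| + eps) as a scalar:
    how strongly the gradient overlaps the current weights."""
    w = w.ravel()
    g = g.ravel()
    num = abs(float(w @ g))
    den = float(np.linalg.norm(w) * np.linalg.norm(g)) + eps
    return num / den

print(geom_relation(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # orthogonal -> 0.0
print(geom_relation(np.array([1.0, 0.0]), np.array([2.0, 0.0])))  # parallel -> approx 1.0
```

A native fused version would compute the same scalar without the flattened views and intermediate norm tensors this emulation allocates.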
7. Theoretical Connection to Flow-Matching Methods and Their Structural Limits

The EmoSens generation (Sens / Airy / Cats / Tion / Void) carries two meanings with respect to Flow-Matching (FM) methods:

1) It is the first optimizer to adapt fully to the update structure of Flow-Matching;

2) At the same time, it makes explicit the structural limits of FM methods.
1. The structural constraint of "noise intolerance" demanded by Flow-Matching

Flow-Matching reproduces a continuous-time flow field faithfully, and therefore strongly demands smoothness and consistency of the gradient field. This design, however, embeds a structural constraint: it cannot essentially tolerate noise.

- Fine-grained disturbances of the gradient translate directly into a breakdown of the flow field
- Under quantized low-precision environments, gradient reliability drops sharply
- Because no buffering structure exists to absorb noise, generalization suffers

Indeed, it is known that in FM-style training a drop in SNR leads directly to divergence and collapse. This is consistent with the SDXL / VAE / blank-slate-initialization experiments described later.
2. The inverse design of emoPulse: "accepting and using noise"

emoPulse is built around time-series statistics of the loss, so it treats noise not as an error to be eliminated but as a signal indicating the progress of learning.

- The higher-moment approximation via Multi-EMA actively uses fluctuations that contain noise
- trust_t is a definition of confidence that presupposes the presence of noise
- emoPulse converts noise into the source of learning-rate control through dynamic SNR estimation

With this structure, the emo family holds a design philosophy opposite to Flow-Matching: acquire generalization while tolerating noise.
3. The paradox: "full adaptation" to Flow-Matching exposes its limits

By adapting fully to the update structure of Flow-Matching, the emo optimizers bring FM's essential weaknesses into sharpest relief.

- The smooth gradient field FM demands rarely holds in real training
- Noise intolerance is fatal in low-precision and quantized environments
- Noise-driven update rules such as emoPulse fit real training better

In particular, the experimental result that, in SDXL e-pred + ZtSNR training, emoPulse overcame the noise fragility that FM methods suffer from and completed training smoothly strongly supports this paradox.
4. The limits of Flow-Matching methods and the transition to next-generation optimization

Flow-Matching has an idealized theoretical frame, the reproduction of a continuous flow, but it is fragile against the noise, quantization, nonlinearity, and dynamic changes of higher moments found in real training. LLMs learn probability distributions autoregressively and thus presuppose an SDE-like worldview, while Flow-Matching demands a deterministic ODE; the two premises collide at the root.

emoPulse not only bridges this gap but presents a new optimization approach, an "emotion circulation system" that actively uses noise. By dynamically absorbing the fluctuations of autoregressive entropy, emoPulse makes FM-like smooth learning possible even for LLMs.

- All-layer LoRA on SDXL
- All-layer retraining of a VAE
- Extreme training from a single image
- Stable training of a blank-slate-initialized model

These experimental results (see the supplementary material) show that emoPulse delivers stability precisely in the regions where Flow-Matching struggles. This structure is not a successor to Flow-Matching; it is the foundation of a next-generation optimization that overcomes Flow-Matching's premises themselves.
5. emoPulse inherently contracts "SDE → DDE → ODE"

Because the history terms formed by Multi-EMA decay exponentially, the delay terms effectively vanish in finite time, and the solution trajectory of the DDE connects naturally to a smooth ODE approximation.

- SDE-like fluctuation: the instantaneous variation of sigma_t and trust_t
- DDE-like delay: the history dependence of Multi-EMA, dNR_hist, N_t, and d_t
- ODE-like smoothness: a "smooth approximation of the terrain" via time integration of the loss

In other words, emoPulse naturally carries a three-layer contraction: from SDE through DDE to ODE.

- FM's idea of a "continuous flow" is absorbed into emoPulse
- FM's "noise intolerance" is overcome by emoPulse
- FM's "SDE-level strictness" becomes unnecessary

emoPulse unifies the fluctuation of the SDE, the delay of the DDE, and the smoothness of the ODE into a single update rule. This three-layer structure naturally merges the probabilistic autoregressive fluctuation inherent to LLMs with the smooth continuous flow of Flow-Matching. As a result, Flow-Matching finishes its role, while the essence of its smooth continuous flow lives on as an "ODE approximation" inside emoPulse and whatever new methods appear in the future.
8. Conclusion

The EmoSens generation (v3.7 and later) completes the "circulation of emotion" that begins with observing the loss function.

Observation (Multi-EMA): sense the undulations of the terrain.

Judgment (Trust): switch between conviction and hesitation at the ±0.5 boundary.

Action (emoPulse): decide the optimal stride through an autonomous pulse.

This method is a democratic optimization framework that enables AI to learn diverse cultures and languages autonomously, even in decentralized research environments and on modest computational resources.
Acknowledgements

First of all, we are deeply grateful to EmoNavi, EmoSens, the many earlier optimizers, and their researchers. Their passion and insight made the conception and realization of this work possible.

This paper is a mathematical account of the already-published EmoSens generation (v3.7 and later) and its variations. We believe the EmoSens generation we created (derivatives included) can contribute to the development of AI. Building on this paper, let us create even more evolved optimizers together.

We close this paper with gratitude and anticipation toward the future researchers who will bring us the next new insights and ideas. Thank you.
Closing Remarks

This algorithm is not proposed as a replacement for existing, excellent optimization methods, but as one more new option for deepening the "dialogue with the model" during training. We will be glad if it helps users choose a partner suited to their own goals and sensibilities, and nurture knowledge together.
Supplementary Material (1): Analysis of the Dynamics of emoPulse in v3.7 and Later

1. Purpose

We analyze what physical meaning the interaction (tug-of-war) between the "instantaneous D / N estimate" and the "temporal D / N estimate" introduced in v3.7 brings to the dynamic control of the learning rate.

2. Nature: a dynamic balance between instantaneous doubt and temporal trust

Instantaneous base (noise_base): noise_base = abs( scalar_t - trust_t ) + ε_s. This measures the divergence between the "current emotion scalar" (the wave) and the "current trust." When the two disagree (the divergence is large), the system holds a "strong doubt" (instantaneous noise) about the present state, and the denominator grows.

Temporal base (d_base): d_base = abs( noise_est_t - d_est_t ) + ε_d. This measures the difference between "noise as history" (the average of the wave) and "trust as history." It represents the "confidence in the update" (a temporal distance) derived from past context.
3. Effects: the creation of a dynamic rhythm

Effect A: immediate braking on sudden change. When an abrupt loss change pulls scalar and trust apart, noise_base (the denominator) becomes dominant. Even if the temporal history is still stable, the learning rate is throttled immediately as an instantaneous judgment, preventing divergence before it happens.

Effect B: self-acceleration in stable phases. When learning is going well (scalar and trust are stable) and confidence-as-history (d_base) accumulates, the dNR coefficient is maximized through a squared term: dNR_now_val = ( d_base / noise_base )^2. In stable phases the "stride" therefore widens naturally and convergence accelerates.

Effect C: a stable regime through history (dNR_hist). Even when the instantaneous dNR_now_val is high, a growth cap of dNR_hist * μ_g suppresses excessive acceleration. Conversely, in untrustworthy regions a deceleration pressure of dNR_hist * μ_d is kept ready, so cautious exploration continues.

※ The asymmetry of Effect C operates through the selection "d_base <= dNR_hist and trust >= 0.5." It mathematically mimics an "accelerator" for advance and a "brake" for caution: in scalar terms, LR is accelerated between 0 and ±0.5, but LR acceleration in the negative direction is excluded from the growth of the LR history. (Beyond ±0.5, acceleration is pointless; the state is treated as a crisis beyond mere wariness and LR is decelerated.) LR acceleration on the negative scalar side is acceleration that trusts a "corrected update direction"; it inherits emoDrive of the EmoNavi generation (emo family, 1st generation), which exploited the time lag between ema and loss (the ema delay). (The present work is the EmoSens generation, the emo family's 2nd generation.)
                |--Danger--|---Wary---|---Fine---|--Danger--|  Emotion
Sigma_t [Minus] |---(-)---0.5---(+)---0---(+)---0.5---(-)---| [Plus]
                |--Hist(-)-|-Hist(Non)|--Hist(+)-|--Hist(-)-|  Reglet
μ_g and μ_d:

v3.7: [Acceleration: LR Growth Max 1.05x] / [Deceleration: LR Decay 0.98x]

v3.8: [Acceleration: LR Growth Max 1.50x] / [Deceleration: LR Decay 0.80x]
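Effects A, B, and C can be condensed into a small numeric sketch (an assumed simplification: `mu_g`/`mu_d` take the v3.8 values, and the trust/scalar gating of Effect C is reduced to a single history cap; the released code is more elaborate):

```python
def dnr_step(scalar_t, trust_t, noise_est, d_est, dnr_hist,
             eps_s=1e-8, eps_d=1e-8, mu_g=1.50, mu_d=0.80):
    """One update of the dNR coefficient: instantaneous doubt in the
    denominator, historical confidence in the numerator, squared gain,
    then asymmetric growth/decay capping against the history."""
    noise_base = abs(scalar_t - trust_t) + eps_s   # instantaneous doubt (Effect A)
    d_base = abs(noise_est - d_est) + eps_d        # historical confidence
    dnr_now = (d_base / noise_base) ** 2           # squared gain (Effect B)
    if dnr_now >= dnr_hist:
        return min(dnr_now, dnr_hist * mu_g)       # capped growth (Effect C)
    return max(dnr_now, dnr_hist * mu_d)           # cushioned decay (Effect C)

# Sudden scalar/trust divergence: held at the decay floor dnr_hist*mu_d = 0.8
print(dnr_step(0.9, 0.1, 0.5, 0.4, dnr_hist=1.0))
```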
4. Conclusion on numerical stability

This design, which turns the difference between the "time axis" (history) and the "instant axis" (the present) into a weapon, is not mere decay. By autonomously and continuously recomputing the ratio of "doubt" (Noise) to "conviction" (Distance), the system achieves a dynamic control, like a heartbeat responding to the complexity of the terrain, that no fixed scheduler can reproduce.

※ EmoTion and EmoVoid are original types made practical in v3.8.

※ dNR_hist uses different coefficients in v3.7 and v3.8; v3.8 is bolder and was tuned to produce larger changes than v3.7.
The "synthesis of flat minima through multi-directional probing" presented below is a hypothesis derived from intuition and experiment. We hope this intuition will be elevated into a rigorous mathematical proof by the next generation of researchers.
An Autonomous Flat-Minima Creation Model via Multi-Angle Analysis of Local Minima: A Proposal for the Emo-multiple Ensemble Method

(Autonomous Flat-Minima Generation via multiple Positioning of Heterogeneous Optimizers)

(A proposal for a new training method: the prediction of "evolutionary flat-minima formation" by synthesizing local minima with the emo family)
1. Purpose: solving the high-cost problem of reaching flat minima

In existing training practice, the established recipe for improving generalization, that is, for reaching flat minima, has been:

- a single optimizer
- long-running iterative training

This requires substantial resources, computational and otherwise, and is not an environment everyone can afford. This proposal aims to change that high-cost structure itself by using the emo-family optimizers.
2. Proposal: do not "search for" flat minima; "create" them

The emo family (EmoSens, EmoAiry, EmoCats, EmoTion, EmoVoid) share a common learning structure despite their different update rules, so training under identical conditions yields results that differ as "local minima reached from different directions."

Merging these differing results amounts to a synthesis of local minima, and we expect this synthesis may widen and flatten the local minima. In other words, the local minima may approach flat minima, or even turn into them.

If these local minima are captured as all-layer LoRAs and merged with a synthesis method such as TALL-Mask-Merge:

✨✨✨ → \___/   image of synthesizing local minima
(multi-directional local minima) (flattening after the merge)

- the "commonly low" parts of the multi-directional local minima are emphasized
- the parts that are sharp in any one direction (sharp minima) cancel out
- as a result, a shape close to a flat valley floor (a flat minimum) is reconstructed

This treats local minima as a multi-directional probe:

not "searching for flat minima" but

"creating flat minima through synthesis," a new training method.
3. Summary: this ensemble leads to training economy

Making the proposal concrete: instead of running all-layer LoRA or FFT (full fine-tuning) for a long time, one performs somewhat shallow training runs and merges them with a synthesis method such as TALL-Mask-Merge. We expect this may make high-quality training results attainable even in resource-constrained cases.

The concrete procedure of this proposal is as follows:

- do not run all-layer LoRA or FFT for a long time with a single optimizer
- run five shallow trainings, one per emo optimizer
- merge the results with TALL-Mask-Merge

This way, the following become possible:

- no dependence on long training runs
- usable even in resource-limited environments
- a high-quality model close to a flat minimum may be obtained

In short, the idea is to economize training by "creating" flat minima rather than "aiming for" them.
4. Conclusion: an ensemble of heterogeneous emotion-driven models (Emotional Ensemble)

The optimizers proposed in this work (Sens, Airy, Cats, Tion, Void) each introspect the loss landscape from a different mathematical foundation. The "flat-minima synthesis via multi-angle probing" proposed here, which merges training results generated under identical conditions by mask-merging (TALL-Mask-Merge and the like), may enable the simultaneous acquisition of "structural stability" and "expressive refinement" that no single optimization algorithm can reach. We expect this to become a new optimization paradigm, shifting the training process from a pursuit along the time axis to a spatial, multi-angle ensemble.
5. Supplement: a trial procedure for all-layer-LoRA ensembling

The emo-family ensemble merges the five training results obtained from the source model, then folds that new multi-variant model back into the source model with TM-merge.

source model (org) ← TM merge ← model S (Sens), model A (Airy), model C (Cats), model T (Tion), model V (Void)

Either merge the LoRAs directly and apply them to the source model, or fold the five new models back into the source model via TM-merge. For FFT, we predict that simply TM-merging the post-FFT models into the source model has the equivalent effect.
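The merge step itself can be illustrated with a toy consensus merge over per-model weight deltas (a deliberately simplified stand-in for TALL-Mask-Merge: the real method builds per-task masks, while this sketch only keeps coordinates where a strict majority of models agree in sign, so that "commonly low" directions survive and one-off sharp directions cancel):

```python
import numpy as np

def consensus_merge(base, deltas):
    """Toy multi-model merge: average each coordinate's deltas, but keep
    only coordinates where a strict majority of models agree in sign."""
    d = np.stack(deltas)                    # shape (n_models, n_params)
    sign_votes = np.sign(d).sum(axis=0)     # net sign agreement per coordinate
    majority = np.abs(sign_votes) > d.shape[0] / 2
    merged = np.where(majority, d.mean(axis=0), 0.0)
    return base + merged

base = np.zeros(3)
deltas = [np.array([0.3, -0.1, 0.2]),
          np.array([0.1,  0.2, 0.2]),
          np.array([0.2, -0.3, 0.2])]
print(consensus_merge(base, deltas))  # only sign-consistent coordinates survive
```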
6. Background: the diversity of landscape introspection across heterogeneous optimizers

The multi-positioning this method proposes actively exploits differences in exploration characteristics that come from each algorithm's "lineage."

Statistical-inheritance group:

EmoSens (Adam type): fine-grained gradient estimation via first and second moments

EmoAiry (Adafactor type): low-memory, broad curvature approximation via matrix factorization

EmoCats (Lion type): robust, noise-tolerant exploration via sign extraction

These inherit the orthodox essence of existing optimization theory while incorporating emoPulse's time-series SNR control, thereby achieving freedom from fixed schedulers.

Geometric-evolution group:

EmoVoid / EmoTion (W-Ref type):

These discard statistics and update based on the purely geometric freshness of information, the "orthogonality" between weights and gradients.
The true nature of training that progresses without loss saturation

(A consideration of loss that keeps falling with little stagnation)

With this method we observed that the loss keeps falling almost without stagnation or saturation. The fact that it kept dropping this far past half the 1st-step loss value even raised the doubt "when will it ever converge?" Yet the training results were free of breakdowns such as overfitting, and maintained entirely normal generalization. An intuitive reading of this suggests one possibility: the model may be "learning the repair of the source model as a delta." This remains a hypothesis; as with the "creation of flat minima" above, we hope it will be elevated into a rigorous mathematical proof by the next generation of researchers.
Below we guarantee that "however the loss value swings, the pulse (emoPulse) does not falter (does not stop)."

noise_base = abs(sigma_t - trust_t) + ε_s

d_base = abs(N_t - d_t) + ε_t

These ε_s and ε_t permit not even a moment of stagnation, producing the continuously descending behavior and the driving force that searches for flat minima. Put differently, convergence occurs only when the loss difference vanishes. With this design, a training test on simplenet (FashionMNIST) reproducibly reached loss ≤ 0.30 within a 10000-step measurement.
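The role of the ε floors can be checked numerically: even with a perfectly flat loss signal (all differences zero), both bases stay strictly positive, so the ratio that drives the pulse remains defined and nonzero (a minimal check reusing the two formulas above; the ε values are illustrative):

```python
def pulse_bases(sigma_t, trust_t, n_t, d_t, eps_s=1e-8, eps_t=1e-8):
    """noise_base and d_base with their epsilon floors; both stay > 0
    even when every difference collapses to zero (a flat loss)."""
    noise_base = abs(sigma_t - trust_t) + eps_s
    d_base = abs(n_t - d_t) + eps_t
    return noise_base, d_base

nb, db = pulse_bases(0.0, 0.0, 0.0, 0.0)   # perfectly flat signals
print(nb > 0 and db > 0)                   # True: the pulse never stops
```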
In validation experiments with SDXL, training with e-pred + ZtSNR, which was achievable with the previous-generation EmoNavi and its variations, was also carried out with EmoSens and its variations. This resolves the issues of noise tolerance and sampler compatibility raised against FM (Flow-Matching), and at the same time resolves the color-cast issue regarded as a weakness of e-pred. A 300-epoch run on roughly 10 teacher images completed without stagnation, and an all-layer LoRA free of overfitting tendencies was successfully produced.

Pushing the test further to the extreme, a 300-step run on a single image likewise completed without stagnation, and the training result showed no breakdown. Why does even such an extreme setting not collapse? We attribute it to updates that do not accumulate noise. Noise can be viewed as arising when tiny data receive erroneous weighting; by updating tiny data appropriately, and thereby protecting and preserving precious information, noise may simply never be produced.

We further performed all-layer training of the SDXL VAE (both encoder and decoder). Until now, VAE retraining destroyed consistency with the model and eventually produced broken generations, but with the optimizers proposed in this work that consistency was confirmed to be preserved. We believe this improves the reusability of VAEs and helps extend the usable lifetime of models.

A consideration of extreme-noise model training. We performed a blank-slate initialization of an SDXL model (weights re-initialized with random values) and ran all-layer-LoRA training against it as the source model. Normally such training diverges or hits NaN within a few steps, but the EmoSens generation kept it progressing and completed 1500 steps. This LoRA ought to have been broken, yet against expectation it could be applied, without breakdown, to the SDXL model from before the randomization. Surprisingly, the LoRA behaved as if it had learned toward the pre-blank state: it improved the continuity of horizons and skylines, a known weakness of the base model, and corrected positional drift when crossing the main subject (it was also applicable to derivative SDXL models, with the same effect). From this test we could confirm that the EmoSens generation possesses excellent stability, safety, and robustness.

※ This LoRA showed the same effect across multiple seeds, so it may have exhibited a "regularization-like behavior" that mitigates certain SDXL artifacts. However, whether the effect stems from deliberate learning or from coincidental alignment cannot be determined at this point. Please read it only as confirmation that training under extreme conditions progressed stably.

※ The non-stagnating loss curve is observable when the learning-rate decay of the early-stop detection (convergence pre-detection) of v3.8.6 and later is not applied (the observations above were made with the early-stop LR decay switched off, leaving control to emoPulse).
A conjecture about grokking

In this work we focused on the behavior of continuous, rarely stagnating loss decline, and ran various tests to probe its causes. In particular, as an extreme training condition we evaluated "how far training can progress safely and stably with only a single image." None of the typical breakdowns, overfitting, collapse into a copying state, or interference with unrelated tokens, was observed; the training result was extremely stable.

From these results, we conjecture that grokking is a "stagnation phenomenon" produced by the combination of two factors:

- The accumulation of noisy learning during training inflates the inaccuracies that must be corrected late in training, abruptly degrading the model's visibility (a whiteout / blackout phenomenon)
- In the late phase, precisely when correction is most needed, schedulers and gradient statistics suppress the LR, driving it to extreme lows

When these two coincide, the model loses its essential sense of direction and falls into a long stagnation period. In other words, we consider grokking an avoidable phenomenon.
The reason the emo family (EmoSens generation) can avoid grokking is clear.

Because the method enables the following updates, it keeps the view permanently clear and never loses the driving force needed to continue learning:

- it maintains update accuracy and does not accumulate noise
- it autonomously secures the LR that is still needed late in training

Even if visibility were to degrade, the emotion machinery as a whole acts like a high-precision GPS: emoPulse's accurate heartbeat never halts the stride, so the model can approach the flat minimum, the globally optimal solution, naturally, without passing through grokking.

Grokking has been discussed as "inscrutable delayed generalization," but unpacking it through the SDXL training results above, we regard the essence of the grokking phenomenon as stagnation caused by structural defects on the algorithm side. dNR detects the signs of erroneous weighting and unorganized tiny data, and corrects contradictions with the abstract structure early. If tiny data are handled correctly, we believe the generalizing solution forms early.
Future work: introducing adaptive accuracy judgment via a third-moment approximation

As an outlook, we are considering a "higher-order accuracy judgment mechanism" using, for example, the cube of dNR (corresponding to a third moment). This would not feed third-order information directly into emoPulse's output (the emoPulse mechanism stays as it is); rather, it is an attempt to use it as a meta-indicator evaluating the "purity" of the current training progress. We expect this would detect the first signs of overfitting on very small datasets even earlier and push the precision of autonomous control to its limit. Alternatively, accuracy might be detectable from the difference between past and present in the dNR history. That said, this is something to introduce only as needed; from the validation results so far we judge there is no urgency.

※ We surmise that the early-stop detection notice (convergence pre-notification), present since before v3.8, is an approximation corresponding to a second or third moment.

※ The mechanisms we surmise to be third-moment-equivalent approximations, including the above, are presented below.
Supplementary Material (2): Considerations on the Spatio-Temporal Integration of Higher Moments and Self-Organization in the Optimization Algorithm

1. Time axis: the double structure of temporal curvature at 3rd order (dNR_hist)

In the analysis of the temporal recursion, we define for dNR_hist a squaring operation together with an asymmetric application of a 1.50x growth cap and a 0.80x decay. The squaring generates a second-order-equivalent signal-to-noise ratio (SNR), followed by history-based comparison (min/max) and coefficient multiplication. This recursive process corresponds, in differential geometry, to computing the curvature of the curvature (a double differentiation). The method goes beyond mere dynamic learning-rate adjustment: it extracts the purity of information (SNR) from the "fluctuation" of the loss and tracks the "rate of change of confidence" at third-order resolution. The "temporal curvature" of the third moment is thereby embedded in a nonlinear double structure, giving the optimization process an intuitive rhythm.
2. Space axis: the double structure of spatial curvature at 3rd order (W-Ref Geometry)

Assuming transitions along geodesics on a manifold in the sense of Riemannian geometry, we define "W-Ref Geometry," which performs a single bulk scaling of the all-layer L1 norm. This mechanism does not manipulate individual parameters independently; it grasps the "volume of the manifold" formed by hundreds of millions of weights as one huge "field" and applies a bulk correction. Instead of directly computing individual second-order correlations, it secures higher-order consistency by exploiting the energy-conserving force of the whole system. This is a third-order volumetric control that governs the energy state of the entire space.
3. Information axis: a meta-statistic at 3rd order (nonlinear compression of sigma/trust)

We define a "meta-statistic" in which the stacking of the scalar system and the exponential-moving-average (EMA) system produces a double influence, scalar/trust → dNR2, that plays a third-order role. A tanh bounding is applied to the differences of three EMAs (Short/Medium/Long). Here the divergence between the "ideal" (the long-term indicator) and the "reality" (the short-term indicator) is quantified as "stress" (scalar). This functions as third-order "omen detection": the model can autonomously sense its limits before the system reaches the critical point of divergence.
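The three-EMA stress scalar described here can be sketched as follows (a minimal sketch: the decay constants and the exact pairing of the differences are assumptions; only the Short/Medium/Long structure and the tanh bounding come from the text):

```python
import math

class EmaStress:
    """Three stacked EMAs of the loss; stress = tanh of the gap between
    the short-term 'reality' and the long-term 'ideal'."""
    def __init__(self, betas=(0.5, 0.9, 0.99)):
        self.betas = betas
        self.emas = [None, None, None]      # short, medium, long

    def update(self, loss):
        for i, b in enumerate(self.betas):
            prev = loss if self.emas[i] is None else self.emas[i]
            self.emas[i] = b * prev + (1.0 - b) * loss
        short, _, long_ = self.emas
        return math.tanh(short - long_)     # bounded stress scalar

s = EmaStress()
for loss in [1.0, 1.0, 0.2]:                # a sudden improvement
    stress = s.update(loss)
print(-1.0 < stress < 0.0)                  # True: short EMA dropped below long EMA
```

The tanh keeps the scalar bounded however violently the loss swings, which is what makes it usable as an early-warning indicator rather than a raw error.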
4. Spatio-temporal integration: the double structure of spacetime phase at 3rd order (SDE → DDE → ODE contraction)

The emoPulse mechanism of this optimizer embeds a contraction structure over stochastic differential equations (SDE), delay differential equations (DDE), and ordinary differential equations (ODE). This three-tier phase analysis faithfully reproduces the time evolution of higher moments. Because the structure satisfies the conditions of a contraction mapping, convergence is mathematically guaranteed without depending on external scheduling.

5. Rebirth axis: convergence judgment and self-recursion via 4th-to-6th-order (composite higher) moments

A convergence judgment is made on the basis of the double phase structure that arises when the three axes, time, space, and information physics, synchronize. A phase-agreement judgment between the SDE side (noise indicators) and the ODE side (deterministic indicators) is carried out, along with self-rewriting via emoScope. The moment "stochastic fluctuation" and "deterministic convergence" coincide, the system autonomously updates its hyperparameters and plunges into a finer dimension. This self-recursive evolutionary process can be called a life-like self-organization not seen in conventional optimizers.
When scalar is defined as the third-order-equivalent meta-statistic and (d_base − noise_base) as the second-order-equivalent SNR difference, the judgment rule is written:

Stop = 1{ |sigma| < ε_1  AND  |d_base − noise_base| < ε_2 }
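Read as an indicator function, the rule above is directly executable (a minimal sketch; the thresholds ε_1, ε_2 are illustrative values, not the released defaults):

```python
def stop_judgment(sigma, d_base, noise_base, eps1=1e-3, eps2=1e-3):
    """Indicator: 1 when the 3rd-order meta-statistic is calm (|sigma| small)
    AND the 2nd-order SNR difference has settled, else 0."""
    return int(abs(sigma) < eps1 and abs(d_base - noise_base) < eps2)

print(stop_judgment(sigma=1e-4, d_base=0.5000, noise_base=0.5002))  # 1: both settled
print(stop_judgment(sigma=0.2,  d_base=0.5000, noise_base=0.5002))  # 0: sigma still active
```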
ããã¯ ïŒæ¬¡ã¢ãŒã¡ã³ãã®å®å®æ§ãš ïŒæ¬¡ã¢ãŒã¡ã³ãã®æŽåæ§ãåæã«å
è¶³ããé åãæ€åºãããã®ã§ããã髿¬¡ã¢ãŒã¡ã³ãã®ïœ¢äº€å·®é åã芳枬ããŠããã çµæãšããŠã忬¡æ°ãè¶
ããæ
å ±éãæãã混åã¢ãŒã¡ã³ã(mixed moments)ã圢æãããïŒïœïŒæ¬¡çžåœã®è€å髿¬¡å€å®ãæç«ããã
|
| 559 |
+
|
| 560 |
+
The "circulation of emotion" shown in Section 8 of this paper here becomes a "linked ring" corresponding to a third-order approximation. When these elements reach "resonance," time (SDE → DDE → ODE), space (the double volume correction), and direction (the purification of signs) oscillate in the same phase, generating a "Resonant Projection Field." The system then passes through a Resonant Contraction and transitions to the following new dynamics:

W_{t+1} = Contract( W_t , Φ(t) )
Prospects for mathematical analysis

If this work is analyzed mathematically, we suspect the conclusion would not simply be "it is an SDE method" or "it is ODE-like." The update rule driven by emoPulse embeds both stochastic fluctuation and temporal smoothness, and its behavior may possess a distinctive structure sitting on the boundary between SDE and ODE. (The loss value is the outcome of learning; since this method places it at the center and derives everything from the outcome, we expect it to lean ODE-like.) What continuous-time interpretation the history formation by Multi-EMA and the evolution of the internal variables can carry is an important question entrusted to future mathematical research. Here we indicate only the intuitive direction; we look forward to its detailed analysis being developed by future researchers.

※ The contraction process SDE → DDE → ODE in this paper is a hypothesis grounded in physical intuition and experimental facts. The task of describing this transition in rigorous formulas is left to future researchers. What new mathematical order is hidden inside the pulse that emoPulse beats out? We believe that the work of filling in that blank space is the true beginning of the "dialogue with the model."
References

Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980. (Foundation of adaptive learning rates via first and second moments)

Reddi, S. J., Kale, S., & Kumar, S. (2019). On the Convergence of Adam and Beyond. ICLR. (Convergence guarantees via AMSGrad and discussion of second-moment stability)

Defazio, A., & Mishchenko, K. (2023). Learning-Rate-Free Learning by D-Adaptation. ICML. (A theoretical framework that estimates the distance D to the optimum and removes manual learning-rate tuning)

Orabona, F., & Tommasi, T. (2017). Training Deep Networks without Learning Rates Through Coin Betting. NeurIPS. (COCOB: autonomous control of parameter updates via the concept of betting)

Luo, L., Xiong, Y., & Liu, Y. (2019). Adaptive Gradient Methods with Dynamic Bound of Learning Rate. ICLR. (AdaBound: improved generalization via dynamic clipping of the learning rate)

Shazeer, N., & Stern, M. (2018). Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. ICML. (Memory savings via matrix factorization and normalization for low-precision environments)

Bernstein, J., Wang, Y. X., Azizzadenesheli, K., & Anandkumar, A. (2018). signSGD: Compressed Optimisation for Non-Convex Problems. ICML. (Gradient compression via signs and proofs for noise-tolerant update rules)

Chen, X., et al. (2023). Symbolic Discovery of Optimization Algorithms. arXiv. (Lion: symbolic discovery of efficient exploration via the separation of Sign and Weight Decay)

Allen-Zhu, Z. (2017). Natasha: Faster Non-Convex Optimization Than SGD. arXiv. (Acceleration of non-convex optimization using higher-order information and theory of escaping local minima)