prometechinc committed on
Commit
54f04d7
·
verified ·
1 Parent(s): a4382c6

Update README.md

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -50,6 +50,20 @@ By placing these files on an SD card or loading them via SPIFFS/LittleFS, you ca
 
  ---
 
+ ### **Model Architecture & Configuration**
+
+ **Asena_ESP32** is a compact Transformer model based on the **LLaMA (`LlamaForCausalLM`)** architecture and sized for deployment at the extreme edge. Despite its very small footprint, it keeps the design choices of current LLaMA-style models (grouped key-value heads, RoPE, RMSNorm, shared embeddings) to get as much efficiency, stability, and expressive capacity as tight hardware constraints allow.
+
+ The model has **8 Transformer layers** with a **hidden size of 64** and **8 attention heads** that share **4 key-value heads** to cut projection cost. Each head has a **dimension of 26**. This is not the usual hidden size / heads (64 / 8 = 8), so the head dimension is set explicitly and the query projection is 8 × 26 = 208 wide. The feed-forward network uses an **intermediate size of 208** with **SiLU activation**, balancing non-linearity against computational cost. Both attention and MLP layers include bias terms, and a very small dropout (~0.0027) is applied to stabilize training without harming convergence in such a small model.
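To make these dimensions concrete, here is a minimal sketch that estimates the parameter count from the values stated in this README. The `AsenaConfig` class and the helper functions are hypothetical names, not part of the model's code. The layer layout is an assumption (a standard LLaMA block with biased q/k/v/o and gate/up/down projections, two RMSNorm weights per layer, one final norm, and shared input-output embeddings), so treat the result as an estimate rather than the checkpoint's actual size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsenaConfig:
    """Hypothetical container for the values stated in the README."""
    num_layers: int = 8
    hidden_size: int = 64
    num_heads: int = 8
    num_kv_heads: int = 4
    head_dim: int = 26
    intermediate_size: int = 208
    vocab_size: int = 8766


def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights (plus optional bias) of one dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def count_params(cfg: AsenaConfig) -> int:
    """Estimate parameters, assuming a standard LLaMA block layout."""
    q_width = cfg.num_heads * cfg.head_dim        # 8 * 26 = 208
    kv_width = cfg.num_kv_heads * cfg.head_dim    # 4 * 26 = 104
    attention = (
        linear_params(cfg.hidden_size, q_width)          # q_proj
        + 2 * linear_params(cfg.hidden_size, kv_width)   # k_proj, v_proj
        + linear_params(q_width, cfg.hidden_size)        # o_proj
    )
    mlp = (
        2 * linear_params(cfg.hidden_size, cfg.intermediate_size)  # gate, up
        + linear_params(cfg.intermediate_size, cfg.hidden_size)    # down
    )
    norms = 2 * cfg.hidden_size  # assumed: two RMSNorm weights per layer
    embeddings = cfg.vocab_size * cfg.hidden_size  # shared with the output head
    final_norm = cfg.hidden_size
    return cfg.num_layers * (attention + mlp + norms) + embeddings + final_norm


cfg = AsenaConfig()
total = count_params(cfg)
print(f"estimated parameters: {total:,}")
print(f"estimated float32 size: {total * 4 / 2**20:.2f} MiB")
```

Under these assumptions the estimate is 1,208,768 parameters, or roughly 4.6 MiB in float32, with the embedding table accounting for a little under half of the total.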
+
+ For positional encoding, Asena_ESP32 uses **RoPE (Rotary Positional Embedding)** with LLaMA 3-style frequency scaling (factor: 256), intended to improve positional generalization beyond the base context. The model supports a **maximum sequence length of 128 tokens**, which suits the short, structured interactions typical of embedded systems. It uses **RMSNorm** with a tuned epsilon for numerical stability and shares the input and output embeddings to reduce the parameter count.
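As an illustration of what RoPE does to a single 26-dimensional head vector, the sketch below applies the plain rotation in pure Python. It deliberately omits the LLaMA 3-style scaling, because the README gives only the factor (256) and not the other scaling parameters. The base `theta=10000.0`, the half-split pairing of dimensions, and the function names are assumptions made for this example.

```python
import math

HEAD_DIM = 26  # per-head dimension stated in the README


def rope_inv_freq(head_dim: int, theta: float = 10000.0) -> list[float]:
    """One rotation frequency per pair of dimensions (theta is assumed)."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec: list[float], position: int, theta: float = 10000.0) -> list[float]:
    """Rotate each (i, i + half) pair of `vec` by position * frequency."""
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i, freq in enumerate(rope_inv_freq(len(vec), theta)):
        angle = position * freq
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        first, second = vec[i], vec[i + half]
        out[i] = first * cos_a - second * sin_a
        out[i + half] = second * cos_a + first * sin_a
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


query = [0.1 * (i + 1) for i in range(HEAD_DIM)]
key = [0.05 * (HEAD_DIM - i) for i in range(HEAD_DIM)]

# The attention score depends only on the distance between positions:
score_near_start = dot(apply_rope(query, 5), apply_rope(key, 3))
score_near_end = dot(apply_rope(query, 127), apply_rope(key, 125))
print(abs(score_near_start - score_near_end) < 1e-9)
```

Because each pair of dimensions is only rotated, the vector's length is unchanged and the query-key score depends on the relative offset between the two positions, which is the property that frequency scaling then stretches over longer ranges.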
+
+ The tokenizer has a **vocabulary size of 8,766 tokens**, with special tokens defined for padding (8000), beginning-of-sequence (8001), and end-of-sequence (8002). The model is trained and run in **float32 precision**, with caching disabled to reduce memory overhead, in line with its goal of running efficiently on constrained devices such as the ESP32.
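The special token IDs and the 128-token limit can be combined into a fixed-size input buffer, which is convenient on a device with static memory. The following is a minimal sketch: the `prepare_input` helper is hypothetical, and framing each sequence as BOS + tokens + EOS with right-padding is an assumption, since the README does not say how the tokenizer places these tokens.

```python
PAD_ID, BOS_ID, EOS_ID = 8000, 8001, 8002  # special token IDs from the README
MAX_LEN = 128                              # maximum sequence length


def prepare_input(token_ids, max_len: int = MAX_LEN):
    """Frame, truncate, and right-pad a token sequence to a fixed length.

    Returns (input_ids, attention_mask), both of length `max_len`.
    """
    body = list(token_ids)[: max_len - 2]  # leave room for BOS and EOS
    ids = [BOS_ID] + body + [EOS_ID]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding


input_ids, attention_mask = prepare_input([17, 42, 256])
print(input_ids[:6])
print(sum(attention_mask))
```

For generation rather than training, the trailing EOS would normally be left off so the model can continue the sequence.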
+
+ Overall, this configuration reflects a deliberate trade-off: sacrificing large-scale knowledge capacity in favor of **speed, determinism, and deployability at the extreme edge**.
+
+ ---
+
  # BCE Architecture Project: Final Success Report Simulation
 
  ## 1. Executive Summary