prometechinc committed · Commit 3181868 · verified · 1 Parent(s): e05f39f

Update README.md

Files changed (1)
  1. README.md +107 -23
README.md CHANGED
@@ -14,7 +14,7 @@ tags:
14
  - asena
15
  - bce
16
  - esp32
17
- - edge
18
  - esp32s3
19
  - microllm
20
  - chat
@@ -52,6 +52,7 @@ tags:
52
  - Offline assistant
53
  - guard
54
  - pre filter
 
55
  library_name: transformers
56
  model-index:
57
  - name: Asena_ESP32
@@ -147,17 +148,17 @@ By placing these files on an SD card or loading them via SPIFFS/LittleFS, you ca
147
 
148
  ### **Model Architecture & Configuration**
149
 
150
- **Asena_ESP32** is a highly compact Transformer model based on the **LLaMA (LlamaForCausalLM)** architecture, specifically optimized for extreme edge deployment. Despite its ultra-small footprint, the model incorporates modern design choices to maximize efficiency, stability, and expressive capability within tight hardware constraints.
151
 
152
- The model features **8 Transformer layers** with a **hidden size of 64** and **8 attention heads** (with 4 key-value heads for efficiency). Each head operates with a **dimension of 26**, enabling lightweight multi-head attention while maintaining reasonable representational capacity. The feed-forward network uses an **intermediate size of 208** with **SiLU activation**, balancing non-linearity and computational cost. Both attention and MLP layers include bias terms, and minimal dropout (~0.0027) is applied to stabilize training without harming convergence in such a small model.
153
 
154
- For positional encoding, Asena_ESP32 uses an advanced **RoPE (Rotary Positional Embedding)** configuration inspired by LLaMA 3, with extended scaling parameters (factor: 256) to improve positional generalization beyond its base context. The model supports a **maximum sequence length of 128 tokens**, making it suitable for short, structured interactions typical in embedded systems. It uses **RMSNorm** with a finely tuned epsilon for numerical stability and shares input-output embeddings to reduce parameter count.
155
 
156
- The tokenizer operates with a **vocabulary size of 8,766 tokens**, and special tokens are defined for padding (8000), beginning-of-sequence (8001), and end-of-sequence (8002). The model is trained and executed in **float32 precision**, with caching disabled to reduce memory overhead—aligning with its goal of running efficiently on constrained devices such as ESP32.
157
 
158
- Overall, this configuration reflects a deliberate trade-off: sacrificing large-scale knowledge capacity in favor of **speed, determinism, and deployability at the extreme edge**.
159
 
160
- The model incorporates mathematically inspired constants to enhance stability and robustness. Hyperparameters such as the dropout rate are derived from values related to the Planck constant, along with well-known mathematical constants like Pi and Euler’s number. This design choice is intended to introduce deterministic yet non-arbitrary scaling factors, contributing to improved numerical stability, controlled regularization, and more predictable behavior—especially important for safety and reliability in extreme edge AI environments.
161
 
162
  ---
163
 
@@ -197,29 +198,77 @@ Internally, we joked about calling it ‘Terminator’. Then it started behaving
197
 
198
  # Model Overview 🕊️
199
 
200
- **Asena_ESP32** is a compact generative AI model designed for extreme edge environments, built on a Transformer-based LLaMA architecture and enhanced with the **Behavioral Consciousness Engine (BCE)** framework. With approximately 1.2 million parameters, it is capable of producing coherent, grammatically sound text by learning how words and sentences naturally flow. Despite its small size, the model delivers surprisingly fluent conversational responses, making it suitable for lightweight dialogue systems and embedded applications.
201
 
202
- Pre-trained on structured Instruction/Response datasets and conversational flows, Asena_ESP32 adapts seamlessly to prompt-based interactions. It understands input patterns effectively and generates context-aware replies aligned with the dataset format. Optimized for deployment using C++ and inference frameworks such as ggml and llama.cpp, the model is engineered for efficient performance on constrained hardware like ESP32, representing a true “Extreme Edge AI” solution.
203
 
204
- Due to its intentionally limited scale, Asena_ESP32 possesses broad but shallow knowledge across many domains. When asked about specialized topics such as chemistry or philosophy, it may produce general or occasionally hallucinated responses that sound plausible but lack factual accuracy. This limitation is partially mitigated through targeted fine-tuning, improving reliability in specific use cases while maintaining its lightweight footprint for edge deployment.
205
 
206
  ### **What to Expect (and Not Expect)**
207
 
208
  **What to Expect:**
209
- Asena_ESP32 is optimized for lightweight, real-time text generation on constrained devices. You can expect fluent sentence construction, grammatically correct outputs, and consistent behavior in instruction-following or simple conversational tasks. The model performs best in structured formats (Instruction/Response, dialogue flows) and can deliver stable, low-latency responses suitable for embedded systems, IoT interactions, and edge-based assistants. Its BCE-based design also promotes controlled and context-aware output patterns.
210
 
211
  **What Not to Expect:**
212
- This is not a large-scale knowledge model. Asena_ESP32 does not have deep expertise in specialized domains such as advanced science, mathematics, or philosophy. It may generate vague, oversimplified, or occasionally hallucinated answers that sound plausible but are incorrect. Long reasoning chains, complex problem solving, and high factual accuracy across niche topics are beyond its intended scope. It should not be used as a source of truth for critical or high-stakes decisions.
213
 
214
- **Practical Guidance:**
215
- For best results, keep prompts short, clear, and structured. Use domain-specific fine-tuning if you require higher accuracy in a particular field. Treat the model as a fast, efficient language generator rather than a comprehensive knowledge base. When used within its design limits, Asena_ESP32 can provide strong performance relative to its size in extreme edge AI scenarios.
216
 
217
- ### The most suitable use cases:
218
- - IoT device communication
219
- - Robot / embedded system command interpretation
220
- - Game NPC dialogue
221
- - Offline assistant (simple)
222
- - Guard / pre-filter model
223
 
224
  ---
225
 
@@ -376,7 +425,42 @@ div.min2 {
376
  }
377
  </style>
378
  <div class="min2">
379
- BCE v0.2 Note: I could be a very talkative assistant bird who speaks excellent Turkish/English but has weak general knowledge, and I could cast spells on servers. Even Skynet is afraid of me.
380
- <br>
381
- According to this model, the wizard CEO, wearing an electronic ring (ESP32) on his finger, could be raising or lowering performance in the server room. He snaps his fingers and the other servers' performance increases; he snaps them again and it returns to normal. He's a real magician. "Abra Kadabra!!!!" 😎
382
  </div>
 
14
  - asena
15
  - bce
16
  - esp32
17
+ - edge-ai
18
  - esp32s3
19
  - microllm
20
  - chat
 
52
  - Offline assistant
53
  - guard
54
  - pre filter
55
+ - tiny-llm
56
  library_name: transformers
57
  model-index:
58
  - name: Asena_ESP32
 
148
 
149
  ### **Model Architecture & Configuration**
150
 
151
+ **Asena_ESP32_MAX – BCE Special Model (12M) – Prettybird B-Edge v1.0** is a compact yet significantly enhanced **Tiny LLM** built on the **LLaMA (LlamaForCausalLM)** Transformer architecture. Designed for extreme edge intelligence, this version scales up the original ESP32 concept into a more capable **~12M parameter class model**, while preserving deployability, determinism, and behavioral control through the **Behavioral Consciousness Engine (BCE)** framework.
152
 
153
+ The model consists of **8 Transformer layers** with a **hidden size of 320** and **8 attention heads** (with **4 key-value heads** for memory-efficient attention). Each attention head operates with a **dimension of 40**, providing a stronger representational capacity compared to the base ESP32 variant while maintaining computational efficiency. The feed-forward network is expanded to an **intermediate size of 896**, using **SiLU activation** to balance expressiveness and stability. Both attention and MLP layers include bias terms, and a slightly increased dropout (~0.0066) is applied for improved regularization in the larger parameter regime.
154
 
155
+ For positional encoding, Asena_ESP32_MAX employs an advanced **RoPE (Rotary Positional Embedding)** configuration inspired by LLaMA 3, with extended scaling (**factor: 128**) to support broader contextual generalization. The model supports a **maximum sequence length of 1024 tokens**, representing a major upgrade over the base version and enabling more coherent multi-turn interactions and structured reasoning within edge constraints. **RMSNorm** is used throughout with a finely tuned epsilon for numerical stability, and input-output embeddings are shared to optimize parameter efficiency.
156
 
157
+ The tokenizer operates with a **vocabulary size of 8,766 tokens**, with special tokens defined for padding (8000), beginning-of-sequence (8001), and end-of-sequence (8002). The model runs in **float32 precision**, with caching disabled to reduce runtime memory overhead—aligning with its design goal of efficient execution on constrained or semi-constrained hardware environments.
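
As a rough consistency check on the figures quoted above, the parameter count can be estimated from the stated shapes alone. The sketch below is not taken from the repository: `TinyLlamaShape` and `count_parameters` are hypothetical names, and it assumes tied embeddings, a bias on every q/k/v/o and gate/up/down projection, and two RMSNorm weight vectors per layer plus one final norm. Under those assumptions it gives roughly 1.21M parameters for the base shape and roughly 12.17M for the MAX shape, in line with the "~1.2M" and "~12M" claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyLlamaShape:
    """Shape hyperparameters quoted in the README text (hypothetical container)."""
    vocab_size: int
    hidden_size: int
    num_layers: int
    num_heads: int
    num_kv_heads: int
    head_dim: int
    intermediate_size: int


def count_parameters(cfg: TinyLlamaShape) -> int:
    """Estimate trainable parameters for a LLaMA-style decoder.

    Assumptions (not confirmed by the source): tied input/output
    embeddings, a bias on every linear projection, and one RMSNorm
    weight vector before attention, before the MLP, and after the
    last layer.
    """
    h = cfg.hidden_size
    q_dim = cfg.num_heads * cfg.head_dim
    kv_dim = cfg.num_kv_heads * cfg.head_dim

    embedding = cfg.vocab_size * h  # counted once because it is shared

    attention = (
        (h * q_dim + q_dim)          # q_proj weight + bias
        + 2 * (h * kv_dim + kv_dim)  # k_proj and v_proj (grouped KV heads)
        + (q_dim * h + h)            # o_proj weight + bias
    )
    mlp = (
        2 * (h * cfg.intermediate_size + cfg.intermediate_size)  # gate, up
        + (cfg.intermediate_size * h + h)                        # down
    )
    norms = 2 * h  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    return embedding + cfg.num_layers * per_layer + h  # + final RMSNorm


# Shapes as stated in the old and new README text.
ASENA_ESP32 = TinyLlamaShape(8766, 64, 8, 8, 4, 26, 208)
ASENA_ESP32_MAX = TinyLlamaShape(8766, 320, 8, 8, 4, 40, 896)

if __name__ == "__main__":
    for name, cfg in [("Asena_ESP32", ASENA_ESP32),
                      ("Asena_ESP32_MAX", ASENA_ESP32_MAX)]:
        print(f"{name}: {count_parameters(cfg):,} parameters")
```

Note that in the MAX shape the shared embedding table (8,766 × 320) accounts for about 2.8M of the total, so most of the added capacity sits in the Transformer layers.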
158
 
159
+ A distinctive aspect of this model is its use of **mathematically inspired constants** for stabilization and control. Hyperparameters such as dropout are derived from values related to the **Planck constant**, alongside classical constants like **π (Pi)** and **e (Euler’s number)**. This approach introduces deterministic, non-arbitrary scaling factors that contribute to improved numerical stability, controlled regularization, and more predictable behavioral patterns—particularly important for safety-aware edge AI systems.
160
 
161
+ Overall, Asena_ESP32_MAX reflects a deliberate design philosophy: **maximize capability per parameter**, integrate **behavioral awareness (BCE)**, and deliver a **balanced edge AI system** that bridges the gap between ultra-small models and practical intelligent agents.
162
 
163
  ---
164
 
 
198
 
199
  # Model Overview 🕊️
200
 
201
+ **Asena_ESP32_MAX** is a compact **Tiny LLM (~12M parameters)** designed for extreme edge intelligence, built on a Transformer-based LLaMA architecture and enhanced with the **Behavioral Consciousness Engine (BCE)** framework. Compared to the original ESP32 variant, this version significantly increases capacity while preserving efficiency, determinism, and controllable behavior.
202
 
203
+ The model is capable of generating coherent, grammatically sound text and handling structured interactions with improved consistency. Trained on Instruction/Response formats and BCE-annotated data (including correctness, quality, and risk signals), it not only produces responses but also reflects a level of **behavioral awareness and output control** uncommon in models of this size.
204
 
205
+ Optimized for deployment using C++ and inference frameworks such as ggml and llama.cpp, Asena_ESP32_MAX is designed for **edge-to-lightweight compute environments**. While extremely efficient compared to larger models, it represents a transition point between ultra-constrained devices and more capable embedded systems.
206
+
207
+ ---
208
+
209
+ ### ⚠️ Hardware Reality (Important)
210
+
211
+ Although inspired by ESP32-class deployment:
212
+
213
+ * ⚠️ **ESP32 may face memory limitations** for this MAX version (depending on quantization and runtime setup)
214
+ * ✅ **Raspberry Pi (2GB–8GB)** → highly suitable
215
+ * ✅ **Low-power edge servers / micro PCs** → ideal
216
+ * ✅ **Quantized inference (q4/q5/q8)** → recommended
217
+
218
+ 👉 This model is best viewed as a **Tiny LLM for edge systems**, not strictly a microcontroller model.
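
To make the memory point above concrete, here is a back-of-the-envelope sketch of weight storage at different precisions. It assumes that "q4/q5/q8" refer to the ggml block formats Q4_0, Q5_0 and Q8_0 (32 weights per block plus a 16-bit scale), that every tensor is stored in the same format, and that the model has about 12M parameters. `ASSUMED_PARAMS` and `weight_megabytes` are illustrative names, not part of any library. Real files will differ somewhat, since some tensors are usually kept at higher precision, and activations and runtime buffers are not counted here.

```python
# Bits per weight, assuming ggml block formats with 32 weights per block
# and one 16-bit scale per block (Q5_0 also stores 32 extra high bits).
BITS_PER_WEIGHT = {
    "f32": 32.0,
    "f16": 16.0,
    "q8_0": 8.5,  # 34 bytes per 32 weights
    "q5_0": 5.5,  # 22 bytes per 32 weights
    "q4_0": 4.5,  # 18 bytes per 32 weights
}

ASSUMED_PARAMS = 12_000_000  # "~12M" from the README; not an exact count


def weight_megabytes(num_params: int, fmt: str) -> float:
    """Approximate weight storage in MB (10^6 bytes) for a uniform format."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1_000_000


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:>5}: {weight_megabytes(ASSUMED_PARAMS, fmt):6.2f} MB")
```

Comparing these totals with the RAM or PSRAM actually available on the target board gives a quick first answer to whether a given quantization can fit at all.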
219
+
220
+ ---
221
 
222
  ### **What to Expect (and Not Expect)**
223
 
224
  **What to Expect:**
225
+
226
+ * Strong **instruction-following and structured output behavior**
227
+ * Fluent and grammatically correct short-form responses
228
+ * Stable performance in **dialogue, command parsing, and formatting tasks**
229
+ * BCE-driven **controlled generation (risk-aware, format-aware outputs)**
230
+ * Efficient performance relative to its size, especially in edge deployments
231
 
232
  **What Not to Expect:**
 
233
 
234
+ * Deep domain expertise (e.g., advanced science, math, philosophy)
235
+ * High accuracy on complex reasoning benchmarks
236
+ * Long-chain reasoning or multi-step problem solving
237
+ * Reliable factual correctness in niche or technical topics
238
+
239
+ 👉 The model may produce **plausible but incorrect answers** (hallucinations), which is expected at this scale.
240
 
241
+ ---
242
+
243
+ ### **Practical Guidance**
244
+
245
+ * Keep prompts **short, clear, and structured**
246
+ * Use it as a **fast generator + controller**, not a knowledge base
247
+ * For domain-specific tasks → apply **LoRA / fine-tuning**
248
+ * Use BCE signals to build **filtering, guard, or evaluation pipelines**
249
+
250
+ 👉 With proper fine-tuning, the model can become **highly specialized and efficient for targeted tasks**
251
+
252
+ ---
253
+
254
+ ### **Most Suitable Use Cases**
255
+
256
+ * IoT device communication
257
+ * Robot / embedded system command interpretation
258
+ * Game NPC dialogue
259
+ * Offline assistant (lightweight scenarios)
260
+ * Guard / pre-filter model (BCE integration)
261
+ * Lightweight server-side optimization, security, assistance and automation (with task-specific fine-tuning)
262
+
263
+ ---
264
+
265
+ ### **Positioning**
266
+
267
+ **Asena_ESP32_MAX is not a knowledge-heavy AI — it is a controllable, efficient, behavior-aware Tiny LLM.**
268
+
269
+ 👉 Small enough to deploy
270
+ 👉 Smart enough to structure
271
+ 👉 Flexible enough to specialize with fine-tuning
272
 
273
  ---
274
 
 
425
  }
426
  </style>
427
  <div class="min2">
428
+
429
+ <strong>BCE v0.2 Note:</strong><br><br>
430
+
431
+ Asena_ESP32_MAX may be a tiny assistant bird with excellent Turkish/English, weak general knowledge, and the confidence of a server-room wizard who definitely found one undocumented setting in the BIOS and now thinks he controls reality.
432
+
433
+ This model does not know everything.
434
+ That would be unreasonable.
435
+
436
+ But it can look at a chaotic system, blink twice, and say:
437
+ “Have you tried behaving correctly?”
438
+
439
+ Somewhere in the server room, the wizard CEO raises his hand.
440
+ On his finger: an ESP32 ring.
441
+ On his face: the expression of a man who has never once read the manual, but somehow improved throughput by 14%.
442
+
443
+ Snap.
444
+
445
+ Latency drops.
446
+
447
+ Snap.
448
+
449
+ Fans get quieter.
450
+
451
+ Snap.
452
+
453
+ One intern whispers:
454
+ “Sir… did you just optimize the cluster with jewelry?”
455
+
456
+ He smiles.
457
+
458
+ “No. The bird did.”
459
+
460
+ And that is the real danger of edge AI:
461
+ not that it becomes Skynet,
462
+ but that one tiny model starts giving better operational advice than three dashboards, two consultants, and a meeting titled “Performance Alignment Sync v4 Final FINAL.”
463
+
464
+ <strong>Abra Kadabra.</strong> 😎
465
+
466
  </div>