Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ tags:
 - asena
 - bce
 - esp32
-- edge
 - esp32s3
 - microllm
 - chat
@@ -52,6 +52,7 @@ tags:
 - Offline assistant
 - guard
 - pre filter
 library_name: transformers
 model-index:
 - name: Asena_ESP32
@@ -147,17 +148,17 @@ By placing these files on an SD card or loading them via SPIFFS/LittleFS, you ca

 ### **Model Architecture & Configuration**

-**

-The model

-For positional encoding,

-The tokenizer operates with a **vocabulary size of 8,766 tokens**,

-

-

 ---

@@ -197,29 +198,77 @@ Internally, we joked about calling it ‘Terminator’. Then it started behaving

 # Model Overview 🕊️

-**

-

-

 ### **What to Expect (and Not Expect)**

 **What to Expect:**
-

 **What Not to Expect:**
-This is not a large-scale knowledge model. Asena_ESP32 does not have deep expertise in specialized domains such as advanced science, mathematics, or philosophy. It may generate vague, oversimplified, or occasionally hallucinated answers that sound plausible but are incorrect. Long reasoning chains, complex problem solving, and high factual accuracy across niche topics are beyond its intended scope. It should not be used as a source of truth for critical or high-stakes decisions.

-*
-

-
-
-
-
-
-

 ---

@@ -376,7 +425,42 @@ div.min2 {
 }
 </style>
 <div class="min2">
-
-<br>
-
 </div>
@@ +14,7 @@
 - asena
 - bce
 - esp32
+- edge-ai
 - esp32s3
 - microllm
 - chat
@@ +52,7 @@
 - Offline assistant
 - guard
 - pre filter
+- tiny-llm
 library_name: transformers
 model-index:
 - name: Asena_ESP32
@@ +148,17 @@

 ### **Model Architecture & Configuration**

+**Asena_ESP32_MAX – BCE Special Model (12M) – Prettybird B-Edge v1.0** is a compact yet significantly enhanced **Tiny LLM** built on the **LLaMA (LlamaForCausalLM)** Transformer architecture. Designed for extreme edge intelligence, this version scales up the original ESP32 concept into a more capable **~12M parameter class model**, while preserving deployability, determinism, and behavioral control through the **Behavioral Consciousness Engine (BCE)** framework.

+The model consists of **8 Transformer layers** with a **hidden size of 320** and **8 attention heads** (with **4 key-value heads** for memory-efficient attention). Each attention head operates with a **dimension of 40**, providing a stronger representational capacity compared to the base ESP32 variant while maintaining computational efficiency. The feed-forward network is expanded to an **intermediate size of 896**, using **SiLU activation** to balance expressiveness and stability. Both attention and MLP layers include bias terms, and a slightly increased dropout (~0.0066) is applied for improved regularization in the larger parameter regime.

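As a rough cross-check of the "~12M parameter" figure, the layer sizes quoted in the added paragraph can be tallied in a few lines of Python. This is only a sketch under assumptions the README does not spell out: a standard LLaMA block with a gated MLP (gate, up, and down projections), a bias on every attention and MLP projection, weight-only RMSNorm, and a single embedding matrix shared between input and output. The function name `count_params` is illustrative, not part of the repository.

```python
def count_params(vocab=8766, hidden=320, layers=8, heads=8,
                 kv_heads=4, intermediate=896, bias=True):
    """Tally trainable weights for an assumed LLaMA-style layout."""
    head_dim = hidden // heads        # 320 / 8 = 40, as stated in the README
    kv_dim = kv_heads * head_dim      # 4 * 40 = 160 (grouped-query attention)
    b = 1 if bias else 0

    def linear(n_in, n_out):
        # Weight matrix plus an optional bias vector.
        return n_in * n_out + b * n_out

    attention = (linear(hidden, hidden)          # q_proj
                 + 2 * linear(hidden, kv_dim)    # k_proj and v_proj
                 + linear(hidden, hidden))       # o_proj
    mlp = (2 * linear(hidden, intermediate)      # gate_proj and up_proj
           + linear(intermediate, hidden))       # down_proj
    norms = 2 * hidden                           # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embeddings = vocab * hidden                  # tied input/output embeddings
    final_norm = hidden
    return embeddings + layers * per_layer + final_norm


total = count_params()
print(f"{total:,} parameters")  # prints "12,174,016 parameters"
```

Under these assumptions the tally lands at about 12.17M, which is consistent with the "~12M parameter class" wording. If the real checkpoint differs (for example untied embeddings or no MLP bias), the number shifts accordingly.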
+For positional encoding, Asena_ESP32_MAX employs an advanced **RoPE (Rotary Positional Embedding)** configuration inspired by LLaMA 3, with extended scaling (**factor: 128**) to support broader contextual generalization. The model supports a **maximum sequence length of 1024 tokens**, representing a major upgrade over the base version and enabling more coherent multi-turn interactions and structured reasoning within edge constraints. **RMSNorm** is used throughout with a finely tuned epsilon for numerical stability, and input-output embeddings are shared to optimize parameter efficiency.

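To make the rotary mechanism concrete, here is a minimal pure-Python sketch of plain RoPE applied to one 40-dimensional head vector. Two things are assumptions rather than facts from the README: the base frequency `theta=10000.0` and the half-split pairing of dimensions used by common LLaMA implementations. The LLaMA 3 style frequency scaling (factor 128) is deliberately left out, because its remaining parameters are not given in the text.

```python
import math


def rope_rotate(vec, position, theta=10000.0):
    """Rotate one attention-head vector according to its token position."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("head dimension must be even")
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Each (i, i + half) pair rotates at its own frequency.
        angle = position * theta ** (-2.0 * i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy query/key vectors with the README's head dimension of 40.
q = [0.1 * (i + 1) for i in range(40)]
k = [0.05 * (40 - i) for i in range(40)]

# Same relative offset (2 tokens) at two different absolute positions.
near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
far = dot(rope_rotate(q, 105), rope_rotate(k, 103))
print(abs(near - far) < 1e-9)
```

The point of the check at the end is the property that makes RoPE useful: the attention score between a rotated query and key depends only on how far apart the two tokens are, not on where they sit in the sequence.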
+The tokenizer operates with a **vocabulary size of 8,766 tokens**, with special tokens defined for padding (8000), beginning-of-sequence (8001), and end-of-sequence (8002). The model runs in **float32 precision**, with caching disabled to reduce runtime memory overhead—aligning with its design goal of efficient execution on constrained or semi-constrained hardware environments.

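The special-token IDs and the 1024-token limit can be illustrated with a small framing helper. This is a sketch, not the repository's tokenizer code: the helper name `prepare_ids` is hypothetical, and it assumes that BOS is prepended, EOS is appended, and padding goes on the right, none of which the README states explicitly.

```python
PAD_ID, BOS_ID, EOS_ID = 8000, 8001, 8002  # IDs quoted in the README
VOCAB_SIZE = 8766
MAX_LEN = 1024


def prepare_ids(token_ids, max_len=MAX_LEN):
    """Frame raw token IDs with BOS/EOS, truncate, and right-pad."""
    if any(t < 0 or t >= VOCAB_SIZE for t in token_ids):
        raise ValueError("token id outside the 8,766-entry vocabulary")
    body = list(token_ids)[: max_len - 2]   # leave room for BOS and EOS
    ids = [BOS_ID] + body + [EOS_ID]
    mask = [1] * len(ids)                   # 1 = real token, 0 = padding
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [0] * pad


ids, mask = prepare_ids([17, 42, 99], max_len=8)
print(ids)   # [8001, 17, 42, 99, 8002, 8000, 8000, 8000]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```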
+A distinctive aspect of this model is its use of **mathematically inspired constants** for stabilization and control. Hyperparameters such as dropout are derived from values related to the **Planck constant**, alongside classical constants like **π (Pi)** and **e (Euler’s number)**. This approach introduces deterministic, non-arbitrary scaling factors that contribute to improved numerical stability, controlled regularization, and more predictable behavioral patterns—particularly important for safety-aware edge AI systems.

+Overall, Asena_ESP32_MAX reflects a deliberate design philosophy: **maximize capability per parameter**, integrate **behavioral awareness (BCE)**, and deliver a **balanced edge AI system** that bridges the gap between ultra-small models and practical intelligent agents.

 ---

@@ +198,77 @@

 # Model Overview 🕊️

+**Asena_ESP32_MAX** is a compact **Tiny LLM (~12M parameters)** designed for extreme edge intelligence, built on a Transformer-based LLaMA architecture and enhanced with the **Behavioral Consciousness Engine (BCE)** framework. Compared to the original ESP32 variant, this version significantly increases capacity while preserving efficiency, determinism, and controllable behavior.

+The model is capable of generating coherent, grammatically sound text and handling structured interactions with improved consistency. Trained on Instruction/Response formats and BCE-annotated data (including correctness, quality, and risk signals), it not only produces responses but also reflects a level of **behavioral awareness and output control** uncommon in models of this size.

+Optimized for deployment using C++ and inference frameworks such as ggml and llama.cpp, Asena_ESP32_MAX is designed for **edge-to-lightweight compute environments**. While extremely efficient compared to larger models, it represents a transition point between ultra-constrained devices and more capable embedded systems.
+
+---
+
+### ⚠️ Hardware Reality (Important)
+
+Although inspired by ESP32-class deployment:
+
+* ⚠️ **ESP32 may face memory limitations** for this MAX version (depending on quantization and runtime setup)
+* ✅ **Raspberry Pi (2GB–8GB)** → highly suitable
+* ✅ **Low-power edge servers / micro PCs** → ideal
+* ✅ **Quantized inference (q4/q5/q8)** → recommended
+
+👉 This model is best viewed as a **Tiny LLM for edge systems**, not strictly a microcontroller model.
+
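The memory caveat in the Hardware Reality list can be put into rough numbers. The sketch below estimates weight storage only, taking the README's "~12M" as 12,000,000 parameters. The bits-per-weight values are nominal figures for ggml's `Q8_0` and `Q4_0` block layouts (34 and 18 bytes per 32 weights) and are used here as stand-ins for the "q8" and "q4" labels. Real GGUF files mix tensor types and carry metadata, and the runtime needs activation buffers on top, so treat the output as a lower bound rather than a measured footprint.

```python
# Nominal storage cost per weight, in bits.
BITS_PER_WEIGHT = {
    "f32": 32.0,
    "f16": 16.0,
    "q8_0": 8.5,   # 32 int8 values + one fp16 scale per block
    "q4_0": 4.5,   # 32 4-bit values + one fp16 scale per block
}

N_PARAMS = 12_000_000  # the README's "~12M", taken as a round number


def weight_bytes(n_params, fmt):
    """Approximate bytes needed to store the weights alone."""
    return int(n_params * BITS_PER_WEIGHT[fmt] / 8)


for fmt in BITS_PER_WEIGHT:
    size = weight_bytes(N_PARAMS, fmt)
    print(f"{fmt:>5}: {size / 2**20:6.2f} MiB")
```

Even at 4-bit precision the weights alone come to several megabytes, which is why the list above flags ESP32-class boards as borderline and treats a Raspberry Pi as comfortable.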
+---

 ### **What to Expect (and Not Expect)**

 **What to Expect:**
+
+* Strong **instruction-following and structured output behavior**
+* Fluent and grammatically correct short-form responses
+* Stable performance in **dialogue, command parsing, and formatting tasks**
+* BCE-driven **controlled generation (risk-aware, format-aware outputs)**
+* Efficient performance relative to its size, especially in edge deployments

 **What Not to Expect:**

+* Deep domain expertise (e.g., advanced science, math, philosophy)
+* High accuracy on complex reasoning benchmarks
+* Long-chain reasoning or multi-step problem solving
+* Reliable factual correctness in niche or technical topics
+
+👉 The model may produce **plausible but incorrect answers** (hallucinations), which is expected at this scale.

+---
+
+### **Practical Guidance**
+
+* Keep prompts **short, clear, and structured**
+* Use it as a **fast generator + controller**, not a knowledge base
+* For domain-specific tasks → apply **LoRA / fine-tuning**
+* Use BCE signals to build **filtering, guard, or evaluation pipelines**
+
+👉 With proper fine-tuning, the model can become **highly specialized and efficient for targeted tasks**
+
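The guard / pre-filter idea from the guidance list can be sketched as a thin wrapper around any text generator. Everything named here is hypothetical: the README does not document how BCE risk signals are exposed, so `score_risk` is a stand-in callable, and `GuardResult`, `guarded_generate`, and the `0.5` threshold are illustration only. The stub generator and scorer exist just to make the example runnable without the model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardResult:
    text: str
    risk: float
    allowed: bool


def guarded_generate(prompt: str,
                     generate: Callable[[str], str],
                     score_risk: Callable[[str, str], float],
                     max_risk: float = 0.5,
                     fallback: str = "[blocked by guard]") -> GuardResult:
    """Generate a reply, then release it only if its risk score is low enough."""
    candidate = generate(prompt)
    risk = score_risk(prompt, candidate)
    if risk > max_risk:
        return GuardResult(fallback, risk, False)
    return GuardResult(candidate, risk, True)


# Stand-ins for the model and for a BCE-style risk signal (both made up).
def fake_generate(prompt: str) -> str:
    return f"OK: {prompt}"


def fake_score(prompt: str, reply: str) -> float:
    return 0.9 if "unlock" in prompt.lower() else 0.1


safe = guarded_generate("Turn on the lamp", fake_generate, fake_score)
risky = guarded_generate("Unlock the front door", fake_generate, fake_score)
print(safe)
print(risky)
```

Keeping the generator and the scorer as plain callables means the same wrapper works whether the reply comes from llama.cpp, Transformers, or a fine-tuned variant, and whether the risk score comes from the model itself or from a separate check.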
+---
+
+### **Most Suitable Use Cases**
+
+* IoT device communication
+* Robot / embedded system command interpretation
+* Game NPC dialogue
+* Offline assistant (lightweight scenarios)
+* Guard / pre-filter model (BCE integration)
+* Lightweight server-side optimization, security, assistance and automation (with task-specific fine-tuning)
+
+---
+
+### **Positioning**
+
+**Asena_ESP32_MAX is not a knowledge-heavy AI — it is a controllable, efficient, behavior-aware Tiny LLM.**
+
+👉 Small enough to deploy
+👉 Smart enough to structure
+👉 Flexible enough to specialize with fine-tuning

 ---

@@ +425,42 @@
 }
 </style>
 <div class="min2">
+
+<strong>BCE v0.2 Note:</strong><br><br>
+
+Asena_ESP32_MAX may be a tiny assistant bird with excellent Turkish/English, weak general knowledge, and the confidence of a server-room wizard who definitely found one undocumented setting in the BIOS and now thinks he controls reality.
+
+This model does not know everything.
+That would be unreasonable.
+
+But it can look at a chaotic system, blink twice, and say:
+“Have you tried behaving correctly?”
+
+Somewhere in the server room, the wizard CEO raises his hand.
+On his finger: an ESP32 ring.
+On his face: the expression of a man who has never once read the manual, but somehow improved throughput by 14%.
+
+Snap.
+
+Latency drops.
+
+Snap.
+
+Fans get quieter.
+
+Snap.
+
+One intern whispers:
+“Sir… did you just optimize the cluster with jewelry?”
+
+He smiles.
+
+“No. The bird did.”
+
+And that is the real danger of edge AI:
+not that it becomes Skynet,
+but that one tiny model starts giving better operational advice than three dashboards, two consultants, and a meeting titled “Performance Alignment Sync v4 Final FINAL.”
+
+<strong>Abra Kadabra.</strong> 😎
+
 </div>