license: apache-2.0
short_description: 'SmolLM2-360M-Instruct '
---
# SmolLM2 360M Instruct Demo
This Space demonstrates the SmolLM2-360M-Instruct model with a CPU fallback mechanism. It is designed to run efficiently even on the Hugging Face Free Tier (2 vCPUs).
## Overview
A minimal but production-ready LLM service built on:
* **Model:** SmolLM2-360M-Instruct (approx. 269MB, Apache 2.0).
* **Efficiency:** Optimized to run on 2 vCPUs with a minimum of 2 GB RAM (the HF tier provides up to 16 GB).
* **Scalability:** Lightweight enough for local training and testing.
## Related Project: SmolLM2-customs
If you are interested in training small LLMs the lazy way, check out:
[https://github.com/VolkanSah/SmolLM2-customs](https://github.com/VolkanSah/SmolLM2-customs)
**Features of the custom implementation:**
* **FastAPI:** OpenAI-compatible `/v1/chat/completions` endpoint.
* **ADI (Anti-Dump Index):** Filters low-quality requests before they hit the model.
* **HF Dataset Integration:** Logs every request for later analysis and finetuning.
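
For orientation, a request to an OpenAI-compatible `/v1/chat/completions` endpoint such as the one above could be shaped like this (the payload follows the standard OpenAI chat format; the `model` value and the target host are placeholders, not taken from the project):

```python
import json

# Example request body for an OpenAI-compatible /v1/chat/completions
# endpoint. The "model" value is a placeholder; check your deployment.
payload = {
    "model": "SmolLM2-360M-Instruct",
    "messages": [
        {"role": "user", "content": "Summarize SmolLM2 in one sentence."}
    ],
    "max_tokens": 128,
}
body = json.dumps(payload)
```

POST this body to `<your-host>/v1/chat/completions` with `Content-Type: application/json`.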
---
## Deployment & Usage
You do not need an API key for this public demo, but rate limits apply.
### How to run your own instance:
1. **Duplicate/Clone** this Space.
2. **Environment Variables:** To use your own model access or private weights, add one of the following keys to your **Secrets**:
   * `HF_TOKEN`
   * `TEST_TOKEN`
   * `HUGGINGFACE_TOKEN`
   * `HF_API_TOKEN`
The code uses flexible token-resolution logic so that older or custom key names keep working.
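
A minimal sketch of such a lookup (illustrative only — `resolve_token` and the key ordering are assumptions, not the Space's actual code):

```python
import os

# Illustrative: try each known key name in order; the first one set wins.
TOKEN_KEYS = ("HF_TOKEN", "TEST_TOKEN", "HUGGINGFACE_TOKEN", "HF_API_TOKEN")

def resolve_token():
    for key in TOKEN_KEYS:
        value = os.environ.get(key)
        if value:
            return value
    return None  # anonymous access is fine for the public model
```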
## Technical Details
The inference pipeline uses `transformers` with `torch`. It automatically detects if a GPU is available; otherwise, it falls back to CPU execution without breaking the Gradio interface.
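
A minimal sketch of that fallback, assuming the standard `transformers` pipeline API (the checkpoint name is the published SmolLM2 repo; the helper names are illustrative, not the Space's actual code):

```python
import torch
from transformers import pipeline

MODEL_ID = "HuggingFaceTB/SmolLM2-360M-Instruct"

def pick_device():
    # transformers pipelines accept device=0 (first GPU) or device=-1 (CPU)
    return 0 if torch.cuda.is_available() else -1

def build_generator():
    # On the free tier this resolves to CPU, and the Gradio app keeps working.
    return pipeline("text-generation", model=MODEL_ID, device=pick_device())
```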