Alibrown committed (verified)
Commit e872955 · 1 Parent(s): 6549bb3

Update README.md

Files changed (1): README.md +39 -1
README.md CHANGED
@@ -10,5 +10,43 @@ pinned: false
  license: apache-2.0
  short_description: 'SmolLM2-360M-Instruct '
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # SmolLM2 360M Instruct Demo
+
+ This Space demonstrates the SmolLM2-360M-Instruct model with a CPU fallback mechanism. It is designed to run efficiently even on the Hugging Face Free Tier (2 vCPUs).
+
+ ## Overview
+
+ A minimal but production-ready LLM service built on:
+ * **Model:** SmolLM2-360M-Instruct (approx. 269 MB, Apache 2.0).
+ * **Efficiency:** Optimized to run on 2 vCPUs and a minimum of 2 GB RAM (the HF Free Tier supports up to 16 GB).
+ * **Scalability:** Well suited for local training and testing.
+
+ ## Related Project: SmolLM2-customs
+
+ If you are interested in training small LLMs the lazy way, check out:
+ [https://github.com/VolkanSah/SmolLM2-customs](https://github.com/VolkanSah/SmolLM2-customs)
+
+ **Features of the custom implementation:**
+ * **FastAPI:** OpenAI-compatible `/v1/chat/completions` endpoint.
+ * **ADI (Anti-Dump Index):** Filters low-quality requests before they hit the model.
+ * **HF Dataset Integration:** Logs every request for later analysis and fine-tuning.
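Because the endpoint follows the OpenAI chat-completions schema, any OpenAI-style client can talk to it. A minimal sketch of a request payload, assuming a hypothetical Space URL (the model name and generation parameters are illustrative, not the project's actual defaults):

```python
import json

# Hypothetical base URL -- replace with your own Space's URL.
BASE_URL = "https://your-space.hf.space"

# Payload in the OpenAI chat-completions format expected by
# the /v1/chat/completions endpoint.
payload = {
    "model": "SmolLM2-360M-Instruct",
    "messages": [
        {"role": "user", "content": "Explain CPU fallback in one sentence."}
    ],
    "max_tokens": 64,
}

# To actually send the request (requires the `requests` package):
#   import requests
#   r = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)
#   print(r.json()["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))
```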
+
+ ---
+
+ ## Deployment & Usage
+
+ You do not need an API key for this public demo, but rate limits apply.
+
+ ### How to run your own instance
+ 1. **Duplicate/Clone** this Space.
+ 2. **Environment Variables:** To use your own model access or private weights, add one of the following keys to your **Secrets**:
+    * `HF_TOKEN`
+    * `TEST_TOKEN`
+    * `HUGGINGFACE_TOKEN`
+    * `HF_API_TOKEN`
+
+ The code uses flexible token-resolution logic to ensure compatibility with older or custom key names.
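The token resolution can be pictured as a first-match lookup over the key names listed above; the resolver function below is an illustrative sketch, not the Space's actual code:

```python
import os

# Candidate environment variable names, checked in order of precedence.
# The names come from the README; the ordering here is an assumption.
TOKEN_KEYS = ("HF_TOKEN", "TEST_TOKEN", "HUGGINGFACE_TOKEN", "HF_API_TOKEN")


def resolve_token():
    """Return the first non-empty token found among the known key names."""
    for key in TOKEN_KEYS:
        value = os.environ.get(key)
        if value:
            return value
    return None  # anonymous access: fine for public models
```

With this pattern, a Space duplicated from an older setup that only defines `HUGGINGFACE_TOKEN` keeps working without any code changes.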
+
+ ## Technical Details
+
+ The inference pipeline uses `transformers` with `torch`. It automatically detects whether a GPU is available; otherwise it falls back to CPU execution without breaking the Gradio interface.
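A minimal sketch of this detect-GPU-or-fall-back pattern, assuming `torch` may or may not be installed (the function name and structure are illustrative, not the Space's actual code):

```python
def select_device():
    """Detect an available accelerator, falling back to CPU.

    If `torch` is missing or no GPU is visible, inference still
    runs on CPU instead of crashing.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"


# The selected device string is then passed to the model-loading code,
# e.g. pipeline("text-generation", model=..., device=select_device()).
```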