Spaces:
Runtime error
Runtime error
Create README.md
Browse files
README.md
CHANGED
|
@@ -1,10 +1,24 @@
|
|
| 1 |
---
|
| 2 |
-
title: Graniteapi
|
| 3 |
-
emoji: 📉
|
| 4 |
-
colorFrom: yellow
|
| 5 |
-
colorTo: green
|
| 6 |
sdk: docker
|
| 7 |
-
|
|
|
|
| 8 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 9 |
|
| 10 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2 |
sdk: docker
|
| 3 |
+
emoji: ⚡
|
| 4 |
+
colorTo: indigo
|
| 5 |
---
|
| 6 |
+
# ⚡ Ultralekki OpenAI-Compatible Server
|
| 7 |
+
- **Model**: `unsloth/granite-4.1-3b-GGUF` (kwantyzacja `UD-IQ2_M` ~1.1GB)
|
| 8 |
+
- **Format**: Pełna kompatybilność z `/v1/chat/completions`, `/v1/completions`, streaming SSE
|
| 9 |
+
- **Auth**: Brak (otwarty dostęp)
|
| 10 |
+
- **Optymalizacja**: CPU-only, mmap, limit kontekstu 2048, max 3 równoczesne requesty
|
| 11 |
+
- **Resilience**: Obsługa zerwanych połączeń, keep-alive 120s, graceful shutdown
|
| 12 |
|
| 13 |
+
## 🌐 Endpointy
|
| 14 |
+
- `POST /v1/chat/completions`
|
| 15 |
+
- `GET /health`
|
| 16 |
+
- `GET /docs` (Swagger)
|
| 17 |
+
|
| 18 |
+
## ⚙️ Zmienne środowiskowe (opcjonalne)
|
| 19 |
+
| Zmienna | Domyślnie | Opis |
|
| 20 |
+
|--------|-----------|------|
|
| 21 |
+
| `N_CTX` | `2048` | Maksymalny kontekst |
|
| 22 |
+
| `N_THREADS` | `2` | Wątki CPU |
|
| 23 |
+
| `MAX_CONCURRENCY` | `3` | Limit równoczesnych zapytań |
|
| 24 |
+
| `MODEL_FILE` | `granite-4.1-3b-UD-IQ2_M.gguf` | Nazwa pliku GGUF |
|