Update README.md
README.md
CHANGED
@@ -40,7 +40,26 @@ HEBATRON is designed to handle the structural and morphological complexities of
 | **Precision** | FP8 Mixed-Precision |

 ---
+## ⚙️ Deployment Configuration

+To ensure optimal performance in production, the following environment variables and parameters are recommended for the **vLLM** backend:
+
+### **Inference Engine (vLLM)**
+* **Port:** `8002` (Default for Model B slot)
+* **Max Model Length:** `65536` tokens
+* **GPU Memory Utilization:** Recommended `0.90` - `0.95` for Blackwell/H200.
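
The engine settings above map onto standard `vllm serve` flags (`--port`, `--max-model-len`, `--gpu-memory-utilization`). The following is a minimal sketch, not the project's actual launch script: the helper name `build_vllm_command` and the `<model-id-or-path>` placeholder are assumptions introduced for illustration, and the default of `0.90` is simply the lower end of the recommended range.

```python
import shlex
from typing import List


def build_vllm_command(
    model: str,
    port: int = 8002,
    max_model_len: int = 65536,
    gpu_memory_utilization: float = 0.90,
) -> List[str]:
    """Assemble a `vllm serve` argument list from the README's engine settings.

    Hypothetical helper: only the flag names come from the vLLM CLI.
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in the range (0, 1]")
    return [
        "vllm", "serve", model,
        "--port", str(port),
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


if __name__ == "__main__":
    # "<model-id-or-path>" is a placeholder; substitute the real checkpoint.
    print(shlex.join(build_vllm_command("<model-id-or-path>")))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.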
+
+### **Model Parameters**
+* **Max New Tokens:** `65536`
+* **Temperature:** `0.7` (Balanced creativity and precision)
+* **Top-P:** `0.9`
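
Because the listed Max New Tokens equals the Max Model Length, a request with a non-empty prompt cannot actually generate the full `65536` tokens: prompt and completion share one context window. The sketch below shows one way to build a request body for an OpenAI-compatible chat endpoint with the sampling values above while clamping `max_tokens` to the space left. The function names are hypothetical, and `prompt_tokens` is assumed to be counted by the caller with the model's tokenizer.

```python
from typing import Dict, List

MAX_MODEL_LEN = 65536   # from "Max Model Length"
MAX_NEW_TOKENS = 65536  # from "Max New Tokens"


def clamp_max_tokens(
    prompt_tokens: int,
    max_new_tokens: int = MAX_NEW_TOKENS,
    max_model_len: int = MAX_MODEL_LEN,
) -> int:
    """Return the largest completion budget that still fits the context window."""
    remaining = max_model_len - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(max_new_tokens, remaining)


def build_chat_request(
    model: str, messages: List[Dict[str, str]], prompt_tokens: int
) -> Dict[str, object]:
    """Build a chat-completions payload using the README's sampling values."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": clamp_max_tokens(prompt_tokens),
        "temperature": 0.7,
        "top_p": 0.9,
    }
```

For example, a prompt of 1,000 tokens leaves a completion budget of 64,536 tokens under these settings.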
+
+### **Server Settings**
+* **Max Simultaneous Comparisons:** `1` (Recommended for 30B+ MoE on single node to maintain latency)
+* **Chat Context Max Turns:** `10`
+* **Max Prompt Characters:** `10000`
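
The README does not show how the serving application enforces these limits, so the following is only a sketch of one possible approach. It assumes a "turn" means one user message plus one assistant reply, that over-long prompts are rejected rather than truncated, and that a busy server refuses a second comparison instead of queueing it. All names here are hypothetical.

```python
import threading
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

MAX_SIMULTANEOUS_COMPARISONS = 1
CHAT_CONTEXT_MAX_TURNS = 10
MAX_PROMPT_CHARACTERS = 10_000

# One slot per allowed concurrent comparison.
_comparison_slots = threading.BoundedSemaphore(MAX_SIMULTANEOUS_COMPARISONS)


def validate_prompt(prompt: str) -> str:
    """Reject prompts longer than the configured character limit."""
    if len(prompt) > MAX_PROMPT_CHARACTERS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARACTERS} characters")
    return prompt


def trim_history(
    messages: List[Dict[str, str]], max_turns: int = CHAT_CONTEXT_MAX_TURNS
) -> List[Dict[str, str]]:
    """Keep system messages plus the most recent `max_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    if max_turns <= 0:
        return system
    return system + dialogue[-2 * max_turns:]


def run_comparison(task: Callable[[], T]) -> T:
    """Run `task` only if a comparison slot is free; otherwise fail fast."""
    if not _comparison_slots.acquire(blocking=False):
        raise RuntimeError("another comparison is already running")
    try:
        return task()
    finally:
        _comparison_slots.release()
```

Failing fast keeps latency predictable on a single node; a queue with a timeout would be the alternative if callers should wait instead.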
+
+---
 ## 🧬 Training Curriculum
 The model was trained using a three-phase **Curriculum Learning** strategy: