Update README.md

Files changed (1) hide show

README.md CHANGED Viewed

@@ -40,7 +40,26 @@ HEBATRON is designed to handle the structural and morphological complexities of
 | **Precision** | FP8 Mixed-Precision |
 ---
 ## 🧬 Training Curriculum
 The model was trained using a three-phase **Curriculum Learning** strategy:

 | **Precision** | FP8 Mixed-Precision |
 ---
+## ⚙️ Deployment Configuration
+To ensure optimal performance in production, the following environment variables and parameters are recommended for the **vLLM** backend:
+### **Inference Engine (vLLM)**
+* **Port:** `8002` (Default for Model B slot)
+* **Max Model Length:** `65536` tokens
+* **GPU Memory Utilization:** Recommended `0.90` - `0.95` for Blackwell/H200.
+### **Model Parameters**
+* **Max New Tokens:** `65536`
+* **Temperature:** `0.7` (Balanced creativity and precision)
+* **Top-P:** `0.9`
+### **Server Settings**
+* **Max Simultaneous Comparisons:** `1` (Recommended for 30B+ MoE on single node to maintain latency)
+* **Chat Context Max Turns:** `10`
+* **Max Prompt Characters:** `10000`
+---
 ## 🧬 Training Curriculum
 The model was trained using a three-phase **Curriculum Learning** strategy: