sarel commited on
Commit
f1c51e4
·
verified ·
1 Parent(s): 1be3465

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +19 -0
README.md CHANGED
@@ -40,7 +40,26 @@ HEBATRON is designed to handle the structural and morphological complexities of
40
  | **Precision** | FP8 Mixed-Precision |
41
 
42
  ---
 
43
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
44
  ## 🧬 Training Curriculum
45
  The model was trained using a three-phase **Curriculum Learning** strategy:
46
 
 
40
  | **Precision** | FP8 Mixed-Precision |
41
 
42
  ---
43
+ ## ⚙️ Deployment Configuration
44
 
45
+ To ensure optimal performance in production, the following environment variables and parameters are recommended for the **vLLM** backend:
46
+
47
+ ### **Inference Engine (vLLM)**
48
+ * **Port:** `8002` (Default for Model B slot)
49
+ * **Max Model Length:** `65536` tokens
50
+ * **GPU Memory Utilization:** Recommended `0.90` - `0.95` for Blackwell/H200.
51
+
52
+ ### **Model Parameters**
53
+ * **Max New Tokens:** `65536`
54
+ * **Temperature:** `0.7` (Balanced creativity and precision)
55
+ * **Top-P:** `0.9`
56
+
57
+ ### **Server Settings**
58
+ * **Max Simultaneous Comparisons:** `1` (Recommended for 30B+ MoE on single node to maintain latency)
59
+ * **Chat Context Max Turns:** `10`
60
+ * **Max Prompt Characters:** `10000`
61
+
62
+ ---
63
  ## 🧬 Training Curriculum
64
  The model was trained using a three-phase **Curriculum Learning** strategy:
65