Update README.md
The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b—reasoning, code generation, RAG, and tool-augmented applications—with **lower memory footprint** and deployment flexibility.
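Native tool calling means the model can be served behind an OpenAI-compatible API and invoked with a `tools` array of JSON-schema function definitions. A minimal sketch of building such a request payload; the served-model name and the `get_weather` tool are illustrative assumptions, not taken from this model card:

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def build_chat_request(user_message: str) -> dict:
    """Build a chat-completions payload that lets the model call tools."""
    return {
        "model": "hypernova-60b-2602",  # placeholder served-model name
        "messages": [{"role": "user", "content": user_message}],
        "tools": [get_weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = build_chat_request("What's the weather in Donostia?")
print(json.dumps(payload, indent=2))
```

A payload like this can be POSTed to an OpenAI-compatible `/v1/chat/completions` endpoint (for example, one served by vLLM); if the model decides to call the tool, the call arrives in `choices[0].message.tool_calls`.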
## Technical Deep Dive
For a detailed explanation of the compression architecture, the compression process, and the benchmark results behind Hypernova-60B v2602, read [this full technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability).
---
## Key Characteristics
### Quantitative Results (Inference Performance)

Representative throughput and memory figures under the evaluation setup above, compared against **gpt-oss-120b** on the same hardware.

#### Performance evaluation conditions

- **Inference library**: vLLM 0.14.0
- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
- **Conditions**: concurrency=128
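The latency metrics reported in this section are TTFT (time to first token), TPOT (time per output token), and ITL (inter-token latency). A sketch of how these serving metrics are conventionally derived from per-token arrival timestamps; the function and its inputs are illustrative, not the benchmark harness used for the table:

```python
def latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """TTFT, TPOT, and per-gap ITLs for one streamed response, in milliseconds."""
    # Time to first token: delay from sending the request to the first token.
    ttft = (token_times[0] - request_start) * 1000.0
    # Inter-token latencies: gaps between consecutive output tokens.
    itls = [(b - a) * 1000.0 for a, b in zip(token_times, token_times[1:])]
    # Time per output token: decode time averaged over tokens after the first.
    tpot = sum(itls) / len(itls) if itls else 0.0
    return {"ttft_ms": ttft, "tpot_ms": tpot, "itl_ms": itls}

# Illustrative trace: request sent at t=0 s, first token after 150 ms,
# then one token every 20 ms.
times = [0.150, 0.170, 0.190, 0.210, 0.230]
print(latency_metrics(0.0, times))
```

Serving benchmarks then aggregate these per-request values into the mean, median, and P99 statistics quoted in the summary.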

**Summary of Improvements:**

- **Throughput (tok/s)**: 39.5% higher with Hypernova
- **Mean TTFT (ms)**: 39.4% lower with Hypernova
- **Median TTFT (ms)**: 50.8% lower with Hypernova
- **P99 TTFT (ms)**: 36.0% lower with Hypernova
- **Mean TPOT (ms)**: 45.5% lower with Hypernova
- **Mean ITL (ms)**: 45.4% lower with Hypernova
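The percentages above are relative improvements over the gpt-oss-120b baseline: higher is better for throughput, lower is better for the latency metrics. A small sketch of the arithmetic; the raw measurements are illustrative placeholders chosen to reproduce two of the stated percentages, not the underlying benchmark data:

```python
def pct_higher(new: float, baseline: float) -> float:
    """Relative gain for a throughput-style metric (higher is better)."""
    return (new - baseline) / baseline * 100.0

def pct_lower(new: float, baseline: float) -> float:
    """Relative reduction for a latency-style metric (lower is better)."""
    return (baseline - new) / baseline * 100.0

# Hypothetical throughput: 1000 tok/s baseline vs 1395 tok/s.
print(f"{pct_higher(1395.0, 1000.0):.1f}% higher")  # prints "39.5% higher"
# Hypothetical mean TPOT: 22.0 ms baseline vs 12.0 ms.
print(f"{pct_lower(12.0, 22.0):.1f}% lower")        # prints "45.5% lower"
```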