jj-mvcpn committed
Commit ce84700 · verified · 1 Parent(s): 237d890

Update README.md

Files changed (1)
  1. README.md +16 -12
README.md CHANGED
@@ -49,6 +49,9 @@ tags:
 
 The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b—reasoning, code generation, RAG, and tool-augmented applications—with **lower memory footprint** and deployment flexibility.
 
+## Technical Deep Dive
+For a detailed explanation of the compression architecture, model compression process, and benchmark results behind Hypernova-60B v2602, read [this full technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing.](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability)
+
 ---
 
 ## Key Characteristics
@@ -230,22 +233,23 @@ Scores are accuracy or benchmark-specific metrics. Use `—` or *TBD* for evalua
 
 ### Quantitative Results (Inference Performance)
 
-Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-20b** and **gpt-oss-120b** on the same hardware.
+Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-120b** on the same hardware.
 
 #### Performance evaluation conditions
 
-Describe the setup used to obtain the numbers in the table below (replace the placeholders or add a short paragraph):
-
 - **Inference library**: vLLM 0.14.0
-- **Hardware**: 4× NVIDIA H200 Tensor Core GPU
-- **Conditions**: batch size=512, context length=512, decode length=256
-- **Notes**: dtype=default
-
-| Metric                   | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 | Hardware                       |
-|--------------------------|-------------|--------------|--------------------|--------------------------------|
-| Tokens / second (decode) | 250         | 228          | 240                | 4× NVIDIA H200 Tensor Core GPU |
-| Time to first token (ms) | 26          | 26           | 25                 | 4× NVIDIA H200 Tensor Core GPU |
-| Peak GPU memory (GB)     | 13          | 61           | 32                 | 4× NVIDIA H200 Tensor Core GPU |
+- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
+- **Conditions**: concurrency=128
+
+**Summary of Improvements:**
+
+- **Throughput (tok/s)**: Hypernova is 39.5% faster
+- **Mean TTFT (ms)**: Hypernova is 39.4% faster
+- **Median TTFT (ms)**: Hypernova is 50.8% faster
+- **P99 TTFT (ms)**: Hypernova is 36.0% faster
+- **Mean TPOT (ms)**: Hypernova is 45.5% faster
+- **Mean ITL (ms)**: Hypernova is 45.4% faster
+
 
 ![Performance](assets/performance.png)
 
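The added "Summary of Improvements" reports every metric as a percentage "faster". For the higher-is-better metric (throughput in tok/s) that is presumably a relative gain over the baseline; for the lower-is-better latency metrics (TTFT, TPOT, ITL) it is presumably a relative reduction. A minimal sketch of that arithmetic, using made-up numbers rather than the actual benchmark data:

```python
def pct_faster_throughput(baseline_tps: float, new_tps: float) -> float:
    """Percent faster for a higher-is-better metric (tokens/s):
    relative gain over the baseline."""
    return (new_tps - baseline_tps) / baseline_tps * 100


def pct_faster_latency(baseline_ms: float, new_ms: float) -> float:
    """Percent faster for a lower-is-better metric (TTFT/TPOT/ITL):
    relative reduction versus the baseline."""
    return (baseline_ms - new_ms) / baseline_ms * 100


# Illustrative values only -- not the measured results from the table:
print(round(pct_faster_throughput(200.0, 279.0), 1))  # -> 39.5
print(round(pct_faster_latency(100.0, 54.5), 1))      # -> 45.5
```

TTFT (time to first token), TPOT (time per output token), and ITL (inter-token latency) are the standard metrics reported by vLLM's serving benchmark, which matches the stated inference library (vLLM 0.14.0).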