jj-mvcpn committed
Commit ce84700 · verified · 1 Parent(s): 237d890

Update README.md

Files changed (1)
  1. README.md +16 -12
README.md CHANGED
@@ -49,6 +49,9 @@ tags:
 
 The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b—reasoning, code generation, RAG, and tool-augmented applications—with **lower memory footprint** and deployment flexibility.
 
+## Technical Deep Dive
+For a detailed explanation of the compression architecture, model compression process, and benchmark results behind Hypernova-60B v2602, read [this full technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing.](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability)
+
 ---
 
 ## Key Characteristics
@@ -230,22 +233,23 @@ Scores are accuracy or benchmark-specific metrics. Use `—` or *TBD* for evalua
 
 ### Quantitative Results (Inference Performance)
 
-Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-20b** and **gpt-oss-120b** on the same hardware.
+Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-120b** on the same hardware.
 
 #### Performance evaluation conditions
 
-Describe the setup used to obtain the numbers in the table below (replace the placeholders or add a short paragraph):
-
 - **Inference library**: vLLM 0.14.0
-- **Hardware**: 4× NVIDIA H200 Tensor Core GPU
-- **Conditions**: batch size=512, context length=512, decode length=256
-- **Notes**: dtype=default
-
-| Metric                   | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 | Hardware                       |
-|--------------------------|-------------|--------------|--------------------|--------------------------------|
-| Tokens / second (decode) | 250         | 228          | 240                | 4× NVIDIA H200 Tensor Core GPU |
-| Time to first token (ms) | 26          | 26           | 25                 | 4× NVIDIA H200 Tensor Core GPU |
-| Peak GPU memory (GB)     | 13          | 61           | 32                 | 4× NVIDIA H200 Tensor Core GPU |
+- **Hardware**: 1× NVIDIA H200 Tensor Core GPU
+- **Conditions**: concurrency=128
+
+**Summary of Improvements:**
+
+- **Throughput (tok/s)**: Hypernova is 39.5% faster
+- **Mean TTFT (ms)**: Hypernova is 39.4% faster
+- **Median TTFT (ms)**: Hypernova is 50.8% faster
+- **P99 TTFT (ms)**: Hypernova is 36.0% faster
+- **Mean TPOT (ms)**: Hypernova is 45.5% faster
+- **Mean ITL (ms)**: Hypernova is 45.4% faster
+
 
 ![Performance](assets/performance.png)
 
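The added "Summary of Improvements" reports every metric as a percentage "faster". For the higher-is-better metric (throughput in tok/s) that is presumably a relative gain over the baseline; for the lower-is-better latency metrics (TTFT, TPOT, ITL) it is presumably a relative reduction. A minimal sketch of that arithmetic, using made-up numbers rather than the actual benchmark data:

```python
def pct_faster_throughput(baseline_tps: float, new_tps: float) -> float:
    """Percent faster for a higher-is-better metric (tokens/s):
    relative gain over the baseline."""
    return (new_tps - baseline_tps) / baseline_tps * 100


def pct_faster_latency(baseline_ms: float, new_ms: float) -> float:
    """Percent faster for a lower-is-better metric (TTFT/TPOT/ITL):
    relative reduction versus the baseline."""
    return (baseline_ms - new_ms) / baseline_ms * 100


# Illustrative values only -- not the measured results from the table:
print(round(pct_faster_throughput(200.0, 279.0), 1))  # -> 39.5
print(round(pct_faster_latency(100.0, 54.5), 1))      # -> 45.5
```

TTFT (time to first token), TPOT (time per output token), and ITL (inter-token latency) are the standard metrics reported by vLLM's serving benchmark, which matches the stated inference library (vLLM 0.14.0).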