Create README.md
README.md
ADDED
@@ -0,0 +1,82 @@
---
license: apache-2.0
base_model:
- rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled
- OrionLLM/GRM-2.6-Plus
base_model_relation: merge
pipeline_tag: image-text-to-text
---

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/685ea8ff7b4139b6845ce395/_66bkNH630dGeIt2Uuctd.png" alt="logo" width="500">
</p>

<div align="center">
  <a href="https://huggingface.co/OrionLLM/GRM-2.6-Opus/" style="text-decoration: none;">
    <img src="https://img.shields.io/badge/🤗-HuggingFace-FC926C?style=for-the-badge" alt="HuggingFace">
  </a>
  <a href="https://huggingface.co/collections/OrionLLM/grm-26" style="text-decoration: none;">
    <img src="https://img.shields.io/badge/📚-Collection-3B82F6?style=for-the-badge" alt="Collection">
  </a>
  <a href="https://www.apache.org/licenses/LICENSE-2.0" style="text-decoration: none;">
    <img src="https://img.shields.io/badge/📜-License-E343BD?style=for-the-badge" alt="License">
  </a>
</div>

## 1. Introduction

**GRM-2.6-Opus** is a merge of **OrionLLM/GRM-2.6-Plus** and **rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled**.

GRM-2.6-Opus is a **general-purpose AI model** optimized for **difficult, high-complexity tasks**. It is designed to deliver strong performance for its size while remaining practical and efficient enough for advanced local and research-oriented use.

The model follows an **Opus-style reasoning format**, producing more structured and deliberate reasoning. The merge improves its handling of **terminal agents**, **coding workflows**, and complex problem-solving tasks by drawing on the reasoning and agentic behavior associated with Claude Opus-style distillation.

GRM-2.6-Opus improves on the original **GRM-2.6-Plus**, especially in structured reasoning, coding, agent workflows, and high-difficulty STEM evaluation. On GPQA Diamond, for example, it scores 89.2 against 88.3 for GRM-2.6-Plus.

|
| 36 |
+
## 2. Key Capabilities
|
| 37 |
+
|
| 38 |
+
- **Opus-Style Structured Reasoning:** GRM-2.6-Opus uses a more organized reasoning format, helping it produce clearer and more reliable solutions for complex tasks.
|
| 39 |
+
- **Improved Terminal Agent Ability:** The model is better suited for terminal-based agents, tool-style workflows, debugging, code execution planning, and multi-step technical tasks.
|
| 40 |
+
- **Stronger Coding Performance:** The merge improves code reasoning, implementation planning, and difficult programming task handling.
|
| 41 |
+
- **Enhanced General-Purpose Intelligence:** GRM-2.6-Opus remains useful across research, STEM, chat, coding, local agents, and advanced problem-solving.
|
| 42 |
+
- **Improved Over GRM-2.6-Plus:** The model builds on the original GRM-2.6-Plus and adds stronger structured reasoning behavior through the Opus-style distilled merge.
|
| 43 |
+
|
## 3. Performance

GRM-2.6-Opus is designed to be a highly capable **27B local AI model** for complex reasoning, coding, everyday chat, and agentic workflows. It focuses on delivering **better performance for its size**, making it a strong option for users who want powerful reasoning without depending on much larger models.

Its core strength is **practical intelligence**: structured reasoning, strong task understanding, improved coding behavior, stable responses, and the ability to handle difficult problems across multiple domains.

### Detailed Benchmarks

<table>
  <tr>
    <th style="background: rgba(128,128,128,0.1); text-align: center;">Benchmark</th>
    <th style="background: rgba(128,128,128,0.1); text-align: center;">GRM-2.6-Opus</th>
    <th style="background: rgba(128,128,128,0.1); text-align: center;">GRM-2.6-Plus</th>
    <th style="background: rgba(128,128,128,0.1); text-align: center;">Qwen3.6-27B</th>
    <th style="background: rgba(128,128,128,0.1); text-align: center;">google/gemma-4-31B-it</th>
    <th style="background: rgba(128,128,128,0.1); text-align: center;">GPT-5.4-Mini</th>
    <th style="background: rgba(128,128,128,0.1); text-align: center;">Claude-4.5-Haiku</th>
  </tr>
  <tr>
    <td align="center" colspan="7" style="background: linear-gradient(90deg, rgba(124,58,237,0.45) 0%, rgba(99,102,241,0.42) 50%, rgba(59,130,246,0.45) 100%); font-weight: bold; height:32px; padding-top:2px; padding-bottom:2px;"><i>Knowledge &amp; STEM</i></td>
  </tr>
  <tr>
    <td align="center">GPQA Diamond</td>
    <td align="center"><b>89.2</b></td>
    <td align="center">88.3</td>
    <td align="center">87.8</td>
    <td align="center">84.3</td>
    <td align="center">88.0</td>
    <td align="center">73.0</td>
  </tr>
</table>
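
As a reading aid for the GPQA Diamond row, the sketch below computes the margin between GRM-2.6-Opus and each other model in the table. The scores are copied from the table; the names `GPQA_DIAMOND` and `margins_over` are illustrative and not part of any released tooling, and the result is plain subtraction rather than a statement about statistical significance.

```python
# GPQA Diamond scores copied from the benchmark table.
GPQA_DIAMOND = {
    "GRM-2.6-Opus": 89.2,
    "GRM-2.6-Plus": 88.3,
    "Qwen3.6-27B": 87.8,
    "google/gemma-4-31B-it": 84.3,
    "GPT-5.4-Mini": 88.0,
    "Claude-4.5-Haiku": 73.0,
}


def margins_over(scores, reference):
    """Return the reference score minus every other model's score.

    Differences are rounded to one decimal place to match the table's precision.
    """
    ref = scores[reference]
    return {
        name: round(ref - score, 1)
        for name, score in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    # Print the signed margin of GRM-2.6-Opus over each other model.
    for name, delta in margins_over(GPQA_DIAMOND, "GRM-2.6-Opus").items():
        print(f"{name}: {delta:+.1f}")
```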

---

<div align="center">

**GRM-2.6-Opus** is developed by **[OrionLLM](https://huggingface.co/OrionLLM)** and released under the Apache 2.0 License.

</div>