clemsail committed (verified) · commit 7a1cb9f · parent: 8dbae86

docs: add upstream base model official evaluations

Files changed (1): README.md (+24 -0)
README.md CHANGED
@@ -157,3 +157,27 @@ This model is referenced in the [Ailiance benchmark suite](https://github.com/ai
See the full scoreboard:
[ailiance-bench README#scoreboard-lora-phase-6](https://github.com/ailiance/ailiance-bench#scoreboard-lora-phase-6--2026-05-11).

## Upstream base model — official evaluations

This LoRA fine-tunes [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512),
Mistral's coding-specialist LLM. Headline software-engineering benchmarks
from the upstream model card:
| Benchmark                  | Devstral Small 2 (24B) | Devstral 2 (123B) | DeepSeek v3.2 (671B) | Claude Sonnet 4.5 |
|----------------------------|-----------------------:|------------------:|---------------------:|------------------:|
| **SWE Bench Verified**     |             **68.0 %** |            72.2 % |               73.1 % |            77.2 % |
| **SWE Bench Multilingual** |             **55.7 %** |            61.3 % |               70.2 % |            68.0 % |
| **Terminal Bench 2**       |             **22.5 %** |            32.6 % |               46.4 % |            42.8 % |

(For reference, GPT-5.1 Codex High scores 73.7 % on SWE Bench Verified and 52.8 % on Terminal Bench 2.)

Devstral Small 2 (24B) is competitive with much larger open models on
SWE Bench Verified (e.g. it matches GLM-4.6 at 355B parameters). Its
architecture combines Llama 4-style RoPE scaling with Scalable-Softmax
([arXiv:2501.19399](https://arxiv.org/abs/2501.19399)).
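
The Scalable-Softmax reference can be made concrete. A minimal NumPy sketch of SSMax as defined in the linked paper, which replaces `e^z` with `n^(s·z)` (equivalently, softmax with temperature scaled by `s·ln n`, where `n` is the input length); the real model learns `s` per attention head, while here it is a fixed illustrative scalar:

```python
import numpy as np

def ssmax(logits: np.ndarray, s: float = 1.0) -> np.ndarray:
    """Scalable-Softmax (SSMax) from arXiv:2501.19399.

    Standard softmax uses e^z; SSMax uses n^(s*z), where n is the input
    length and s is a (normally learned) scalar. This equals
    softmax((s * ln n) * z), so scoring sharpens as n grows.
    """
    n = logits.shape[-1]
    scaled = (s * np.log(n)) * logits
    scaled -= scaled.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(scaled)
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([1.0, 3.0, 2.0])
p_short = ssmax(z)                                   # n = 3
p_long = ssmax(np.concatenate([z, np.zeros(997)]))   # n = 1000
```

With plain softmax, padding the same logits out to 1000 entries dilutes the winning token's probability (the "attention fading" the paper targets); SSMax counteracts this by growing the effective temperature with `log n`, so `p_long` concentrates more sharply on index 1 than `p_short` does.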

**Source:** [official Devstral-Small-2-24B-Instruct-2512 model card](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512).

> **Reading these alongside this LoRA:** Devstral Small 2 is a strong
> coding base. This LoRA inherits its SWE-Bench performance and adds
> language- or domain-specific specialization.
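
To evaluate this adapter against the upstream numbers above, the base model and the LoRA can be combined with `transformers` + `peft`. A minimal sketch, not this repo's documented workflow: the adapter id is whatever repo hosts this LoRA (passed in by the caller), and imports are deferred so the function body only requires `transformers`/`peft` when actually invoked.

```python
def load_devstral_with_lora(adapter_id: str,
                            base_id: str = "mistralai/Devstral-Small-2-24B-Instruct-2512"):
    """Attach a LoRA adapter to the Devstral Small 2 base model.

    Deferred imports keep this sketch importable without transformers/peft
    installed; both are required at call time, and the base checkpoint is
    downloaded on first use.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype="auto", device_map="auto"
    )
    # PeftModel wraps the frozen base weights with the low-rank adapter.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

Calling `merge_and_unload()` on the returned model folds the adapter into the base weights, which is convenient when benchmarking under the same harness used for the scores in the table.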