---
license: apache-2.0
base_model:
- mistralai/Mistral-Small-24B-Base-2501
- arcee-ai/Arcee-Blitz
datasets:
- open-thoughts/OpenThoughts-114k
- Undi95/R1-RP-ShareGPT3
---
<p align="left">
    <img width="45%" src="V3.png">
</p>

# Introduction
This model is my feeble attempt at reproducing the R1-at-home experience, following the (un)official Deepseek formula: \
`Deepseek V3 + Unhinged Reasoning = R1` \
Substituting out some variables, we get: \
`Arcee Blitz V3 Distill + Unhinged R1 Reasoning Traces = MistralSmallV3R` \
\
I'm quite happy with how this model turned out; it's definitely weaker than QwQ for coding and long reasoning problems, but a lot more personable and well-rounded overall.
The faster speed and VRAM savings are also quite nice, given how long these models can reason and how much context they can burn through.
Stability is satisfactory too: the model functions well even without the super aggressive sampling the base instruct finetune calls for.

## Use Cases
Just like the base model, MistralSmallV3R is a solid `all-rounder` and should be able to generalize its reasoning capabilities effectively, as care was taken to train it on a wide variety of reasoning tasks, including math, coding, and roleplay.
In particular, MistralSmallV3R appears very good at `contextual and emotional reasoning` compared to the other reasoning models I have tested so far. It also tends to spend a good portion of its thinking considering what the user wants, hopefully giving it higher `resistance to poor prompting`.
Compared to other mid-range reasoning models such as QwQ, it also has markedly `better prose quality` without sounding like too much of a tryhard, as some of the Gutenberg models do.
This model has also inherited a notable negative bias from the R1 synthetic data it was trained on. \
Supports up to 32k context. \
Should remain usable with only `12GB of VRAM` when quanted to IQ3_M or IQ3_S.

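If you want to try that on a 12GB card, a minimal loading sketch with llama-cpp-python might look like the following; the GGUF filename is a placeholder, not an actual file in this repo:

```python
from llama_cpp import Llama

# Hypothetical loading sketch: the filename is a placeholder for whichever
# IQ3_M / IQ3_S quant you actually download.
llm = Llama(
    model_path="MistralSmallV3R-IQ3_M.gguf",  # placeholder path
    n_ctx=32768,      # up to 32k is supported; a full 32k KV cache may not
                      # fit alongside the weights in 12GB, so trim as needed
    n_gpu_layers=-1,  # offload all layers to the GPU
)
```
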
## Recommended Settings
- Template: Mistral Tekken V7
- Temp: 0.1-0.7, depending on use case. Low temps allow for longer reasoning.
- TopP: 0.95
- MinP: 0.05
- Rep Pen: Not needed, but if you do use it, keep the range low!
- Do not use DRY!
- System prompt: Trained on SillyTavern defaults. Add `Reason out how you will respond between the <think> and </think> tags.` to the end of your system prompt. Optionally, you may wish to prompt the model to have a positive bias.
- Add the `<think>` tag to the beginning of the response.
- Enable reasoning auto-parse (as sketched below), and remove the newlines from the reasoning formatting prefix and suffix.
- Q4_K_M is recommended as the smallest effectively lossless quant.

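Here is a rough llama-cpp-python sketch of these settings, continuing from the `llm` object loaded earlier. The `[SYSTEM_PROMPT]`/`[INST]` layout is my best-effort reading of the Tekken V7 template, and the system/user text is made up, so verify both against your own frontend before relying on them:

```python
import re

# Hypothetical system/user text; only the <think> instruction comes from
# the recommendations above.
system = (
    "You are a helpful assistant. "
    "Reason out how you will respond between the <think> and </think> tags."
)
user = "Summarize the plot of Hamlet in three sentences."

# Best-effort Tekken V7 layout (verify against the official chat template).
# The trailing <think> prefills the response so reasoning starts immediately.
prompt = f"[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]<think>"

out = llm(
    prompt,
    max_tokens=4096,     # reasoning traces run long; leave room
    temperature=0.7,     # 0.1-0.7 depending on use case
    top_p=0.95,
    min_p=0.05,
    repeat_penalty=1.0,  # i.e. rep pen disabled
)

# Manual equivalent of "reasoning auto-parse": split the thought block
# from the visible reply.
text = "<think>" + out["choices"][0]["text"]
m = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
reasoning, reply = (m.group(1).strip(), m.group(2).strip()) if m else ("", text)
print(reply)
```
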
## Benchmarks
TODO

## Dataset
Trained on an interleaved dataset of 40% LimaRP-R1 and 60% OpenThoughts.
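
For reference, this kind of 40/60 interleave is easy to reproduce with the `datasets` library. The dataset IDs come from the card metadata (I'm assuming LimaRP-R1 corresponds to `Undi95/R1-RP-ShareGPT3`), and the exact mixing code used for training is my assumption:

```python
from datasets import load_dataset, interleave_datasets

# Dataset IDs taken from the card metadata; the actual training pipeline
# may have mixed them differently -- this only illustrates the ratio.
limarp_r1 = load_dataset("Undi95/R1-RP-ShareGPT3", split="train")
openthoughts = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

mixed = interleave_datasets(
    [limarp_r1, openthoughts],
    probabilities=[0.4, 0.6],  # 40% LimaRP-R1, 60% OpenThoughts
    seed=42,
)
```
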
## Thanks
Major thanks to Arcee AI for their excellent Arcee Blitz finetune and general mergekit nonsense. \
Undi95 for proving the viability of a stable reasoning finetune of MS3. \
OpenThoughts for their dataset of the same name. \
Mistral, for their continuing dedication to the open-source community.