Update README.md
README.md
CHANGED
@@ -571,6 +571,7 @@ We decided to evaluate the model on each source it trained on to see the differe
 
 The model achieves random or near-random on most tasks, which is expected. An 8M parameter model cannot store world-level knowledge or thoroughly reason.
 
+Note: The full breakdown (LM Harness Output) is right [here](https://huggingface.co/Harley-ml/Tenete-8M/blob/main/raw_lmharness_eval_output.txt)
 ### Coherency Benchmark
 
 To evaluate the **coherency, factuality, and fluency** of our (and other) models, we use **Qwen3-32B** to grade **300 different generations** generated from an **unconditional prompt**.