Update README.md
README.md
CHANGED
@@ -2,7 +2,7 @@
 
 A small ~110M parameter language model implementing the **DeepSeek-V4 architecture**, fine-tuned for chat/instruction following. Trained from scratch — no weights from DeepSeek-V4 were used.
 
-- **Pretrained base model**: [
+- **Pretrained base model**: [HuggingFaceTB/nanowhale-100m-base](https://huggingface.co/HuggingFaceTB/nanowhale-100m-base)
 - **This model**: SFT on [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk)
 
 ## Architecture