Update README.md
README.md
CHANGED
|
@@ -14,4 +14,10 @@ tags:
|
|
| 14 |
- config-generation
|
| 15 |
- json-generation
|
| 16 |
- harley-ml
|
| 17 |
-
---
| 14 |
- config-generation
|
| 15 |
- json-generation
|
| 16 |
- harley-ml
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
# MCODLarge
|
| 20 |
+
|
| 21 |
+
MCODLarge is a 4.7M-parameter model trained on 7.1M tokens of Hugging Face model configs. MCOD stands for "Model Configs on Drugs."
|
| 22 |
+
We are well aware that 7.1M tokens is below the Chinchilla-optimal target (roughly 20 tokens per parameter, or about 94M tokens for a model this size), but adding more tokens would not improve diversity: the full dataset had 90M tokens, and only 7.1M remained after deduplication (over 13k docs) and filtering by language and length.
|
| 23 |
+
Anyway, MCODLarge
|