Harley-ml committed on
Commit d76ef63 · verified · 1 Parent(s): 0465ecc

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -14,4 +14,10 @@ tags:
  - config-generation
  - json-generation
  - harley-ml
- ---
+ ---
+
+ # MCODLarge
+
+ MCODLarge is a 4.7M-parameter model trained on 7.1M tokens of Hugging Face model configs. MCOD stands for "Model Configs on Drugs."
+ We are well aware that 7.1M tokens is below the Chinchilla-optimal target, but adding more tokens would not improve diversity: cleaning the full 90M-token dataset left only 7.1M tokens after deduplication (over 13k docs) and filtering (by language and length).
+ Anyway, MCODLarge
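
For context on the Chinchilla remark in the added README text, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 20 training tokens per parameter, which is an approximation and is not stated in the README; the function and variable names are hypothetical. The parameter and token counts are the ones quoted in the diff.

```python
# Assumption: the commonly quoted Chinchilla rule of thumb of roughly
# 20 training tokens per model parameter. The README does not give a ratio.
TOKENS_PER_PARAM = 20


def chinchilla_token_target(n_params: float,
                            tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Approximate compute-optimal token count for a model of n_params."""
    return n_params * tokens_per_param


# Figures quoted in the README diff.
n_params = 4.7e6      # model parameters
tokens_used = 7.1e6   # tokens left after deduplication and filtering
tokens_raw = 90e6     # tokens in the full dataset before cleaning

target = chinchilla_token_target(n_params)
fraction_of_target = tokens_used / target
fraction_retained = tokens_used / tokens_raw

print(f"approximate Chinchilla target: {target / 1e6:.0f}M tokens")
print(f"tokens used vs. target:        {fraction_of_target:.1%}")
print(f"raw tokens kept after cleaning: {fraction_retained:.1%}")
```

Under that assumption the target is on the order of 94M tokens, so the 7.1M tokens used are a small fraction of it, which is consistent with the README's own statement that the run is under the Chinchilla-optimal target.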