Update README.md
README.md
@@ -497,7 +497,7 @@ Tenete-8M was trained on an **RTX 2060 6GB** for one epoch with a batch size of
 
 ### Dataset
 
-The dataset encompasses **577M tokens**, and includes **
+The dataset encompasses **577M tokens**, and includes **4 sources**:
 
 1. **Textbooks**: Web data is too noisy, so we decided to use Tiny-Textbooks, a synthetic dataset generated by
 2. **Medium Articles**: While web data, especially medium articles, is noisy, we still need human-written examples