Update README.md
README.md
CHANGED
@@ -493,7 +493,7 @@ Tenete-8M uses the **Qwen3 architecture**.
 
 ## Training
 
-Tenete-8M was trained on an **RTX 2060 6GB** for one epoch with a batch size of 4 and a gradient accumulation of 18
+Tenete-8M was trained on an **RTX 2060 6GB** for one epoch with a batch size of 4 and a gradient accumulation of 18 (**effective batch size=72**) for two hours and twenty minutes.
 
 ### Dataset
 
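The effective batch size added in this change follows from the two numbers already in the README: a per-step batch size of 4 multiplied by 18 gradient accumulation steps. A minimal sketch of that arithmetic, assuming a single GPU as the README describes; the function names and the `num_devices` parameter are illustrative and not taken from the Tenete-8M training code:

```python
def effective_batch_size(micro_batch: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step.

    num_devices is a hypothetical parameter; it defaults to 1 because the
    README mentions a single RTX 2060.
    """
    return micro_batch * accumulation_steps * num_devices


def duration_seconds(hours: int, minutes: int) -> int:
    """Convert a wall-clock duration in hours and minutes to seconds."""
    return hours * 3600 + minutes * 60


if __name__ == "__main__":
    # Values from the README: batch size 4, gradient accumulation 18.
    print(effective_batch_size(4, 18))  # 72
    # Reported training time: two hours and twenty minutes.
    print(duration_seconds(2, 20))  # 8400
```

Gradient accumulation trades wall-clock time for memory: only 4 samples are held on the 6 GB card at once, while each weight update still averages gradients over 72 samples.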