Upload README.md with huggingface_hub
README.md (changed)
@@ -26,10 +26,16 @@ This is the **base pretrained checkpoint** before SFT instruction tuning. For in
 | Base model | ModernBERT-base |
 | Parameters | ~150M |
 | Architecture | Masked Language Model (diffusion objective) |
-| Pretrain data | Project Gutenberg (
+| Pretrain data | Project Gutenberg (6,400,553 train chunks, seq_len=1024) |
 | Pretrain steps | 30,000 |
-
-
+| Effective batch size | 128 |
+| Learning rate | 5e-5 (cosine, 1500 warmup steps) |
+| Hardware | RTX 4090 24GB |
+| Training time | ~20 hours |
+| Initial train loss | 3.887 |
+| Initial val loss | 3.922 |
+| Final train loss | 2.917 |
+| Final val loss | 2.962 |
 
 ---
 
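The updated table implies a training budget and a learning-rate curve that can be sanity-checked with a few lines of arithmetic. This is a minimal, illustrative sketch rather than the training code used for this checkpoint. `training_budget` and `cosine_lr` are hypothetical helpers. The sketch assumes every chunk is a full 1,024-token sequence. It also assumes linear warmup followed by half-cosine decay to zero, which is the shape of `get_cosine_schedule_with_warmup` in Hugging Face `transformers` with its default `num_cycles=0.5`. The README does not say which scheduler implementation or final learning rate was used.

```python
import math

# Values copied from the updated README table.
STEPS = 30_000
EFFECTIVE_BATCH = 128      # sequences per optimizer step
SEQ_LEN = 1024             # tokens per chunk
TRAIN_CHUNKS = 6_400_553   # Project Gutenberg train chunks
PEAK_LR = 5e-5
WARMUP_STEPS = 1_500
TRAINING_HOURS = 20        # README says "~20 hours", so throughput is approximate


def training_budget():
    """Hypothetical helper: return (sequences seen, epochs over the train chunks, tokens seen).

    ASSUMPTION: every chunk is a full SEQ_LEN-token sequence (no padding).
    """
    sequences = STEPS * EFFECTIVE_BATCH
    epochs = sequences / TRAIN_CHUNKS
    tokens = sequences * SEQ_LEN
    return sequences, epochs, tokens


def cosine_lr(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=STEPS, min_lr=0.0):
    """Hypothetical warmup + cosine schedule.

    ASSUMPTION: linear warmup from 0 to `peak`, then half-cosine decay to
    `min_lr` at `total`; the README only states "cosine, 1500 warmup steps".
    """
    if step < warmup:
        return peak * step / warmup
    progress = min(1.0, (step - warmup) / (total - warmup))
    return min_lr + (peak - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    sequences, epochs, tokens = training_budget()
    print(f"sequences seen: {sequences:,}")
    print(f"epochs over train chunks: {epochs:.3f}")
    print(f"tokens seen: {tokens:,}")
    print(f"approx. tokens/s: {tokens / (TRAINING_HOURS * 3600):,.0f}")
    for step in (0, 750, 1_500, 15_750, 30_000):
        print(f"step {step:>6}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions, 30,000 steps × 128 sequences = 3,840,000 sequences. That is roughly 0.60 of one pass over the 6,400,553 training chunks, or about 3.93B tokens. Spread over ~20 hours, this works out to roughly 55k tokens/s. The learning rate peaks at 5e-5 at step 1,500 and falls to half that (2.5e-5) at step 15,750, midway through the decay.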