JaydeepR committed on
Commit fa8c63d · verified
1 Parent(s): 7b8978b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +9 -3
README.md CHANGED
@@ -26,10 +26,16 @@ This is the **base pretrained checkpoint** before SFT instruction tuning. For in
  | Base model | ModernBERT-base |
  | Parameters | ~150M |
  | Architecture | Masked Language Model (diffusion objective) |
- | Pretrain data | Project Gutenberg (~6.4M chunks, seq_len=1024) |
+ | Pretrain data | Project Gutenberg (6,400,553 train chunks, seq_len=1024) |
  | Pretrain steps | 30,000 |
- | Final train loss | 2.92 |
- | Final val loss | 2.96 |
+ | Effective batch size | 128 |
+ | Learning rate | 5e-5 (cosine, 1500 warmup steps) |
+ | Hardware | RTX 4090 24GB |
+ | Training time | ~20 hours |
+ | Initial train loss | 3.887 |
+ | Initial val loss | 3.922 |
+ | Final train loss | 2.917 |
+ | Final val loss | 2.962 |

  ---

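
Reviewer note: the hyperparameters in the updated table can be sanity-checked with a short script. The sketch below is not taken from the training code. It assumes a linear warmup from zero followed by cosine decay to zero (the README only says "cosine, 1500 warmup steps"), and it assumes every optimizer step consumes one effective batch of distinct chunks. The function names `lr_at` and `epochs_seen` are hypothetical.

```python
import math

# Values copied from the README table in this commit.
PEAK_LR = 5e-5
WARMUP_STEPS = 1_500
TOTAL_STEPS = 30_000
EFFECTIVE_BATCH_SIZE = 128
TRAIN_CHUNKS = 6_400_553


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Assumption: linear warmup from 0 to PEAK_LR, then cosine decay to 0.
    The actual run may use a different warmup shape or a nonzero floor.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


def epochs_seen() -> float:
    """Fraction of the training set consumed over the whole run.

    Assumption: each sample in a batch is one distinct training chunk.
    """
    return TOTAL_STEPS * EFFECTIVE_BATCH_SIZE / TRAIN_CHUNKS


if __name__ == "__main__":
    for step in (0, 750, 1_500, 15_750, 30_000):
        print(f"step {step:>6}: lr = {lr_at(step):.2e}")
    print(f"sequences seen: {TOTAL_STEPS * EFFECTIVE_BATCH_SIZE:,}")
    print(f"approx. epochs: {epochs_seen():.2f}")
```

Under these assumptions the run sees 3,840,000 sequences, which is roughly 0.6 of one pass over the 6,400,553 training chunks.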