Harley-ml committed on
Commit
3a6766b
·
verified ·
1 Parent(s): dd262cb

Update README.md

  1. README.md +57 -3
README.md CHANGED
@@ -1,3 +1,57 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - Harley-ml/es-en-words
+ language:
+ - en
+ tags:
+ - small
+ - small-language-model
+ - largeword
+ - word-generation
+ - harley-ml
+ - word
+ - words
+ - wordgen
+ - qwen3
+ ---
+
+ # LargeWord
+
+ LargeWord is the largest model in the [WordGen] family, with about 1.59M parameters.
+ An instruct version of LargeWord is available [here].
+
+ LargeWord generates plausible or real words learned from its pretraining dataset.
+
+ ## Architecture
+
+ | Parameter               | Value   |
+ |-------------------------|---------|
+ | hidden_size             | 160     |
+ | num_hidden_layers       | 4       |
+ | num_attention_heads     | 2       |
+ | num_key_value_heads     | 2       |
+ | intermediate_size       | 512     |
+ | max_position_embeddings | 77      |
+ | rope_theta              | 10000.0 |
+ | tie_word_embeddings     | True    |
+ | vocab_size              | 1204    |
+
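As a sanity check on the "about 1.59M parameters" figure, the table values can be plugged into a rough parameter count. This is only a sketch under assumptions the README does not state: a Qwen3-style decoder (suggested by the `qwen3` tag), `head_dim = hidden_size // num_attention_heads`, no bias terms, and per-head query/key norms of size `head_dim`.

```python
# Rough parameter count from the Architecture table.
# Assumed (not stated in the README): Qwen3-style layout, no biases,
# head_dim = hidden_size // num_attention_heads, q/k norms of size head_dim.

hidden_size = 160
num_hidden_layers = 4
num_attention_heads = 2
num_key_value_heads = 2
intermediate_size = 512
vocab_size = 1204
tie_word_embeddings = True

head_dim = hidden_size // num_attention_heads  # assumed: 80

# Token embeddings; the output head reuses them when embeddings are tied.
embedding = vocab_size * hidden_size
lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size

attention = (
    hidden_size * num_attention_heads * head_dim          # q_proj
    + 2 * hidden_size * num_key_value_heads * head_dim    # k_proj and v_proj
    + num_attention_heads * head_dim * hidden_size        # o_proj
    + 2 * head_dim                                        # q_norm and k_norm (assumed)
)
mlp = 3 * hidden_size * intermediate_size  # gate, up, and down projections
norms = 2 * hidden_size                    # input and post-attention norms

per_layer = attention + mlp + norms
final_norm = hidden_size
total = embedding + lm_head + num_hidden_layers * per_layer + final_norm

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
```

Under these assumptions the total comes to 1,587,360, which rounds to the stated 1.59M. A different `head_dim` or the presence of bias terms would change the result.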
+ ## Training
+
+ LargeWord was trained on 753,232 words and 4,153,110 tokens. Its goal is to generate plausible-looking or real words.
+
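The two training figures give a rough sense of tokenizer granularity. The sketch below assumes the token count is the tokenized form of the same 753,232 words, which the README does not say explicitly, and it cannot tell whether special tokens are included.

```python
# Average tokens per word, assuming both figures describe the same data.
words = 753_232
tokens = 4_153_110

tokens_per_word = tokens / words
print(f"{tokens_per_word:.2f} tokens per word on average")
```

If that assumption holds, a typical word takes roughly five to six tokens, well under the 77-position limit listed in the architecture table.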
+ ### Hardware
+
+ LargeWord was trained on an NVIDIA RTX 2060 6GB for 2 epochs with a batch size of 8.
+
+ ### Training Results
+
+ | Step | Epoch | Train Loss | Train PPL | Eval Loss | Eval PPL |
+ |------|-------|------------|-----------|-----------|----------|
+ | 500  | 0.30  | 4.3276     | 75.74     | 2.4190    | 11.23    |
+ | 1000 | 0.61  | 1.7151     | 5.56      | 1.4076    | 4.09     |
+ | 1500 | 0.91  | 1.3247     | 3.76      | 1.2682    | 3.55     |
+ | 2000 | 1.21  | 1.2120     | 3.36      | 1.2026    | 3.33     |
+ | 2500 | 1.51  | 1.1619     | 3.20      | 1.1667    | 3.21     |
+ | 3000 | 1.82  | 1.1314     | 3.10      | 1.1378    | 3.12     |
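The PPL columns appear to be the exponential of the corresponding loss, which is the usual definition when the loss is mean cross-entropy in nats. That relationship is an assumption here, not something the README states, and the short check below recomputes it from the table values.

```python
import math

# (step, train_loss, train_ppl, eval_loss, eval_ppl) copied from the table.
rows = [
    (500, 4.3276, 75.74, 2.4190, 11.23),
    (1000, 1.7151, 5.56, 1.4076, 4.09),
    (1500, 1.3247, 3.76, 1.2682, 3.55),
    (2000, 1.2120, 3.36, 1.2026, 3.33),
    (2500, 1.1619, 3.20, 1.1667, 3.21),
    (3000, 1.1314, 3.10, 1.1378, 3.12),
]

max_gap = 0.0  # largest difference between exp(loss) and the reported PPL
for step, train_loss, train_ppl, eval_loss, eval_ppl in rows:
    train_calc = math.exp(train_loss)
    eval_calc = math.exp(eval_loss)
    max_gap = max(max_gap, abs(train_calc - train_ppl), abs(eval_calc - eval_ppl))
    print(
        f"step {step}: train exp(loss)={train_calc:.2f} (table {train_ppl}), "
        f"eval exp(loss)={eval_calc:.2f} (table {eval_ppl})"
    )

print(f"largest gap: {max_gap:.3f}")
```

Small gaps are expected because the losses in the table are rounded to four decimal places.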