Harley-ml/Tenete-8M

Harley-ml committed 21 days ago

Commit 998e6ac (verified) · 1 parent: 2e91c59

Update README.md

Files changed (1)

README.md +2 -0

@@ -536,6 +536,8 @@ We chose to not include code, raw webdata (e.g., fineweb, c4, etc.), and more na
 | 0.86800 | 2.938 | 2.940 | 18.9 | 18.9 | 0.984 | 0.985 | 6.368 | 6.372 |
 | 0.94040 | **2.927** | **2.927** | **18.7** | **18.7** | **0.980** | **0.980** | **6.343** | **6.343** |
+![Loss and Perplexity Graph](images/training_graph.png
+)
 Note: BPB stands for Bits Per Byte, and BPW stands for Bits Per Word.
 BPB is simply the amount of yes-no questions the model needs to predict the next byte accurately (1.0 BPB = 1 yes-no question), and BPW is the same thing but at the word level.
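
The note above defines BPB and BPW only informally. As a rough sketch of how such numbers are usually derived, the snippet below converts a cross-entropy loss into perplexity, bits per byte, and bits per word. It assumes the loss is measured in nats per token, and the function name `loss_to_metrics` and the `bytes_per_token` and `tokens_per_word` ratios are hypothetical placeholders, not values taken from this model's tokenizer or evaluation code.

```python
import math


def loss_to_metrics(loss_nats_per_token, bytes_per_token, tokens_per_word):
    """Convert a per-token cross-entropy loss (in nats) to common metrics.

    bytes_per_token and tokens_per_word are corpus-level averages that
    must be measured on the evaluation text; they are assumptions here.
    """
    # Nats to bits: divide by ln(2).
    bits_per_token = loss_nats_per_token / math.log(2)
    return {
        # Perplexity is the exponential of the loss in nats.
        "perplexity": math.exp(loss_nats_per_token),
        # Spread the bits for one token over the bytes it covers.
        "bpb": bits_per_token / bytes_per_token,
        # A word costs as many bits as the tokens it is split into.
        "bpw": bits_per_token * tokens_per_word,
    }


# Illustrative call: 2.927 is the loss in the last table row; the two
# ratios are made-up placeholders chosen only for demonstration.
metrics = loss_to_metrics(2.927, bytes_per_token=4.31, tokens_per_word=1.5)
print(round(metrics["perplexity"], 1))
```

Under the nats-per-token assumption, the perplexity computed this way rounds to the 18.7 shown next to the 2.927 loss in the table. BPB and BPW depend on the tokenizer and the evaluation text, so the placeholder ratios should be replaced with measured values.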