Commit 998e6ac (verified) by Harley-ml
1 Parent(s): 2e91c59

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -536,6 +536,8 @@ We chose to not include code, raw webdata (e.g., fineweb, c4, etc.), and more na
  | 0.86800 | 2.938 | 2.940 | 18.9 | 18.9 | 0.984 | 0.985 | 6.368 | 6.372 |
  | 0.94040 | **2.927** | **2.927** | **18.7** | **18.7** | **0.980** | **0.980** | **6.343** | **6.343** |
 
+ ![Loss and Perplexity Graph](images/training_graph.png
+ )
  Note: BPB stands for Bits Per Byte, and BPW stands for Bits Per Word.
  BPB is simply the amount of yes-no questions the model needs to predict the next byte accurately (1.0 BPB = 1 yes-no question), and BPW is the same thing but at the word level.
 
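For readers of the note above, here is a minimal sketch of how perplexity, BPB, and BPW are commonly derived from a language-model loss. It assumes the loss is a mean cross-entropy in nats per token, which the README excerpt does not state, and the function names and the token, byte, and word counts are hypothetical, not taken from the repository.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token loss (in nats)."""
    return math.exp(loss_nats)


def bits_per_token(loss_nats: float) -> float:
    """Convert a loss in nats to bits by dividing by ln(2)."""
    return loss_nats / math.log(2)


def bits_per_byte(loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Total bits spent on the text divided by its size in UTF-8 bytes."""
    return bits_per_token(loss_nats) * n_tokens / n_bytes


def bits_per_word(loss_nats: float, n_tokens: int, n_words: int) -> float:
    """Total bits spent on the text divided by its whitespace-separated word count."""
    return bits_per_token(loss_nats) * n_tokens / n_words


if __name__ == "__main__":
    # Hypothetical evaluation-set counts, chosen only for illustration.
    loss, n_tokens, n_bytes, n_words = 2.927, 1_000, 4_300, 660
    print("perplexity:", perplexity(loss))
    print("BPB:", bits_per_byte(loss, n_tokens, n_bytes))
    print("BPW:", bits_per_word(loss, n_tokens, n_words))
```

Under these assumptions, BPB and BPW differ from each other only by the average number of bytes per word in the evaluation text, so both track the loss directly.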