Update README.md
README.md
CHANGED
@@ -536,6 +536,8 @@ We chose to not include code, raw webdata (e.g., fineweb, c4, etc.), and more na
 | 0.86800 | 2.938 | 2.940 | 18.9 | 18.9 | 0.984 | 0.985 | 6.368 | 6.372 |
 | 0.94040 | **2.927** | **2.927** | **18.7** | **18.7** | **0.980** | **0.980** | **6.343** | **6.343** |
 
+
+
 Note: BPB stands for Bits Per Byte, and BPW stands for Bits Per Word.
 BPB is simply the amount of yes-no questions the model needs to predict the next byte accurately (1.0 BPB = 1 yes-no question), and BPW is the same thing but at the word level.
 
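The note in the diff defines BPB and BPW only informally. As a rough illustration, the sketch below shows one common way such numbers can be derived from a language model's loss. It is not taken from the README: it assumes the loss is a mean per-token cross-entropy in nats, that bytes are counted on the UTF-8 encoding of the evaluated text, and that words are whitespace-separated. The function names `bits_per_unit` and `bpb_and_bpw` are hypothetical.

```python
import math


def bits_per_unit(total_nll_nats: float, n_units: int) -> float:
    """Convert a summed negative log-likelihood (nats) to bits per unit."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by n_units normalizes.
    return total_nll_nats / (math.log(2) * n_units)


def bpb_and_bpw(mean_loss_nats: float, n_tokens: int, text: str) -> tuple[float, float]:
    """Return (bits per byte, bits per word) for the evaluated text.

    Assumptions (not stated in the README): mean_loss_nats is the average
    per-token cross-entropy, bytes are UTF-8 bytes, words are split on
    whitespace.
    """
    total_nll = mean_loss_nats * n_tokens   # total nats spent on the text
    n_bytes = len(text.encode("utf-8"))     # byte-level normalizer
    n_words = len(text.split())             # word-level normalizer
    return bits_per_unit(total_nll, n_bytes), bits_per_unit(total_nll, n_words)
```

Because both metrics divide the same total by a tokenizer-independent count, they can be compared across models with different vocabularies, which a per-token loss cannot.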