Harley-ml committed on
Commit
6511fd8
·
verified ·
1 Parent(s): 5b0c092

Update README.md

Files changed (1)
  1. README.md +14 -2
README.md CHANGED
@@ -19,7 +19,7 @@ new_version: SupraLabs/Supra-Mini-v4-2M
19
  ---
20
 
21
  # 🦅 Supra Mini v2 0.1M
22
- Supra Mini **v2** 0.1M is a very tiny base model trained on 700 million tokens of Fineweb-Edu for 3 epochs as the **second version** of our Supra Mini series.
23
 
24
  ## Model Config
25
 
@@ -35,7 +35,7 @@ Supra Mini **v2** 0.1M is a very tiny base model trained on 700 million tokens o
35
  - Weight Decay: 0.01
36
 
37
  ## Final Loss
38
- This model reached a final train loss after 3 epochs of **4.413**.
39
 
40
  ## Benchmarks
41
 
@@ -97,6 +97,18 @@ print("-" * 30)
97
  print("\nOutput:\n" + generate_text(test_prompt))
98
  ```
99
 
100
  ## Training guide
101
  We trained Supra Mini v2 0.1M on a single T4 GPU in ~2 hours for 3 epochs.<br>
102
  The full training code can be found in this repo as `run.sh` (easily run the complete pipeline), `train_tokenizer.py` (train custom BPE tokenizer with vocab size of 2048), `train.py` (train the model) and `inference.py` (test the model).<br>
 
19
  ---
20
 
21
  # 🦅 Supra Mini v2 0.1M
22
+ Supra Mini **v2** 0.1M is a very, and we mean very small base model trained on 700 million tokens of Fineweb-Edu for 3 epochs as the **second version** of our Supra Mini series.
23
 
24
  ## Model Config
25
 
 
35
  - Weight Decay: 0.01
36
 
37
  ## Final Loss
38
+ This model reached a final train loss of **4.413**.
39
 
40
  ## Benchmarks
41
 
 
97
  print("\nOutput:\n" + generate_text(test_prompt))
98
  ```
99
 
100
+ ## Use cases
101
+
102
+ 1. Educational research
103
+ 2. Deployment, testing, or fine-tuning in edge environments
104
+ 3. Or more simply, for fun
105
+
106
+ ## Limitations
107
+
108
+ 1. Cannot reason, chat, or code
109
+ 2. Incoherent more often than not
110
+ 3. Mostly unfactual
111
+
112
  ## Training guide
113
  We trained Supra Mini v2 0.1M on a single T4 GPU in ~2 hours for 3 epochs.<br>
114
  The full training code can be found in this repo as `run.sh` (easily run the complete pipeline), `train_tokenizer.py` (train custom BPE tokenizer with vocab size of 2048), `train.py` (train the model) and `inference.py` (test the model).<br>
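
To put the reported final train loss of **4.413** in context, the sketch below converts it to perplexity and compares it with the loss of a uniform guess over the vocabulary. It assumes the loss is the mean cross-entropy in nats per token and that the model uses the 2048-token tokenizer mentioned above; neither is confirmed by the README, and the function names are illustrative only.

```python
import math

FINAL_TRAIN_LOSS = 4.413  # value reported in the README
VOCAB_SIZE = 2048         # assumption: the model uses the 2048-token BPE tokenizer


def perplexity(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (nats per token) to perplexity."""
    return math.exp(loss_nats)


def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy of a model that gives every token equal probability."""
    return math.log(vocab_size)


if __name__ == "__main__":
    print(f"perplexity at final loss: {perplexity(FINAL_TRAIN_LOSS):.1f}")
    print(f"uniform baseline loss:    {uniform_baseline_loss(VOCAB_SIZE):.3f}")
```

Under these assumptions the perplexity is roughly 82, well below the uniform baseline of 2048, which fits a model that has picked up token statistics but is still incoherent more often than not.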