Harley-ml committed on
Commit 5dba39a · verified · 1 Parent(s): a276117

Update README.md

Files changed (1)
  1. README.md +17 -5
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
18
  ---
19
 
20
  # 🦅 Supra Mini v5 8M
21
- Supra Mini **v5** 8M is a very tiny base model trained on **5 billion** tokens of Fineweb-Edu for 2 epochs as the **fifth version** of our Supra Mini series.
22
 
23
  ## Model Config
24
 
@@ -35,7 +35,7 @@ Supra Mini **v5** 8M is a very tiny base model trained on **5 billion** tokens o
35
  - Trained in bfloat16
36
 
37
  ## Final Loss
38
- This model reached a final train loss after 2 epochs of **4.414**.
39
 
40
  ## Benchmarks
41
 
@@ -69,12 +69,13 @@ So, why does that sound? Because there's something new than a few different thin
69
  \- "You have no idea where I'm so, but if my son has any thought or understanding of his name, he will be able to understand him by saying something more than one day. This way they can tell us when you've got me at home. That is why I want your child to know which words are most important for them: "If you get this language from another person, then you'll find yourself in the same place as you read it," says Mike McNamara, who was born with an English friend, Jennifer Batharinee, who had been diagnosed with dementia during her lifetime. He said he would learn how to say things such as "a lot of things," and "you don't really need to do anything else." It may seem simple because he didn't feel good before he went out and asked whether he could make sense of it. But he wanted to take advantage of the fact"*
70
 
71
  ## Usage
72
- To use our model, just run this code using HF Transformers to execute the model:
 
73
  ```python3
74
  from transformers import pipeline
75
  import torch
76
 
77
- print("[*] Loading Supra Mini v5 8M model from Hugging Face Hub...")
78
  pipe = pipeline(
79
  "text-generation",
80
  model="SupraLabs/Supra-Mini-v5-8M",
@@ -101,8 +102,19 @@ print(f"\nPrompt: {test_prompt}")
101
  print("-" * 30)
102
  print("\nOutput:\n" + generate_text(test_prompt))
103
  ```
 
104
 
105
  ## Training guide
106
  We trained Supra Mini v5 8M on a single NVIDIA RTX 5060 Ti 16GB in ~11 hours for 2 epochs.<br>
107
  The full training code can be found in this repo as `train_tokenizer.py` (trains a custom BPE tokenizer with a vocab size of 16384), `train.py` (trains the model), and `inference.py` (tests the model).<br>
108
- The model was trained on the first 5 billion tokens of Sample-10BT from Fineweb-Edu using streaming tokenization.
 
18
  ---
19
 
20
  # 🦅 Supra Mini v5 8M
21
+ Supra Mini **v5** 8M is a very small model trained on **5 billion** tokens of Fineweb-Edu for 2 epochs as the **fifth version** of our Supra Mini series. SupraMini-8M shows improvements across all benchmarks because of its larger size and training budget.
22
 
23
  ## Model Config
24
 
 
35
  - Trained in bfloat16
36
 
37
  ## Final Loss
38
+ This model reached a final cross-entropy loss (on the training set) of **4.414**.
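
For readers who find perplexity easier to interpret than raw loss, the reported value can be converted with a couple of lines of Python. This is only a sketch: it assumes the 4.414 figure is a mean per-token cross-entropy in nats (the convention of `torch.nn.CrossEntropyLoss`), which the README does not state explicitly. The variable names are illustrative.

```python
import math

# Final training loss reported in the README.
# ASSUMPTION: mean per-token cross-entropy measured in nats.
final_loss_nats = 4.414

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(final_loss_nats)

# The same loss expressed in bits per token (divide by ln 2).
bits_per_token = final_loss_nats / math.log(2)

print(f"perplexity: {perplexity:.1f}")
print(f"bits per token: {bits_per_token:.2f}")
```

Under that assumption the perplexity comes out a little under 83, meaning the model is on average about as uncertain as a uniform choice among roughly 83 tokens out of its 16384-token vocabulary.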
39
 
40
  ## Benchmarks
41
 
 
69
  \- "You have no idea where I'm so, but if my son has any thought or understanding of his name, he will be able to understand him by saying something more than one day. This way they can tell us when you've got me at home. That is why I want your child to know which words are most important for them: "If you get this language from another person, then you'll find yourself in the same place as you read it," says Mike McNamara, who was born with an English friend, Jennifer Batharinee, who had been diagnosed with dementia during her lifetime. He said he would learn how to say things such as "a lot of things," and "you don't really need to do anything else." It may seem simple because he didn't feel good before he went out and asked whether he could make sense of it. But he wanted to take advantage of the fact"*
70
 
71
  ## Usage
72
+ To use our model, just run this code:
73
+
74
  ```python3
75
  from transformers import pipeline
76
  import torch
77
 
78
+ print("Loading Supra Mini v5 8M model from Hugging Face...")
79
  pipe = pipeline(
80
  "text-generation",
81
  model="SupraLabs/Supra-Mini-v5-8M",
 
102
  print("-" * 30)
103
  print("\nOutput:\n" + generate_text(test_prompt))
104
  ```
105
+ ## Use cases
106
+
107
+ 1. Educational research
108
+ 2. Deployment, testing, or fine-tuning in edge environments
109
+ 3. Or more simply, for fun
110
+
111
+ ## Limitations
112
+
113
+ 1. Cannot reason, chat, or code
114
+ 2. Incoherent more often than not
115
+ 3. Output is mostly not factual
116
 
117
  ## Training guide
118
  We trained Supra Mini v5 8M on a single NVIDIA RTX 5060 Ti 16GB in ~11 hours for 2 epochs.<br>
119
  The full training code can be found in this repo as `train_tokenizer.py` (trains a custom BPE tokenizer with a vocab size of 16384), `train.py` (trains the model), and `inference.py` (tests the model).<br>
120
+ The model was trained on the first 5 billion tokens of Sample-10BT from Fineweb-Edu.
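
The idea of training on "the first N tokens" of a dataset can be sketched as follows. This is not the repo's `train.py`; it is a minimal illustration that assumes documents arrive as an iterator of strings and that some tokenizer function maps text to token ids. The helper `take_token_budget` and the stand-in `toy_tokenize` are hypothetical names introduced here.

```python
from typing import Callable, Iterable, Iterator, List


def take_token_budget(
    docs: Iterable[str],
    tokenize: Callable[[str], List[int]],
    budget: int,
) -> Iterator[List[int]]:
    """Yield tokenized documents in order until `budget` tokens have been emitted."""
    used = 0
    if budget <= 0:
        return
    for text in docs:
        # Truncate the current document so the total never exceeds the budget.
        ids = tokenize(text)[: budget - used]
        if ids:
            used += len(ids)
            yield ids
        if used >= budget:
            return


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (HYPOTHETICAL): one 'token id' per whitespace-separated word."""
    return [len(word) for word in text.split()]


# Tiny in-memory stand-in for a streamed dataset.
sample_docs = ["a b c", "d e", "f g h i"]
chunks = list(take_token_budget(iter(sample_docs), toy_tokenize, budget=6))
print([len(chunk) for chunk in chunks])
```

In a real run, `docs` would presumably come from a streamed dataset and `tokenize` from the trained BPE tokenizer, with `budget` set to 5 billion. Because documents are consumed lazily, the full Sample-10BT subset never has to be downloaded or held in memory at once.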