LH-Tech-AI committed on
Commit 95b158c · verified · 1 Parent(s): 8202738

Create README.md

Files changed (1)
  1. README.md +100 -0
README.md ADDED
@@ -0,0 +1,100 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ language:
+ - en
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - small
+ - cpu
+ - supra
+ - v2
+ - tiny
+ - mini
+ - open
+ - open-source
+ ---
+
+ # 🦅 Supra Mini v2 0.1M
+ Supra Mini v2 0.1M is a tiny base model, the second version of our Supra Mini series. It was trained for 3 epochs on 700 million tokens of FineWeb-Edu.
+
+ ## Model Config
+
+ - Parameters: 167,760 (0.1M)
+ - Architecture: Llama
+ - Vocab size (custom BPE tokenizer): 2048
+ - Hidden Size: 48
+ - Intermediate Size: 96
+ - Hidden Layers: 3
+ - Attention Heads: 4
+ - Max Position Embeddings: 256
+ - Learning rate: 6e-4
+ - Weight Decay: 0.01
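
The parameter count in the list above can be reproduced from the listed dimensions. The sketch below is a plain-arithmetic check, not the authors' code. It assumes a standard Llama layout with bias-free linear layers, weight-only RMSNorm, as many key/value heads as attention heads (no grouped-query attention), and input embeddings tied to the output head. None of these assumptions is stated in this README, but together they give exactly 167,760.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, tie_embeddings=True):
    """Count the weights of a Llama-style decoder under the assumed layout."""
    embed = vocab_size * hidden_size            # token embedding matrix
    attn = 4 * hidden_size * hidden_size        # q, k, v, o projections (no bias)
    mlp = 3 * hidden_size * intermediate_size   # gate, up, down projections
    norms = 2 * hidden_size                     # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    final_norm = hidden_size                    # RMSNorm before the output head
    # Assumption: the output head reuses the embedding matrix when tied.
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embed + num_layers * per_layer + final_norm + lm_head


total = llama_param_count(vocab_size=2048, hidden_size=48,
                          intermediate_size=96, num_layers=3)
print(f"{total:,}")  # 167,760
```

Under these assumptions the embedding table alone accounts for 98,304 of the 167,760 parameters, so most of the model's capacity sits in the vocabulary rather than in the three decoder layers.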
+
+ ## Final Loss
+ This model reached a final training loss of **4.XYZ** after 3 epochs.
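
For context when reading a loss value: a model that assigns equal probability to every entry of the 2048-token vocabulary has a cross-entropy of ln(2048) ≈ 7.62 nats per token. The short check below computes that baseline and the loss-to-perplexity conversion. It assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention but is not stated here.

```python
import math

VOCAB_SIZE = 2048  # vocab size from the Model Config section


def uniform_loss(vocab_size):
    """Cross-entropy (nats per token) of a uniform guess over the vocabulary."""
    return math.log(vocab_size)


def perplexity(loss_nats):
    """Perplexity corresponding to a mean per-token loss in nats."""
    return math.exp(loss_nats)


print(round(uniform_loss(VOCAB_SIZE), 4))  # 7.6246
```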
+
+ ## Benchmarks
+
+ All benchmarks were run with `lm-eval`.
+
+ | Task          | Value        | Random baseline |
+ | :------------ | :----------: | --------------: |
+ | ARC-Easy      | 0.XXXX       | 0.25 (25%)      |
+ | Wikitext      | XX.XXXX      | -               |
+ | BLiMP         | 0.XXXX       | 0.5 (50%)       |
+
+ ## Examples
+ **Prompt:** "Artificial intelligence is "<br>
+ **Output:** "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+ <br><br>
+ **Prompt:** "The main concept of physics is "<br>
+ **Output:** "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+ <br><br>
+ **Prompt:** "Once upon a time, "<br>
+ **Output:** "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+
+ ## Usage
+ To run the model, use the following code with Hugging Face Transformers:
+ ```python3
+ from transformers import pipeline
+ import torch
+
+ print("[*] Loading Supra Mini v2 0.1M model from Hugging Face Hub...")
+ # Use float16 on a CUDA GPU and float32 on CPU.
+ pipe = pipeline(
+     "text-generation",
+     model="SupraLabs/Supra-Mini-v2-0.1M",
+     device_map="auto",
+     torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+ )
+
+ def generate_text(prompt, max_new_tokens=150):
+     """Sample a continuation and return the prompt plus the generated text."""
+     result = pipe(
+         prompt,
+         max_new_tokens=max_new_tokens,
+         do_sample=True,
+         temperature=0.5,
+         top_k=25,
+         top_p=0.9,
+         repetition_penalty=1.2,
+         pad_token_id=pipe.tokenizer.pad_token_id,
+         eos_token_id=pipe.tokenizer.eos_token_id,
+     )
+     return result[0]["generated_text"]
+
+ test_prompt = "The importance of education is"
+ print(f"\nPrompt: {test_prompt}")
+ print("-" * 30)
+ print("\nOutput:\n" + generate_text(test_prompt))
+ ```
+
+ ## Training guide
+ We trained Supra Mini v2 0.1M for 3 epochs on a single T4 GPU in ~2 hours.<br>
+ The full training code is in this repo: `run.sh` (runs the complete pipeline), `train_tokenizer.py` (trains the custom BPE tokenizer with a vocab size of 2048), `train.py` (trains the model) and `inference.py` (tests the model).<br>
+ The model was trained on the first 700 million tokens of the `sample-10BT` subset of FineWeb-Edu using streaming tokenization.
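
The "first 700 million tokens" cutoff with streaming tokenization can be sketched as a generator that tokenizes documents one at a time and stops once a token budget is reached. This is an illustration only, not the code in `train.py`: the helper `take_token_budget`, the whitespace-based `toy_tokenize` stand-in, and the tiny example budget are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def take_token_budget(docs: Iterable[str],
                      tokenize: Callable[[str], List[int]],
                      budget: int) -> Iterator[List[int]]:
    """Lazily yield tokenized documents until `budget` tokens were emitted.

    The last document is truncated so the total is exactly `budget`
    (or less, if the stream runs out first).
    """
    if budget <= 0:
        return
    remaining = budget
    for text in docs:
        ids = tokenize(text)[:remaining]  # truncate at the budget boundary
        if ids:
            yield ids
        remaining -= len(ids)
        if remaining <= 0:
            return  # stop before pulling another document from the stream


def toy_tokenize(text: str) -> List[int]:
    # Hypothetical stand-in for the real BPE tokenizer: one id per word.
    return [hash(word) % 2048 for word in text.split()]


stream = iter(["a b c", "d e", "f g h i"])  # stands in for a streamed dataset
chunks = list(take_token_budget(stream, toy_tokenize, budget=6))
print([len(c) for c in chunks])  # [3, 2, 1]
```

In a real run, `docs` would be an iterator over the text field of the streamed dataset and `budget` would be 700,000,000. Because nothing is materialized up front, memory use stays bounded by the size of a single document.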
+
+ ## Final thoughts
+ We are very proud to release this second version of the Supra Mini series today!<br>
+ Stay tuned for more models, and follow us to support our open-source work! 😊