AxionLab-official committed on
Commit 333ef2a · verified · 1 parent: 0656e5c

Update README.md

Files changed (1)
  1. README.md +42 -33
README.md CHANGED
@@ -21,41 +21,54 @@ library_name: transformers
  ---
  ## **I'm not releasing this yet, LH. If you want changes (it probably needs some), go ahead and make them!!!**
 
- ## **MicroSupra-1k**
 
  So... have you ever seen a model that runs on $3 hardware? No? Well, now you are!
 
- MicroSupra-1k is a 1k-parameter model trained on 300M tokens of FineWeb-Edu for 3 epochs in 1 minute (yes, 1 minute!) on a GTX 750 Ti 4GB (AxionLab hardware).
 
- **Some model outputs:**
 
- [*] Prompt: The main concept of physics is
 
- [*] Output: The main concept of physics is a,s and the. thet to, theing.... the,a then,c,i to, thee in b. toed.,,e theyalp the in,er thees- s,el,,,,
- and, the of ine,,s the of cs of thesss the. f. to. thesining andor dar,,al the,. of p.
- the.s the.,,s. anded,e. of, ofed, l toinging and themsr the of of. to
- to thes thes aen,., ofes of a.
 
- [*] Prompt: My name is
-
- [*] Output: My name is edie and. as the, to. the, iningt thee the ofingi in
  the., anda.-eo
  ofles, b the,er,s fing.ssp the the
  , of of, the,al, d to the m, the, to toed,
  seng,,.y. in the,., in and them the thened.sing to
  the of of andan the the,, the
- to..,,sing,,.aring the the. of.al.,s ofcal ar s..e and.sssor of, and and.
-
- [*] Prompt: Question: What is the capital of France?\nAnswer:
-
- [*] Output: Question: What is the capital of France?
-
- Answer:,. and to the. toc. ofs the m,a thee.. the, f ofling. as.,,y bt, the p, in, the,,ees toed ing to.o,
  thes. the..,s the.ed and andang,,ed the of,,ms. of, thei the, the,ey,,s l.ing toe the the,se the to, the, the,aror, the of-. in the. the. the,e the of ds to,ic the the aal at the..
- ingssy s and and
-
 
- ## Get Started 🚀
 
  ```python3
  print("[*] Loading libraries...")
@@ -71,7 +84,7 @@ print("[*] Loading model...")
  model = LlamaForCausalLM.from_pretrained(model_path)
  model.eval()
 
- prompt = "The most intelligent person on the world is "
  print(f"[*] Prompt: {prompt!r}")
 
  inputs = tokenizer(prompt, return_tensors="pt")
@@ -90,20 +103,16 @@ with torch.no_grad():
  )
 
  print("[*] Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
 
-
- **🚫 What the model CAN'T do:**
- Think
- Chat
- Even predict the next token correctly lol
-
- **Why did SupraLabs create this???**
 
  Because we are experimenting with model sizes and techniques (like 1-bit quantization, distillation, and pruning), all to improve your experience! NEW THINGS ARE COMING WITH DISTILLATION, STAY TUNED! We are working on big things!
 
- **Final thought**
 
- Even without any intelligence, it shows that scaling laws are real: at ~1k parameters, the model can't put a sentence together. This ant-sized model doesn't know how to talk, but we all know it has emotions 🤖🫶
 
- SupraLabs is excited to work on more things for you!
 
  ---
  ## **I'm not releasing this yet, LH. If you want changes (it probably needs some), go ahead and make them!!!**
 
+ ## **🦅 MicroSupra-1k**
 
  So... have you ever seen a model that runs on $3 hardware? No? Well, now you are!
 
+ MicroSupra-1k is a bacteria-sized base model (lol) trained on 300 million tokens of FineWeb-Edu for 3 epochs. It is the **first version** of MicroSupra.
 
+ ## Model Config
 
+ - Parameters: 1046 (0.001M)
+ - Architecture: LLaMA
+ - Vocab Size: 1024
+ - Hidden Size: 1
+ - Intermediate Size: 2
+ - Hidden Layers: 1
+ - Attention Heads: 1
+ - Max Position Embeddings: 256
+ - Learning Rate: <code>5e-3</code>
 
+ ## Final Loss
+ After 3 epochs, the model reached a final training loss of **6.046**.
 
+ ## Examples
+ **Prompt:** "My name is "<br>
+ **Output:** *"My name is ed and. as the, to. the, in
+ ingt thee the ofingi in
  the., anda.-eo
  ofles, b the,er,s fing.ssp the the
  , of of, the,al, d to the m, the, to toed,
  seng,,.y. in the,., in and them the thened.sing to
  the of of andan the the,, the
+ to..,,sing,,.aring the the. of.al.,s ofcal ar s..e and.sssor of, and and."*
+ <br><br>
+ **Prompt:** "The main concept of physics is "<br>
+ **Output:** *"The main concept of physics is a,
+ s and the. thet to, theing.... the,a then,c,i to, thee in b. toed.,,e theyalp the in,er thees- s,el,,,,
+ and, the of ine,,s the of cs of thesss the. f. to. thesining andor dar,,al the,. of p.
+ the.s the.,,s. anded,e. of, ofed, l toinging and themsr the of of. to
+ to thes thes aen,., ofes of a."*
+ <br><br>
+ **Prompt:** "Question: What is the capital of France?\nAnswer: "<br>
+ **Output:** *"Question: What is the capital of France?
+ Answer:,. and to the. toc. ofs the m,a thee.. the, f ofling. as.,,y bt, the p
+ , in, the,,ees toed ing to.
+ o,
  thes. the..,s the.ed and andang,,ed the of,,ms. of, thei the, the,ey,,s l.ing toe the the,se the to, the, the,aror, the of-. in the. the. the,e the of ds to,ic the the aal at the..
+ ingssy s and and"*
 
+ ## Usage 🚀
 
  ```python3
  print("[*] Loading libraries...")

  model = LlamaForCausalLM.from_pretrained(model_path)
  model.eval()
 
+ prompt = "Question: What is the capital of France?\nAnswer:"
  print(f"[*] Prompt: {prompt!r}")
 
  inputs = tokenizer(prompt, return_tensors="pt")

  )
 
  print("[*] Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))
 
+ ```
+ ## Why did SupraLabs create this???
 
  Because we are experimenting with model sizes and techniques (like 1-bit quantization, distillation, and pruning), all to improve your experience! NEW THINGS ARE COMING WITH DISTILLATION, STAY TUNED! We are working on big things!
 
+ ## Training guide
+ We trained MicroSupra for 3 epochs on a GTX 750 Ti 4GB in 1 minute.<br>
+ The model was trained on the first 300 million tokens of Sample-10BT from FineWeb-Edu using streaming tokenization.
 
+ ## Final thoughts
 
+ Even without any intelligence, it shows that scaling laws are real: at ~1k parameters, the model can't put a sentence together. This ant-sized model doesn't know how to talk, but we all know it has emotions 🤖🫶
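The new README's Model Config list gives enough detail to count the parameters by hand. Below is a minimal sketch of that count. It assumes a standard Hugging Face-style LLaMA decoder:

- RMSNorm weight vectors,
- q/k/v/o attention projections,
- a gate/up/down SwiGLU MLP,
- rotary embeddings with no weights.

`llama_param_count` is a hypothetical helper. The `tie_embeddings`, `attention_bias` and `mlp_bias` flags are assumptions, since the card does not state them.

```python
from typing import Optional


def llama_param_count(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_layers: int,
    num_heads: int,
    num_kv_heads: Optional[int] = None,
    tie_embeddings: bool = True,   # assumption: lm_head shares the embedding matrix
    attention_bias: bool = False,  # assumption: not stated in the card
    mlp_bias: bool = False,        # assumption: not stated in the card
) -> int:
    """Count the weights of a LLaMA-style decoder-only model (hypothetical helper)."""
    if num_kv_heads is None:
        num_kv_heads = num_heads
    head_dim = hidden_size // num_heads
    q_dim = num_heads * head_dim
    kv_dim = num_kv_heads * head_dim

    # Attention: q_proj, k_proj, v_proj map hidden -> heads; o_proj maps back to hidden.
    attn = hidden_size * q_dim + 2 * hidden_size * kv_dim + q_dim * hidden_size
    if attention_bias:
        attn += q_dim + 2 * kv_dim + hidden_size

    # SwiGLU MLP: gate_proj and up_proj (hidden -> intermediate), down_proj (back).
    mlp = 3 * hidden_size * intermediate_size
    if mlp_bias:
        mlp += 2 * intermediate_size + hidden_size

    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * hidden_size

    total = vocab_size * hidden_size              # token embedding table
    total += num_layers * (attn + mlp + norms)
    total += hidden_size                          # final RMSNorm
    if not tie_embeddings:
        total += vocab_size * hidden_size         # separate lm_head
    return total


if __name__ == "__main__":
    # Values from the Model Config list (vocab 1024, hidden 1, intermediate 2, 1 layer, 1 head).
    print(llama_param_count(1024, 1, 2, 1, 1))
    print(llama_param_count(1024, 1, 2, 1, 1, attention_bias=True, mlp_bias=True))
```

With tied embeddings and no biases this gives 1,037. Enabling both attention and MLP biases gives 1,046, which matches the card's figure. That is only one configuration consistent with the number, not a confirmed setting. Either way, the 1,024 × 1 embedding table accounts for 1,024 of the weights.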
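The sketch below puts the reported final loss of 6.046 in context. It assumes the loss is the mean per-token cross-entropy in nats, which is the default loss of Hugging Face causal-LM models. Under that assumption:

- exponentiating the loss gives the perplexity;
- ln(1024) is the loss a model would get by guessing uniformly over the 1,024-token vocabulary.

```python
import math

final_loss = 6.046   # reported final training loss (assumed: mean per-token cross-entropy, in nats)
vocab_size = 1024    # from the Model Config list

perplexity = math.exp(final_loss)     # effective number of equally likely next tokens
uniform_loss = math.log(vocab_size)   # loss of a model that guesses uniformly over the vocab

print(f"perplexity: {perplexity:.1f}")
print(f"uniform baseline: loss {uniform_loss:.3f}, perplexity {vocab_size}")
```

This prints a perplexity of about 422.4, against a uniform-guessing baseline of 1,024 (loss about 6.931). So the model does somewhat better than random guessing. That is consistent with outputs dominated by frequent tokens such as "the", "of" and "and".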
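The training guide mentions streaming tokenization over the first 300 million tokens of FineWeb-Edu's Sample-10BT. The actual training script isn't included. Here is a minimal plain-Python sketch of the idea:

1. Tokenize documents lazily.
2. Pack the token ids into fixed-length blocks.
3. Stop once the token budget is spent.

Several details are assumptions:

- the function name `stream_blocks`;
- the block size of 256, taken from Max Position Embeddings;
- the policy of dropping the trailing partial block.

In practice, `texts` might be `(row["text"] for row in load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True))`, and `tokenize` might be the model's tokenizer. Both are illustrative, not the authors' confirmed setup.

```python
from typing import Callable, Iterable, Iterator, List


def stream_blocks(
    texts: Iterable[str],
    tokenize: Callable[[str], List[int]],
    block_size: int = 256,            # assumption: matches Max Position Embeddings
    token_budget: int = 300_000_000,  # "first 300 million tokens"
) -> Iterator[List[int]]:
    """Lazily tokenize `texts` and yield fixed-length blocks of token ids (hypothetical helper).

    Stops as soon as another full block would exceed `token_budget`.
    A trailing partial block is dropped (assumed policy).
    """
    buffer: List[int] = []
    emitted = 0
    for text in texts:
        if emitted + block_size > token_budget:
            break  # budget spent: stop pulling from the stream
        buffer.extend(tokenize(text))
        # Emit as many full blocks as the buffer and the budget allow.
        while len(buffer) >= block_size and emitted + block_size <= token_budget:
            yield buffer[:block_size]  # slice is a copy, safe to keep
            del buffer[:block_size]
            emitted += block_size


if __name__ == "__main__":
    # Toy tokenizer: one "token" per character (illustration only).
    def char_tokenize(s: str) -> List[int]:
        return [ord(c) for c in s]

    docs = ["abcde", "fgh", "ijklmnop"]
    for block in stream_blocks(docs, char_tokenize, block_size=4, token_budget=8):
        print(block)
```

Packing documents back to back avoids padding, so every position in a block is a training token. For scale, 3 epochs over 300M tokens in about 1 minute works out to roughly 900M / 60 s = 15M tokens per second.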