amewebstudio committed · Commit 868681c · verified · 1 Parent(s): f3c45c4

Upload README.md with huggingface_hub

  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
---
license: mit
tags:
- sparseflow
- sparse-attention
- efficient-nlp
datasets:
- gsm8k
- lighteval/MATH
- allenai/ai2_arc
- tau/commonsense_qa
- piqa
- allenai/sciq
- trivia_qa
- nq_open
- wikitext
---

# SparseFlow v8

Efficient language model with **sparse attention** and **persistent memory**.

## 📊 Measured Metrics

| Metric | Value |
|--------|-------|
| Parameters | 71,359,746 |
| Perplexity | 14.77 |
| Attention Sparsity | 87.5% |
| Channel Sparsity | 75.0% |
| Peak Memory | 3.67 GB |

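The two sparsity figures are consistent with the top-k settings listed under Architecture (top-64 tokens, top-128 channels) if each selection is made over 512 candidates. That candidate count is an assumption: it matches the stated 512 dim, but the card does not say what the top-k is taken over. A minimal sketch of the arithmetic, where `sparsity` is a hypothetical helper and not part of any SparseFlow API:

```python
def sparsity(kept: int, total: int) -> float:
    """Fraction of candidates that a top-k rule leaves unselected."""
    if total <= 0 or not 0 <= kept <= total:
        raise ValueError("need total > 0 and 0 <= kept <= total")
    return 1.0 - kept / total


# Assumed candidate counts: 512 token positions and 512 FFN channels.
attention_sparsity = sparsity(kept=64, total=512)
channel_sparsity = sparsity(kept=128, total=512)

print(f"{attention_sparsity:.1%}, {channel_sparsity:.1%}")  # 87.5%, 75.0%
```

With a different sequence length or FFN width the same top-k values would give different percentages, so the table values should be read together with the evaluation setup.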
## 🏗️ Architecture

- **Sparse Token Attention**: Each position attends to its top-64 tokens
- **Sparse Channel FFN**: Activates the top-128 channels
- **Persistent Memory**: 20,000 memory vectors
- **8 Transformer layers** with a model dimension of 512

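To illustrate what attending to only the top-k tokens per position can look like, here is a minimal single-head sketch in plain Python. It is not the SparseFlow implementation: the function name `topk_attention`, the scaled dot-product scoring, and the softmax over the kept scores are all assumptions made for illustration, and the causal mask a language model would normally apply is omitted for brevity.

```python
import math


def topk_attention(queries, keys, values, k):
    """Single-head attention where each query attends only to its k best keys.

    queries, keys, values are lists of float vectors; keys and values must
    have the same number of rows. Returns one output vector per query.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        # Scaled dot-product score against every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # Indices of the k highest scores; all other keys get zero weight.
        kept = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        # Softmax over the kept scores only (max-subtracted for stability).
        top = max(scores[j] for j in kept)
        weights = {j: math.exp(scores[j] - top) for j in kept}
        norm = sum(weights.values())
        # Weighted sum of the selected value vectors.
        out = [0.0] * len(values[0])
        for j, w in weights.items():
            for d, v in enumerate(values[j]):
                out[d] += (w / norm) * v
        outputs.append(out)
    return outputs


# Toy example: the third key scores lowest, so with k=2 its value is never used.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
out = topk_attention([[2.0, 0.0]], keys, values, k=2)
```

In the card's setting `k` would be 64. A practical implementation would work on batched tensors and typically mask the unselected scores before the softmax rather than looping in Python.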
## 📚 Training Data

Open-source datasets only:
- GSM8K, MATH (mathematics)
- ARC, OpenBookQA, SciQ (science & reasoning)
- CommonsenseQA, PIQA (common sense)
- TriviaQA, Natural Questions (factual)
- WikiText-103 (language modeling)

## 👨‍💻 Author

**Logo (Mike Amega)**, Ame Web Studio