amewebstudio/sparseflow-chat-v8

+---
+license: mit
+tags:
+- sparseflow
+- sparse-attention
+- efficient-nlp
+datasets:
+- gsm8k
+- lighteval/MATH
+- allenai/ai2_arc
+- tau/commonsense_qa
+- piqa
+- allenai/sciq
+- trivia_qa
+- nq_open
+- wikitext
+---
+# SparseFlow v8
+Efficient language model with **sparse attention** and **persistent memory**.
+## 📊 Measured Metrics
+| Metric | Value |
+|--------|-------|
+| Parameters | 71,359,746 |
+| Perplexity | 14.77 |
+| Attention Sparsity | 87.5% |
+| Channel Sparsity | 75.0% |
+| Peak Memory | 3.67 GB |
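The two sparsity figures can be reproduced from the top-k sizes listed under Architecture with the formula `sparsity = 1 - k / n`. The card does not state the context length or the FFN width, so the value 512 used for both below is an assumption chosen because it matches the reported numbers; `topk_sparsity` is a hypothetical helper, not part of the model code.

```python
def topk_sparsity(k: int, n: int) -> float:
    """Fraction of the n candidates that are skipped when only the top k are kept."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    return 1.0 - k / n


# Assumed sizes (NOT stated in the model card): 512 candidate tokens per
# position and 512 FFN channels.
ASSUMED_CONTEXT = 512
ASSUMED_CHANNELS = 512

attention_sparsity = topk_sparsity(64, ASSUMED_CONTEXT)   # top-64 tokens
channel_sparsity = topk_sparsity(128, ASSUMED_CHANNELS)   # top-128 channels

print(f"Attention sparsity: {attention_sparsity:.1%}")  # 87.5%
print(f"Channel sparsity:   {channel_sparsity:.1%}")    # 75.0%
```

If the actual context length or FFN width differs from 512, the same helper gives the corresponding sparsity for those sizes.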
+## 🏗️ Architecture
+- **Sparse Token Attention**: Each position attends only to its top-64 tokens
+- **Sparse Channel FFN**: Activates only the top-128 channels
+- **Persistent Memory**: 20,000 memory vectors
+- **8 Transformer layers** with a hidden dimension of 512
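The first two bullets describe top-k selection, which can be illustrated in a few lines of plain Python. This is a minimal sketch of the general technique, not SparseFlow's implementation: the function names are hypothetical, the attention uses standard scaled dot-product scores for a single query, and selecting channels by absolute activation is an assumption (the card does not say how channels are ranked). Persistent memory is not covered here because the card gives no detail on how it is read or written.

```python
import math


def topk_indices(values, k):
    """Indices of the k largest values, ties broken by lower index."""
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))
    return sorted(order[:k])


def sparse_attention(query, keys, values, k):
    """Attend from one query to only its k highest-scoring keys."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    keep = topk_indices(scores, min(k, len(keys)))
    # Softmax over the kept scores only; every other token gets weight 0.
    peak = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exps.values())
    weights = [exps[i] / total if i in exps else 0.0 for i in range(len(keys))]
    dim = len(values[0])
    output = [sum(weights[i] * values[i][d] for i in keep) for d in range(dim)]
    return output, weights


def sparse_channels(hidden, k):
    """Keep the k largest-magnitude activations and zero the rest (assumed rule)."""
    keep = set(topk_indices([abs(h) for h in hidden], min(k, len(hidden))))
    return [h if i in keep else 0.0 for i, h in enumerate(hidden)]


# Toy example with k=2 instead of the model's 64 and 128.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
vals = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
out, w = sparse_attention([1.0, 0.0], keys, vals, k=2)
print(w)    # [0.5, 0.0, 0.5, 0.0]
print(out)  # [1.5, 1.0]
print(sparse_channels([0.1, -3.0, 2.0, 0.5], k=2))  # [0.0, -3.0, 2.0, 0.0]
```

In the toy run, tokens 0 and 2 tie for the highest score, so they share the attention weight equally and the other two tokens contribute nothing to the output.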
+## 📚 Training Data
+Open-source datasets only:
+- GSM8K, MATH (mathematics)
+- ARC, OpenBookQA, SciQ (science & reasoning)
+- CommonsenseQA, PIQA (common sense)
+- TriviaQA, Natural Questions (factual)
+- WikiText-103 (language modeling)
+## 👨‍💻 Author
+**Logo (Mike Amega)** — Ame Web Studio