ianshank committed on
Commit ca65210 · verified · 1 Parent(s): cd6510c

Update model_card.md

Files changed (1)
  1. model_card.md +112 -112
model_card.md CHANGED
---
language: en
license: mit
library_name: pytorch
tags:
- mixture-of-experts
- multi-agent
- neural-routing
- cognitive-architecture
- reinforcement-learning
pipeline_tag: text-classification
---

# MangoMAS-MoE-7M

A ~7 million parameter **Mixture-of-Experts** (MoE) neural routing model for multi-agent task orchestration.

## Model Architecture

```
Input (64-dim feature vector from featurize64())

 ┌─────┴─────┐
 │   GATE    │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
 └─────┬─────┘

╔═══════════════════════════════════════════════════╗
║ 16 Expert Towers (parallel)                       ║
║ Each: Linear(64→512) → ReLU → Linear(512→512)     ║
║       → ReLU → Linear(512→256)                    ║
╚═══════════════════════════════════════════════════╝

Weighted Sum (gate_weights × expert_outputs)

Classifier Head: Linear(256→N_classes)

Output Logits
```
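The dense-MoE forward pass above (gate softmax, parallel experts, weighted sum) can be illustrated with a toy, framework-free sketch. The dimensions here (4-dim input, 2 experts, 3-dim expert output) are shrunk for readability and are not the model's real sizes:

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def linear(x, W, b):
    # W[j] holds the weights of output unit j; returns W·x + b.
    return [sum(wi * xi for wi, xi in zip(W[j], x)) + b[j] for j in range(len(b))]

random.seed(0)
d_in, d_hid, n_experts = 4, 3, 2

def rand_layer(n_in, n_out):
    W = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

gate_W, gate_b = rand_layer(d_in, n_experts)
experts = [rand_layer(d_in, d_hid) for _ in range(n_experts)]

x = [0.5, -1.0, 0.25, 0.0]
gate_weights = softmax(linear(x, gate_W, gate_b))

# Dense MoE: every expert runs; outputs are blended by the gate weights.
expert_outs = [[max(0.0, h) for h in linear(x, W, b)] for W, b in experts]  # ReLU
mixed = [sum(g * out[k] for g, out in zip(gate_weights, expert_outs))
         for k in range(d_hid)]
print(gate_weights, mixed)
```

Note that this is a *dense* mixture: all 16 experts execute on every input, and the gate only reweights their outputs, unlike sparse top-k MoE layers that skip unselected experts.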

### Parameter Count

| Component | Parameters |
|-----------|-----------|
| Gate Network | 64×512 + 512 + 512×16 + 16 = ~41K |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = ~6.84M |
| Classifier Head | 256×10 + 10 = ~2.6K |
| **Total** | **~6.88M** |
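The table can be checked with a quick calculation (a sketch; layer sizes are taken from the diagram above, and a 10-class head is assumed):

```python
# Each Linear(in, out) layer contributes in*out weights plus out biases.
def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out

gate = linear_params(64, 512) + linear_params(512, 16)
expert = linear_params(64, 512) + linear_params(512, 512) + linear_params(512, 256)
experts = 16 * expert
head = linear_params(256, 10)  # assumes num_classes=10

total = gate + experts + head
print(gate, experts, head, total)  # 41488 6836224 2570 6880282
```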

## Input: 64-Dimensional Feature Vector

The model consumes a 64-dimensional feature vector produced by `featurize64()`:

- **Dims 0-31**: Hash-based sinusoidal encoding (content fingerprint)
- **Dims 32-47**: Domain tag detection (code, security, architecture, etc.)
- **Dims 48-55**: Structural signals (length, punctuation, questions)
- **Dims 56-59**: Sentiment polarity estimates
- **Dims 60-63**: Novelty/complexity scores
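`featurize64()` itself ships with the model code and is not reproduced in this card. The sketch below is a hypothetical stand-in that only follows the dimension layout listed above; the hash fingerprint, tag flags, and structural signals are simplified illustrations, and the sentiment/novelty slots are left empty:

```python
import hashlib
import math

def featurize64_sketch(text: str) -> list:
    """Hypothetical stand-in for featurize64(); layout follows the model card."""
    vec = [0.0] * 64
    # Dims 0-31: hash-based sinusoidal content fingerprint.
    h = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    for i in range(32):
        vec[i] = math.sin(h % 1000 / 1000 * math.pi * (i + 1))
    # Dims 32-47: crude domain-tag flags (a placeholder tag list).
    tags = ["code", "security", "architecture", "api"]
    for i, tag in enumerate(tags):
        vec[32 + i] = 1.0 if tag in text.lower() else 0.0
    # Dims 48-55: structural signals (normalized length, question density).
    vec[48] = min(len(text) / 256.0, 1.0)
    vec[49] = text.count("?") / max(len(text), 1)
    # Dims 56-63 (sentiment, novelty/complexity) left at 0.0 in this sketch.
    return vec

features = featurize64_sketch("Design a secure REST API with authentication")
print(len(features))  # 64
```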

## Training

- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Updates**: Online learning from routing feedback
- **Minimum reward threshold**: 0.1
- **Device**: CPU / MPS / CUDA (auto-detected)
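The exact training loop is not shown in this card. The snippet below only illustrates the reward-threshold rule, assuming each routing-feedback event carries a scalar `reward` and events below 0.1 are skipped rather than trained on (the event schema here is hypothetical):

```python
MIN_REWARD = 0.1  # feedback below this threshold is discarded

def filter_feedback(events):
    """Keep only routing-feedback events strong enough to trigger an update."""
    return [e for e in events if e["reward"] >= MIN_REWARD]

events = [
    {"task": "review code", "expert": 3, "reward": 0.9},
    {"task": "noise", "expert": 7, "reward": 0.02},   # dropped: below threshold
    {"task": "design api", "expert": 1, "reward": 0.4},
]
trainable = filter_feedback(events)
print(len(trainable))  # 2
```

Each surviving event would then drive one AdamW step against the routing decision it scored.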

## Usage

```python
import torch
from moe_model import MixtureOfExperts7M, featurize64

# Create model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)
model.eval()

# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)

# Forward pass (inference only, so gradients are disabled)
with torch.no_grad():
    logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```

## Intended Use

This model is part of the **MangoMAS** multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.

**Primary use cases:**

- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures

## Interactive Demo

Try the model live on the [MangoMAS Hugging Face Space](https://huggingface.co/spaces/ianshank/MangoMAS).

## Citation

```bibtex
@software{mangomas2026,
  title={MangoMAS: Multi-Agent Cognitive Architecture},
  author={Cruickshank, Ian},
  year={2026},
  url={https://github.com/ianshank/MangoMAS}
}
```

## Author

Built by [Ian Cruickshank](https://huggingface.co/ianshank) — MangoMAS Engineering