vishesh-t27 committed on
Commit ede3ae3 · verified · 1 Parent(s): 0f6bcc6

Update README.md

Files changed (1)
  1. README.md +19 -42
README.md CHANGED
@@ -49,24 +49,6 @@ Stay tuned!
 
  ---
 
- ## Model Overview
-
- **Repository:** `FrontiersMind/Nandi-mini-500M-Early-Checkpoint`
-
- ### Model Details
-
- - Type: Causal Language Model
- - Training Stage: Early Pretraining Checkpoint
- - Parameters: ~500M
- - Architecture: Transformer decoder
- - Positional Encoding: RoPE
- - Normalization: RMSNorm + QK Norm
- - Activation: SwiGLU
- - Attention: GQA + Shared KV
- - Embeddings: Tied embeddings with factorized design
- - Context Length: 2,048 tokens
- - Vocabulary Size: 131,072
-
 
  ### Architectural Highlights
 
@@ -112,20 +94,21 @@ This remains an active research area within the Nandi model family, and we plan
 
  ---
 
- ## 🌍 Supported Languages
 
- The model is trained on English and multiple Indic languages, including:
 
- - Hindi
- - Bengali
- - Tamil
- - Telugu
- - Marathi
- - Gujarati
- - Kannada
- - Malayalam
- - Punjabi
- - Odia
 
  ---
 
@@ -133,7 +116,7 @@ The model is trained on English and multiple Indic languages, including:
 
  ## General Benchmarks
 
- | Model | Budget (T Tokens) | HellaSwag | WinoGrande | OBQA | PIQA | GPQA | ARC-e | ARC-c | MMLU | Average |
  |---|---|---|---|---|---|---|---|---|---|---|
  | MobiLlama-0.5B-Base | 1.3 | 39.65 | 53.67 | 30.60 | 70.35 | 24.33 | 52.82 | 23.63 | 24.18 | 39.90 |
  | Qwen-2-0.5B-Base | 12 | 49.01 | 57.69 | 33.20 | 68.98 | 27.23 | 54.79 | 25.42 | 44.06 | 45.05 |
@@ -142,7 +125,7 @@ The model is trained on English and multiple Indic languages, including:
  | Qwen3.5-0.8B-Base | 36 | 54.87 | 60.54 | 35.80 | 70.02 | 31.25 | 70.50 | 38.23 | 52.73 | 51.74 |
  | SmolLM-360M-Base | 0.6 | 53.33 | 57.22 | 37.60 | 70.56 | 21.20 | 70.24 | 33.27 | 24.92 | 46.04 |
  | SmolLM2-360M-Base | 4 | 56.30 | 59.19 | 37.60 | 71.81 | 25.22 | 67.88 | 36.68 | 25.55 | 47.53 |
- | **Nandi-Mini-500M-Early-Checkpoint** | **0.5** | **44.86** | **54.77** | **34.80** | **68.60** | **26.33** | **64.73** | **29.70** | **29.01** | **44.10** |
 
 
  ---
@@ -164,20 +147,14 @@ The model is trained on English and multiple Indic languages, including:
  | Telugu | 15.40 | 13.38 | 2.09 | **1.77** |
  | Assamese | 9.26 | 8.13 | 4.31 | **1.51** |
 
- ### Why Fertility Matters
-
- Lower fertility scores indicate more efficient tokenization, meaning fewer tokens are needed to represent text in a language.
 
- This leads to:
 
- - Better context utilization
- - Lower inference cost
- - Reduced latency
- - Improved multilingual efficiency
 
- Nandi-Mini’s tokenizer is heavily optimized for Indic languages and demonstrates strong compression efficiency across several scripts.
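
As a rough illustration of the fertility idea in the removed section above, the sketch below assumes fertility means the average number of tokens per whitespace-separated word; the README does not spell out its exact definition. The helpers `fertility`, `char_tokenizer`, and `word_tokenizer` are hypothetical stand-ins and are not the Nandi tokenizer. The 2,048-token context length is taken from the Model Details list.

```python
def fertility(text, tokenize):
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


def char_tokenizer(text):
    # Hypothetical inefficient tokenizer: one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


def word_tokenizer(text):
    # Hypothetical efficient tokenizer: one token per word.
    return text.split()


sample = "small models can still be useful"
char_fertility = fertility(sample, char_tokenizer)
word_fertility = fertility(sample, word_tokenizer)

CONTEXT_LENGTH = 2048  # tokens, from the Model Details list

for name, score in [("char", char_fertility), ("word", word_fertility)]:
    # Lower fertility means more words fit in the same token budget.
    words_in_context = int(CONTEXT_LENGTH / score)
    print(f"{name}: fertility={score:.2f}, words per context={words_in_context}")
```

With a real tokenizer, `tokenize` would be replaced by that tokenizer's encode call; the ratio is computed the same way.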
 
- ---
 
  # 🚀 Usage
 
 
@@ -49,24 +49,6 @@ Stay tuned!
 
  ---
 
 
  ### Architectural Highlights
 
 
@@ -112,20 +94,21 @@ This remains an active research area within the Nandi model family, and we plan
 
  ---
 
 
+ ### Model Details
+
+ - Type: Causal Language Model
+ - Training Stage: Early Pretraining Checkpoint
+ - Parameters: ~500M
+ - Architecture: Transformer decoder
+ - Positional Encoding: RoPE
+ - Normalization: RMSNorm + QK Norm
+ - Activation: SwiGLU
+ - Attention: GQA + Shared KV
+ - Embeddings: Tied embeddings with factorized design
+ - Context Length: 2,048 tokens
+ - Vocabulary Size: 131,072
 
 
  ---
 
 
@@ -133,7 +116,7 @@ The model is trained on English and multiple Indic languages, including:
 
  ## General Benchmarks
 
+ | Model | Trained Tokens | HellaSwag | WinoGrande | OBQA | PIQA | GPQA | ARC-e | ARC-c | MMLU | Average |
  |---|---|---|---|---|---|---|---|---|---|---|
  | MobiLlama-0.5B-Base | 1.3 | 39.65 | 53.67 | 30.60 | 70.35 | 24.33 | 52.82 | 23.63 | 24.18 | 39.90 |
  | Qwen-2-0.5B-Base | 12 | 49.01 | 57.69 | 33.20 | 68.98 | 27.23 | 54.79 | 25.42 | 44.06 | 45.05 |
 
@@ -142,7 +125,7 @@ The model is trained on English and multiple Indic languages, including:
  | Qwen3.5-0.8B-Base | 36 | 54.87 | 60.54 | 35.80 | 70.02 | 31.25 | 70.50 | 38.23 | 52.73 | 51.74 |
  | SmolLM-360M-Base | 0.6 | 53.33 | 57.22 | 37.60 | 70.56 | 21.20 | 70.24 | 33.27 | 24.92 | 46.04 |
  | SmolLM2-360M-Base | 4 | 56.30 | 59.19 | 37.60 | 71.81 | 25.22 | 67.88 | 36.68 | 25.55 | 47.53 |
+ | **Nandi-Mini-600M-Early-Checkpoint-Base** | **0.2** | 44.86 | 54.77 | 34.80 | 68.60 | 26.33 | 64.73 | 29.70 | 29.01 | 44.10 |
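
The Average column in the table above appears to be the unweighted mean of the eight benchmark scores. This is inferred from the numbers, not stated in the README. A minimal check using the Nandi-Mini row, with the scores copied from the table:

```python
from statistics import mean

# Scores copied from the Nandi-Mini row, in column order.
nandi_scores = {
    "HellaSwag": 44.86,
    "WinoGrande": 54.77,
    "OBQA": 34.80,
    "PIQA": 68.60,
    "GPQA": 26.33,
    "ARC-e": 64.73,
    "ARC-c": 29.70,
    "MMLU": 29.01,
}

# Assumption: the reported average is a plain mean rounded to two decimals.
average = round(mean(list(nandi_scores.values())), 2)
reported_average = 44.10

print(f"computed={average:.2f}, reported={reported_average:.2f}")
```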
 
 
  ---
 
@@ -164,20 +147,14 @@ The model is trained on English and multiple Indic languages, including:
  | Telugu | 15.40 | 13.38 | 2.09 | **1.77** |
  | Assamese | 9.26 | 8.13 | 4.31 | **1.51** |
 
+ ---
 
 
+ ## 🌍 Supported Languages
 
+ The model is trained on English and a diverse set of Indic languages, including:
 
+ - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
 
  # 🚀 Usage
 