---
license: apache-2.0
language:
- en
- hi
- mr
- ta
- te
- kn
- ml
- bn
- pa
- gu
- or
pipeline_tag: text-generation
library_name: transformers
---

# Nandi-Mini-600M-Early-Checkpoint

## Introduction

Nandi-Mini-600M-Early-Checkpoint is an early-stage checkpoint, taken after **250 billion tokens**, from the upcoming **Nandi-Mini-600M** model family: compact multilingual language models focused on efficiency, deployment flexibility, and Indic language support. *This is not the final model.*

The model is being trained completely from scratch and is designed to deliver strong performance at low compute and memory budgets. This checkpoint is shared to provide an early look into the model’s scaling behavior and training progress.

This release is an **early checkpoint** and not the final converged model. Performance is expected to improve further with continued training and scaling.

📢 We will share a technical blog post soon. Stay tuned!

---

### Architectural Highlights

Nandi-Mini-600M introduces several efficiency-focused architectural optimizations designed for compact yet capable language models.

#### Shared KV (Shared Key-Value Vectors)

Shared KV is one of the core architectural ideas explored in Nandi-Mini. Instead of computing separate Key and Value projections, both reuse a shared latent representation, while a lightweight Key normalization step is applied specifically for attention computation.

This design reduces KV-cache memory usage by ~50% during inference with only a small increase in compute overhead, since RoPE and Key normalization are applied dynamically during attention computation.
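The idea can be illustrated with a minimal single-head sketch. This is not the Nandi implementation: it assumes the cache holds one latent vector per past token, derives Keys on the fly with an RMS normalization that has no learned gain, uses the latent itself as the Value, and omits RoPE and all projection matrices for brevity. The function names are hypothetical.

```python
import math

def rms_norm(vec, eps=1e-6):
    """Scale a vector to (approximately) unit root-mean-square."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def shared_kv_attention(query, latent_cache):
    """Attend over a cache that stores ONE latent vector per past token.

    Keys are recomputed on every call as rms_norm(latent); Values are the
    cached latents themselves. RoPE is omitted in this sketch.
    """
    dim = len(query)
    keys = [rms_norm(latent) for latent in latent_cache]
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    return [
        sum(w * latent[i] for w, latent in zip(weights, latent_cache))
        for i in range(dim)
    ]

# With a single cached token, the output is that token's latent vector.
print(shared_kv_attention([1.0, 0.0], [[3.0, 4.0]]))
```

Recomputing the Keys on each call is the extra compute described above; the saving comes from caching one vector per token instead of two.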

Nandi supports two KV cache modes:

```json
"kv_cache_mode": "shared"
```

Uses Shared KV, reducing KV-cache memory by ~50% with slightly higher compute overhead.

```json
"kv_cache_mode": "vanilla"
```

Uses standard separate Key-Value caching for maximum inference compatibility and lower compute overhead.

#### KV-Cache Memory Comparison

<p align="center">
  <img src="./shared_kv_cache_comparison_improved.png" width="650"/>
</p>

- Vanilla KV → Standard KV-cache memory usage
- Shared KV → ~50% lower KV-cache footprint
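The ~50% figure follows from simple arithmetic: a vanilla cache stores a Key and a Value tensor per layer, while a shared cache stores a single tensor. The sketch below estimates both sizes. The layer count, KV-head count, and head dimension are hypothetical placeholders, not the published Nandi-Mini configuration; the sequence length matches the 2,048-token context and the 2 bytes per value assume bfloat16.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2, mode="vanilla"):
    """Estimate KV-cache size in bytes.

    "vanilla" keeps two tensors per layer (Key and Value);
    "shared" keeps one shared latent tensor per layer.
    """
    if mode not in ("vanilla", "shared"):
        raise ValueError(f"unknown mode: {mode}")
    tensors_per_layer = 2 if mode == "vanilla" else 1
    return (tensors_per_layer * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

# Hypothetical shape for illustration only.
shape = dict(num_layers=24, num_kv_heads=4, head_dim=64, seq_len=2048)
vanilla = kv_cache_bytes(**shape, mode="vanilla")
shared = kv_cache_bytes(**shape, mode="shared")
print(f"vanilla: {vanilla / 2**20:.1f} MiB, shared: {shared / 2**20:.1f} MiB")
```

The absolute numbers depend entirely on the assumed shape, but the 2:1 ratio holds for any shape, and the gap grows linearly with sequence length and batch size.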

Shared KV is part of our broader focus on deployable foundation models optimized for:

- On-premise AI systems
- Memory-constrained deployments
- Edge devices
- Long-context inference workloads

This remains an active research area within the Nandi model family, and we plan to share deeper technical details in upcoming engineering blogs.

---


### Model Details

- Type: Causal Language Model
- Training Stage: Early Pretraining Checkpoint (**250 billion tokens**)
- Parameters: ~600M
- Architecture: Transformer decoder
- Positional Encoding: RoPE
- Normalization: RMSNorm + QK Norm
- Activation: SwiGLU
- Attention: GQA + Shared KV
- Embeddings: Tied embeddings with factorized design
- Context length: 2,048 tokens (planned to be extended to 32,000 tokens)
- Vocabulary Size: 131,072
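
With a 131,072-entry vocabulary, the embedding table is a large share of a ~600M-parameter model, which is likely why the embeddings are tied and factorized. This card does not describe the factorization itself, so the sketch below assumes a common low-rank scheme (a `vocab × rank` table followed by a `rank × hidden` projection). Both `hidden_size` and `factor_rank` are hypothetical values chosen for illustration.

```python
VOCAB_SIZE = 131_072  # from the model details above

def embedding_params(vocab_size, hidden_size, factor_rank=None):
    """Parameter count of an embedding table, optionally low-rank factorized."""
    if factor_rank is None:
        return vocab_size * hidden_size
    # vocab -> rank lookup table, then rank -> hidden projection
    return vocab_size * factor_rank + factor_rank * hidden_size

hidden_size = 1024  # hypothetical, not the published value
factor_rank = 256   # hypothetical, not the published value

full = embedding_params(VOCAB_SIZE, hidden_size)
factored = embedding_params(VOCAB_SIZE, hidden_size, factor_rank)
print(f"full: {full:,} params, factorized: {factored:,} params")
```

Tying the input and output embeddings means this cost is paid once rather than twice.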


---

# 📊 Benchmark Results

This is an early checkpoint, not the final model, so these results are not final. Only 20% of training has been completed.

## General Benchmarks

| Model | Trained Tokens (T) | HellaSwag | WinoGrande | OBQA | PIQA | GPQA | ARC-e | ARC-c | MMLU | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| MobiLlama-0.5B-Base | 1.3 | 39.65 | 53.67 | 30.60 | 70.35 | 24.33 | 52.82 | 23.63 | 24.18 | 39.90 |
| Qwen-2-0.5B-Base | 12 | 49.01 | 57.69 | 33.20 | 68.98 | 27.23 | 54.79 | 25.42 | 44.06 | 45.05 |
| Qwen2.5-0.5B-Base | 18 | 52.16 | 56.82 | 35.40 | 70.29 | 24.10 | 64.64 | 29.86 | 47.41 | 47.59 |
| Qwen3-0.6B-Base | 36 | 53.77 | 59.19 | 34.40 | 70.29 | 30.80 | 65.44 | 33.78 | 50.34 | 49.75 |
| Qwen3.5-0.8B-Base | 36 | 54.87 | 60.54 | 35.80 | 70.02 | 31.25 | 70.50 | 38.23 | 52.73 | 51.74 |
| SmolLM-360M-Base | 0.6 | 53.33 | 57.22 | 37.60 | 70.56 | 21.20 | 70.24 | 33.27 | 24.92 | 46.04 |
| SmolLM2-360M-Base | 4 | 56.30 | 59.19 | 37.60 | 71.81 | 25.22 | 67.88 | 36.68 | 25.55 | 47.53 |
| **Nandi-Mini-600M-Early-Checkpoint-Base** | **0.2** | 44.86 | 54.77 | 34.80 | 68.60 | 26.33 | 64.73 | 29.70 | 29.01 | 44.10 |
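
The Average column is consistent with an unweighted mean of the eight benchmark scores. A quick check for the Nandi row, using the values from the table above:

```python
# Scores copied from the Nandi-Mini-600M-Early-Checkpoint-Base row.
scores = {
    "HellaSwag": 44.86,
    "WinoGrande": 54.77,
    "OBQA": 34.80,
    "PIQA": 68.60,
    "GPQA": 26.33,
    "ARC-e": 64.73,
    "ARC-c": 29.70,
    "MMLU": 29.01,
}

# Unweighted mean, rounded to two decimals as in the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```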


---

## Tokenization Fertility Score Across Languages

| Language  | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-1 | Nandi-Mini-600M |
|-----------|------------|-----------------|----------|------------------|
| English   | 1.17 | **1.16** | 1.32 | 1.18 |
| Bengali   | 8.66 | 7.51 | 1.55 | **1.44** |
| Gujarati  | 10.47 | 9.37 | 1.55 | **1.53** |
| Hindi     | 2.71 | 5.14 | **1.25** | 1.32 |
| Kannada   | 16.43 | 12.96 | 2.10 | **1.90** |
| Malayalam | 17.77 | 14.56 | 2.49 | **2.05** |
| Marathi   | 3.73 | 6.70 | **1.55** | **1.55** |
| Odia      | 19.07 | 15.75 | **2.18** | 2.68 |
| Punjabi   | 9.23 | 8.66 | 1.47 | **1.42** |
| Tamil     | 13.56 | 10.93 | 2.06 | **2.05** |
| Telugu    | 15.40 | 13.38 | 2.09 | **1.77** |
| Assamese  | 9.26 | 8.13 | 4.31 | **1.51** |
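
Fertility is commonly defined as the average number of tokens a tokenizer produces per word, so lower values mean shorter sequences for the same text. This card does not state the exact definition or corpus used for the table, so the sketch below assumes whitespace-separated words. `fertility` and `toy_tokenize` are hypothetical helpers; the toy tokenizer stands in for a real one.

```python
def fertility(texts, tokenize):
    """Average number of tokens per whitespace-separated word."""
    total_words = sum(len(text.split()) for text in texts)
    total_tokens = sum(len(tokenize(text)) for text in texts)
    if total_words == 0:
        raise ValueError("need at least one word")
    return total_tokens / total_words

def toy_tokenize(text):
    """Toy tokenizer for illustration: cut each word into 3-character pieces."""
    return [
        word[i:i + 3]
        for word in text.split()
        for i in range(0, len(word), 3)
    ]

sample = ["the night was quiet"]
score = fertility(sample, toy_tokenize)
print(score)  # 6 tokens over 4 words
```

To measure a real tokenizer, pass its tokenization function in place of `toy_tokenize`, for example the `tokenize` method of a tokenizer loaded with `AutoTokenizer`.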

---


## 🌍 Supported Languages

The model is trained on English and a diverse set of Indic languages, including:

Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia

# 🚀 Usage

```python
# Install the pinned release first (in a notebook, prefix the command with "!"):
#   pip install "transformers==5.4.0"

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint"

# The model ships custom code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    dtype=torch.bfloat16
).to(device).eval()

# Choose the KV cache mode:
#   "shared"  saves ~50% of KV-cache memory at slightly higher compute cost
#   "vanilla" uses standard separate Key-Value caching
# model.config.kv_cache_mode = "shared"
model.config.kv_cache_mode = "vanilla"

prompt = "The night was quiet and the streets were empty"

model_inputs = tokenizer(
    [prompt],
    return_tensors="pt"
).to(model.device)

# Sample a short continuation of the prompt.
outputs = model.generate(
    **model_inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.3,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
    use_cache=True,
)

response = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True
)

print(response)
```