---
license: apache-2.0
language:
- en
- hi
- mr
- ta
- te
- kn
- ml
- bn
- pa
- gu
- or
pipeline_tag: text-generation
library_name: transformers
---
# Nandi-Mini-150M
## Introduction
Nandi-Mini-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is pre-trained from scratch on **525 billion tokens** and supports **English and 10 Indic languages**.
We do not tune the model toward specific benchmarks; it is designed to be a genuinely strong base for fine-tuning on downstream tasks.
Nandi-Mini-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It is optimized for edge devices, on-prem deployments, and low-latency applications.
Nandi-Mini-150M brings the following key features:
- Strong **multilingual capability** across English and Indic languages
- Efficient design enabling **high performance at small scale (150M parameters)**
- Reduced memory footprint using **factorized embeddings**
- Better parameter efficiency through **layer sharing**
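To make the factorized-embedding saving concrete, here is a rough parameter count. It assumes the common low-rank scheme in which a `vocab × hidden` table is replaced by a `vocab × rank` table followed by a `rank × hidden` projection; this card does not spell out the exact scheme, and `HIDDEN_SIZE` and `RANK` are hypothetical values chosen only for illustration. The vocabulary size is the model's own 131,072.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a standard (unfactorized) embedding table."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size: int, hidden_size: int, rank: int) -> int:
    """Parameters in a low-rank factorization: (vocab x rank) + (rank x hidden)."""
    return vocab_size * rank + rank * hidden_size


VOCAB_SIZE = 131_072   # from the model card
HIDDEN_SIZE = 768      # hypothetical, for illustration only
RANK = 128             # hypothetical, for illustration only

full = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factorized = factorized_embedding_params(VOCAB_SIZE, HIDDEN_SIZE, RANK)

print(f"full table:       {full:,} parameters")
print(f"factorized table: {factorized:,} parameters")
print(f"reduction:        {1 - factorized / full:.1%}")
```

With a vocabulary this large, an unfactorized table at the assumed width would already hold about 100M parameters, which is why the embedding matters so much in a 150M-parameter budget.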
## Upcoming Releases & Roadmap
We're just getting started with the Nandi series 🚀
- **Nandi-Mini-150M (Base)**: *Available now*
- **Nandi-Mini-150M (Instruct)**: *Available now*
- **Nandi-Mini-500M (Base + Instruct)**: *Pre-training in progress*
- **Nandi-Mini-1B (Base + Instruct)**: *Pre-training in progress*
We are actively working on expanding the Nandi family to cover a wider range of use cases, from lightweight edge deployments to more capable instruction-tuned systems.
📢 **Blogs & technical deep-dives coming soon**, where we'll share:
- Architecture decisions and design trade-offs
- Training insights and dataset composition
- Benchmarks and real-world applications
Stay tuned!
**This repo contains the base Nandi-Mini-150M model**, which has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining (from scratch)
- Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, **factorized embeddings**
- Number of Layers: 16 × 2 (layer sharing; 32 effective layers)
- Context Length: 2,048 tokens
- Vocabulary Size: 131,072
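The layer count (16 distinct blocks, 32 effective layers) means each block's weights are applied twice per forward pass. The sketch below shows one possible schedule, assuming the whole stack is run and then repeated; whether the model cycles the stack or repeats each block back to back is not stated here, and `shared_layer_schedule` is a hypothetical helper.

```python
def shared_layer_schedule(num_unique_layers: int, repeats: int) -> list[int]:
    """Return the index of the weight block used at each effective layer.

    Assumes the full stack is cycled `repeats` times (0..N-1, 0..N-1, ...).
    """
    return [i for _ in range(repeats) for i in range(num_unique_layers)]


schedule = shared_layer_schedule(num_unique_layers=16, repeats=2)

print(len(schedule))       # 32 effective layers
print(len(set(schedule)))  # 16 distinct weight blocks to store
```

Either ordering gives the same storage cost: depth doubles while the number of stored blocks stays at 16.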
## Supported Languages
The model is trained on English and a diverse set of Indic languages, including:
- Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
## Benchmark Results
## 📊 Benchmark Comparison (~150M Class)
| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|------------------|---------------|------------------|----------|------------|------|------|-------|-----------|---------|
| Mobile-LLM-125M | 125 | 1000 | 38.90 | 53.10 | - | - | - | - | - |
| SmolLM-135M-Base | 135 | 600 | 42.66| 53.03 | 25.44| 25.30| 1.36 | 0.00 | 24.63 |
| SmolLM2-135M-Base| 135 | 2000 | 43.13| 53.27 | 22.09| 24.09| 1.74 | 0.00 | 24.05 |
| **Nandi-Mini-150M-Base** | **150** | **500** | 37.20 | 52.32 | **28.57** | **28.86** | **2.58** | **4.27** | **25.63** |
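The Average column appears to be the unweighted mean of the six benchmark scores. A quick check against three rows of the table above, with the numbers copied from it:

```python
from statistics import mean

# Scores in column order: HellaSwag, Winogrande, GPQA, MMLU, GSM8K, HumanEval.
# Each entry pairs the six scores with the Average reported in the table.
rows = {
    "SmolLM-135M-Base": ([42.66, 53.03, 25.44, 25.30, 1.36, 0.00], 24.63),
    "SmolLM2-135M-Base": ([43.13, 53.27, 22.09, 24.09, 1.74, 0.00], 24.05),
    "Nandi-Mini-150M-Base": ([37.20, 52.32, 28.57, 28.86, 2.58, 4.27], 25.63),
}

for name, (scores, reported) in rows.items():
    computed = round(mean(scores), 2)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```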
## 📊 Benchmark Comparison With Larger Models (350M–600M Class)
| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|---------------------|---------------|------------------|----------|------------|------|------|-------|-----------|---------|
| Mobile-LLM-360M | 350 | 1000 | 49.60 | 56.59 | - | - | - | - | - |
| Qwen2-0.5B-Base | 500 | 12000 | 49.01 | 57.69 | 27.23| 44.06| 10.61 | 22.56 | 35.19 |
| Qwen2.5-0.5B-Base | 500 | 18000 | 52.16 | 56.82 | 24.10| 47.41| 4.77 | 29.87 | 35.86 |
| Qwen3-0.6B-Base | 600 | 36000 | 53.77 | 59.19 | 30.80| 50.34| 15.31 | 28.04 | 39.58 |
| SmolLM-360M-Base | 360 | 600 | 53.33 | 57.22 | 21.20| 24.92| 2.19 | 1.21 | 26.68 |
| SmolLM2-360M-Base | 360 | 4000 | 56.30 | 59.19 | 25.22| 25.55| 2.88 | 0.00 | 28.19 |
| **Nandi-Mini-150M-Base** | **150** | 500 | 37.20| 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |
### Note
Mobile-LLM model checkpoints are not publicly available; their results are taken directly from the original paper. All other models were evaluated with `lm-eval` under a consistent setup. HumanEval and GSM8K were evaluated with greedy decoding for all models.
## Performance on Fine-tuned Tasks
### CrossSum-Hindi (chrF) Results
We fine-tuned our model and other open-source models on the CrossSum-Hindi task from [Google's IndicGenBench](https://github.com/google-research-datasets/indic-gen-bench/). Nandi-Mini-150M achieved the highest chrF score after fine-tuning.
| Base Model | Before Finetune | After Finetune |
|------------------------|-----------------|----------------|
| Qwen2-0.5B-Base | 0.09 | 4.22 |
| Qwen2.5-0.5B-Base | 0.43 | 4.18 |
| SmolLM-135M-Base | 0.09 | 2.55 |
| SmolLM-360M-Base | 0.09 | 2.99 |
| SmolLM2-135M-Base | 0.09 | 2.67 |
| SmolLM2-360M-Base | 0.12 | 3.51 |
| Nandi-Mini-150M | 0.10 | **4.37** |
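For readers unfamiliar with the metric, chrF scores a generated text by its character n-gram overlap with a reference, combining precision and recall into an F-score that weights recall more heavily. The block below is a simplified sentence-level sketch, not the evaluation code behind the table: `simple_chrf` is a hypothetical helper, the defaults (`max_n=6`, `beta=2.0`) are assumed from common practice, and the Hindi sentences are made-up examples.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / hyp_total)
        recalls.append(overlap / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


reference = "भारत ने मैच जीत लिया"             # made-up reference summary
print(simple_chrf(reference, reference))        # identical text scores 100.0
print(simple_chrf("भारत मैच हार गया", reference))  # partial overlap scores lower
```

For reported numbers, an established implementation is preferable to a hand-rolled one, since details such as whitespace handling and word n-grams change the score.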
## Tokenization Fertility Score across Languages
| Language | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-1 | Nandi-Mini-150M |
|-----------|------------|-----------------|----------|------------------|
| English   | 1.17       | **1.16**        | 1.32     | 1.18             |
| Bengali | 8.66 | 7.51 | 1.55 | **1.44** |
| Gujarati | 10.47 | 9.37 | 1.55 | **1.53** |
| Hindi | 2.71 | 5.14 | **1.25** | 1.32 |
| Kannada | 16.43 | 12.96 | 2.10 | **1.90** |
| Malayalam | 17.77 | 14.56 | 2.49 | **2.05** |
| Marathi   | 3.73       | 6.70            | **1.55** | **1.55**         |
| Odia      | 19.07      | 15.75           | **2.18** | 2.68             |
| Punjabi | 9.23 | 8.66 | 1.47 | **1.42** |
| Tamil | 13.56 | 10.93 | 2.06 | **2.05** |
| Telugu | 15.40 | 13.38 | 2.09 | **1.77** |
| Assamese | 9.26 | 8.13 | 4.31 | **1.51** |
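Fertility is usually defined as the average number of tokens a tokenizer produces per word, so lower values mean shorter sequences for the same text. The sketch below assumes that definition with whitespace-delimited words; the corpus and word segmentation behind the table are not given here, and both `fertility` and `toy_tokenize` are hypothetical helpers.

```python
from typing import Callable, Iterable


def fertility(texts: Iterable[str], tokenize: Callable[[str], list]) -> float:
    """Average number of tokens per whitespace-delimited word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words found in the input texts")
    return total_tokens / total_words


def toy_tokenize(text: str) -> list:
    """Stand-in tokenizer: splits every word into chunks of up to 3 characters."""
    return [word[i:i + 3] for word in text.split() for i in range(0, len(word), 3)]


sample = ["small models run on edge devices"]
print(fertility(sample, toy_tokenize))
```

To measure a real tokenizer, pass its `tokenize` method (for example, from the tokenizer loaded in the Usage section) together with a sample corpus in the target language.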
## 🚀 Usage
```python
# Install the pinned version first (shell or notebook cell):
#   pip install transformers==5.4.0
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "FrontiersMind/Nandi-mini-150M"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True lets transformers run the model code bundled with the repo.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    dtype=torch.bfloat16,
).to(device).eval()

# This is a base model: give it text to continue rather than an instruction.
prompt = """
The night was quiet and the streets were empty.
A single light flickered in the distance. Someone was walking slowly, carrying a small bag. Suddenly,
"""

model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Low-temperature sampling with a mild repetition penalty.
outputs = model.generate(
    **model_inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.3,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.1,
)

# outputs[0] holds the prompt tokens followed by the generated continuation.
response = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
print(response)
```
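Because the context length is 2,048 tokens, the prompt and the generated tokens together should fit in that window. Below is a minimal sketch of a guard that trims an over-long prompt from the left, assuming the most recent tokens matter most for a continuation; `fit_to_context` is a hypothetical helper, not part of the model's API.

```python
CONTEXT_LENGTH = 2048  # from the model card


def fit_to_context(input_ids: list, max_new_tokens: int,
                   context_length: int = CONTEXT_LENGTH) -> list:
    """Keep only the most recent prompt tokens so generation fits in the window.

    Note: trimming from the left may drop a leading special token such as BOS.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return input_ids[-budget:]


prompt_ids = list(range(3000))  # stand-in for a long tokenized prompt
trimmed = fit_to_context(prompt_ids, max_new_tokens=50)
print(len(trimmed))             # 1998 prompt tokens + 50 new tokens = 2048
```

Prompts that already fit are returned unchanged.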
## 📬 Feedback & Suggestions
We'd love to hear your thoughts, feedback, and ideas!
- **Discord:** https://discord.gg/ZGdjCdRt
- **Email:** support@frontiersmind.ai
- **Official Website:** https://www.frontiersmind.ai/
- **LinkedIn:** https://www.linkedin.com/company/frontiersmind/
- **X (Twitter):** https://x.com/FrontiersMind