---
license: apache-2.0
language:
- en
- hi
- mr
- ta
- te
- kn
- ml
- bn
- pa
- gu
- or
pipeline_tag: text-generation
library_name: transformers
---

# Nandi-Mini-150M

## Introduction

Nandi-Mini-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is pre-trained from scratch on **525 billion tokens** and supports **English and 10 Indic languages**.

We did not tune the model to maximize benchmark scores; it is designed to be a genuinely strong base model that fine-tunes well on downstream tasks.

Nandi-Mini-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It targets edge devices, on-prem deployments, and low-latency applications.

Nandi-Mini-150M brings the following key features:

- Strong **multilingual capability** across English and Indic languages
- Efficient design enabling **high performance at small scale (150M parameters)**
- Reduced memory footprint using **factorized embeddings**
- Better parameter efficiency through **layer sharing**
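
To make the factorized-embeddings point concrete, here is a rough parameter count comparing a full embedding table with a factorized one (a small `vocab × r` table followed by an `r × hidden` projection). This is only a sketch: the vocabulary size comes from this card, but `HIDDEN_SIZE` and `FACTOR_DIM` are hypothetical values chosen for illustration, not the model's actual configuration.

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden_size: int,
                     factor_dim: Optional[int] = None) -> int:
    """Count embedding parameters, with or without factorization."""
    if factor_dim is None:
        # Standard embedding: one hidden_size vector per vocabulary entry.
        return vocab_size * hidden_size
    # Factorized: a narrow lookup table plus a projection up to hidden_size.
    return vocab_size * factor_dim + factor_dim * hidden_size


VOCAB_SIZE = 131_072   # from this model card
HIDDEN_SIZE = 768      # hypothetical: not stated in this card
FACTOR_DIM = 128       # hypothetical: not stated in this card

full = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factored = embedding_params(VOCAB_SIZE, HIDDEN_SIZE, FACTOR_DIM)
print(f"full: {full:,}  factorized: {factored:,}  "
      f"saved: {1 - factored / full:.1%}")
```

With a large vocabulary, the lookup table dominates the embedding cost, so shrinking its width is where most of the savings would come from.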

## 📝 Upcoming Releases & Roadmap

We’re just getting started with the Nandi series 🚀

- **Nandi-Mini-150M (Base)**: *Available now*
- **Nandi-Mini-150M (Instruct)**: *Available now*
- **Nandi-Mini-500M (Base + Instruct)**: *Pre-training in progress*
- **Nandi-Mini-1B (Base + Instruct)**: *Pre-training in progress*

We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.

📢 **Blogs & technical deep-dives coming soon**, where we’ll share:
- Architecture decisions and design trade-offs  
- Training insights and dataset composition  
- Benchmarks and real-world applications  

Stay tuned!

**This repo contains the base Nandi-Mini-150M model**, which has the following features:

- Type: Causal Language Model
- Training Stage: Pretraining (from scratch)
- Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, and **factorized embeddings**
- Number of Layers: 16 × 2 (layer sharing; 32 effective layers)
- Context Length: 2,048 tokens
- Vocabulary Size: 131,072
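
The "16 × 2" layer count can be read as 16 unique blocks whose weights are reused so that the input passes through 32 block applications. The sketch below shows two plausible reuse orders. It is an illustration only: this card does not say which order the model uses, and `shared_layer_schedule` is a hypothetical helper, not part of the model code.

```python
from typing import List


def shared_layer_schedule(num_unique: int, repeats: int,
                          interleaved: bool = False) -> List[int]:
    """Return which unique block is applied at each effective depth."""
    if interleaved:
        # Each block runs `repeats` times in a row: 0, 0, 1, 1, ...
        return [i for i in range(num_unique) for _ in range(repeats)]
    # The whole stack is traversed `repeats` times: 0..N-1, 0..N-1, ...
    return [i for _ in range(repeats) for i in range(num_unique)]


schedule = shared_layer_schedule(num_unique=16, repeats=2)
effective_depth = len(schedule)      # number of block applications
unique_blocks = len(set(schedule))   # number of distinct weight sets
print(effective_depth, unique_blocks)
```

Either way, the parameter count scales with the number of unique blocks, while compute per token scales with the effective depth.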

## 🌍 Supported Languages

The model is trained on English and a diverse set of Indic languages, including:

- Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia

## Benchmark Results

### 📊 Benchmark Comparison (~150M Class)

| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|------------|----------------|------------|-----------|------------|------|------|-------|-----------|---------|
| Mobile-LLM-125M  | 125           | 1000             | 38.90    | 53.10  | -    | -    | -     | -         | -       |
| SmolLM-135M-Base | 135           | 600              | 42.66| 53.03  | 25.44| 25.30| 1.36  | 0.00      | 24.63   |
| SmolLM2-135M-Base| 135           | 2000             | 43.13| 53.27  | 22.09| 24.09| 1.74  | 0.00      | 24.05   |
| **Nandi-Mini-150M-Base** | **150**     | **500**          | 37.20    | 52.32      | **28.57** | **28.86** | **2.58** | **4.27** | **25.63** |


### 📊 Benchmark Comparison With Slightly Bigger Models (350M–600M Class)

| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|------------|----------------|------------|-----------|------------|------|------|-------|-----------|---------|
| Mobile-LLM-360M     | 350           | 1000             | 49.60    | 56.59      | -    | -    | -     | -         | -       |
| Qwen-2-0.5-Base     | 500           | 12000            | 49.01    | 57.69      | 27.23| 44.06| 10.61 | 22.56     | 35.19   |
| Qwen2.5-0.5B-Base   | 500           | 18000            | 52.16    | 56.82      | 24.10| 47.41| 4.77  | 29.87     | 35.86   |
| Qwen3-0.6B-Base     | 600           | 36000            | 53.77    | 59.19      | 30.80| 50.34| 15.31 | 28.04     | 39.58   |
| SmolLM-360M-Base    | 360           | 600              | 53.33    | 57.22      | 21.20| 24.92| 2.19  | 1.21      | 26.68   |
| SmolLM2-360M-Base   | 360           | 4000            | 56.30    | 59.19      | 25.22| 25.55| 2.88  | 0.00      | 28.19   |
| **Nandi-Mini-150M-Base** | **150**       | 500          | 37.20| 52.32  | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |

### Note
Mobile-LLM checkpoints are not publicly available, so their results are taken directly from the original paper. All other models were evaluated with `lm-eval` under a consistent setup. For now, HumanEval and GSM8K are evaluated with greedy decoding for all models.
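
The Average column in the tables above appears to be the unweighted mean of the six benchmark scores. The card does not state this explicitly, but the arithmetic matches. A quick check for the Nandi-Mini-150M-Base row:

```python
# Scores copied from the Nandi-Mini-150M-Base row of the tables above.
scores = {
    "HellaSwag": 37.20,
    "Winogrande": 52.32,
    "GPQA": 28.57,
    "MMLU": 28.86,
    "GSM8K": 2.58,
    "HumanEval": 4.27,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```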

## Performance on Fine-tuned Tasks

### CrossSum-Hindi (CHRF) Results
We fine-tuned our model and other open-source models on the CrossSum-Hindi task from [Google's IndicGenBench](https://github.com/google-research-datasets/indic-gen-bench/). Nandi-mini-150M achieves the highest CHRF score after fine-tuning (4.37) among the models compared.

| Base Model              | Before Finetune | After Finetune |
|------------------------|-----------------|----------------|
| Qwen-2-0.5-Base        | 0.09            | 4.22           |
| Qwen2.5-0.5B-Base      | 0.43            | 4.18           |
| SmolLM-135M-Base       | 0.09            | 2.55           |
| SmolLM-360M-Base       | 0.09            | 2.99           |
| SmolLM2-135M-Base      | 0.09            | 2.67           |
| SmolLM2-360M-Base      | 0.12            | 3.51           |
| Nandi-mini-150M        | 0.10            | **4.37**       |


## Tokenization Fertility Score across Languages

| Language  | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-1 | Nandi-Mini-150M |
|-----------|------------|-----------------|----------|------------------|
| English   | 1.17       | **1.16**        | 1.32     | 1.18             |
| Bengali   | 8.66       | 7.51            | 1.55     | **1.44**             |
| Gujarati  | 10.47      | 9.37            | 1.55     | **1.53**             |
| Hindi     | 2.71       | 5.14            | **1.25**     | 1.32             |
| Kannada   | 16.43      | 12.96           | 2.10     | **1.90**             |
| Malayalam | 17.77      | 14.56           | 2.49     | **2.05**             |
| Marathi   | 3.73       | 6.70            | **1.55** | **1.55**         |
| Odia      | 19.07      | 15.75           | **2.18** | 2.68             |
| Punjabi   | 9.23       | 8.66            | 1.47     | **1.42**             |
| Tamil     | 13.56      | 10.93           | 2.06     | **2.05**             |
| Telugu    | 15.40      | 13.38           | 2.09     | **1.77**             |
| Assamese  | 9.26       | 8.13            | 4.31     | **1.51**             |
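
Fertility is commonly defined as the average number of tokens a tokenizer produces per word, so lower values mean more compact encodings. The sketch below assumes that definition with whitespace-delimited words; the card does not specify the exact procedure or evaluation corpus behind the table. `fertility` and `toy_tokenize` are hypothetical helpers written for illustration.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str],
              tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-delimited word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("texts contain no words")
    return total_tokens / total_words


def toy_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: split each word into chunks of up to 4 characters."""
    return [w[i:i + 4] for w in text.split() for i in range(0, len(w), 4)]


sample = ["tokenizers split words", "into smaller pieces"]
score = fertility(sample, toy_tokenize)
print(f"{score:.2f}")
```

To measure a real tokenizer, pass its `tokenize` method (for example, `tokenizer.tokenize` from a loaded `AutoTokenizer`) in place of `toy_tokenize`.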


## 🚀 Usage

```python
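# Install the dependency first (in a shell, or prefix with "!" in a notebook):
#   pip install "transformers==5.4.0"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "FrontiersMind/Nandi-mini-150M"

# Use a GPU when one is available; otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True allows transformers to run the code shipped in the model repo.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    dtype=torch.bfloat16,
).to(device).eval()

# This is a base model, so the prompt is plain text to be continued.
# It ends mid-sentence, without a trailing newline, to invite a continuation.
prompt = (
    "The night was quiet and the streets were empty. "
    "A single light flickered in the distance. "
    "Someone was walking slowly, carrying a small bag. Suddenly,"
)
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Sample a short continuation; no gradients are needed for generation.
with torch.no_grad():
    outputs = model.generate(
        **model_inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.3,
        top_k=20,
        top_p=0.95,
        repetition_penalty=1.1,
    )

# The decoded output contains the prompt followed by the generated text.
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)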
```


## 📬 Feedback & Suggestions

We’d love to hear your thoughts, feedback, and ideas!

- **Discord:** https://discord.gg/ZGdjCdRt
- **Email:** support@frontiersmind.ai
- **Official Website:** https://www.frontiersmind.ai/
- **LinkedIn:** https://www.linkedin.com/company/frontiersmind/
- **X (Twitter):** https://x.com/FrontiersMind