Instructions for using FrontiersMind/Nandi-Mini-150M with libraries and local apps.

Libraries

How to use FrontiersMind/Nandi-Mini-150M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FrontiersMind/Nandi-Mini-150M", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("FrontiersMind/Nandi-Mini-150M", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use FrontiersMind/Nandi-Mini-150M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FrontiersMind/Nandi-Mini-150M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FrontiersMind/Nandi-Mini-150M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
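
The same request can be built from Python with only the standard library. This is a minimal sketch that mirrors the curl call above; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the usual OpenAI-style `choices[0].text` layout.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = "FrontiersMind/Nandi-Mini-150M",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Hypothetical helper: build the same POST request as the curl command."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Hypothetical helper: send the request (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as response:
        body = json.load(response)
    # Assumes an OpenAI-style completions response.
    return body["choices"][0]["text"]


# Building the request does not contact the server.
request = build_completion_request("Once upon a time,")
print(request.full_url)
```

Passing `base_url="http://localhost:30000"` would target a server listening on port 30000 instead.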

Use Docker

docker model run hf.co/FrontiersMind/Nandi-Mini-150M

SGLang

How to use FrontiersMind/Nandi-Mini-150M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FrontiersMind/Nandi-Mini-150M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FrontiersMind/Nandi-Mini-150M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FrontiersMind/Nandi-Mini-150M" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use FrontiersMind/Nandi-Mini-150M with Docker Model Runner:
```
docker model run hf.co/FrontiersMind/Nandi-Mini-150M
```


	---
	license: apache-2.0
	language:
	- en
	- hi
	- mr
	- ta
	- te
	- kn
	- ml
	- bn
	- pa
	- gu
	- or
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Nandi-Mini-150M

	## Introduction

	Nandi-Mini-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is pre-trained from scratch on 525 billion tokens and supports English and 10 Indic languages.

	We do not use benchmark-specific optimization tricks ("benchmaxing"); the model is intended to be genuinely capable and well suited to fine-tuning on downstream tasks.

	Nandi-Mini-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It is optimized for edge devices, on-prem deployments, and low-latency applications.

	Nandi-Mini-150M brings the following key features:

	- Strong multilingual capability across English and Indic languages
	- Efficient design enabling high performance at small scale (150M parameters)
	- Reduced memory footprint using factorized embeddings
	- Better parameter efficiency through layer sharing
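
To make the factorized-embedding saving concrete, here is a minimal arithmetic sketch. Only the vocabulary size (131,072) comes from this card; the hidden size of 768 and the factor rank of 128 are assumed values for illustration and are not the model's published dimensions.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a standard vocab_size x hidden_size embedding table."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size: int, rank: int, hidden_size: int) -> int:
    """Parameters when the table is split into (vocab x rank) and (rank x hidden)."""
    return vocab_size * rank + rank * hidden_size


VOCAB_SIZE = 131_072  # from the model card
HIDDEN_SIZE = 768     # assumption, for illustration only
RANK = 128            # assumption, for illustration only

full = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factorized = factorized_embedding_params(VOCAB_SIZE, RANK, HIDDEN_SIZE)

print(f"full table:       {full:,} parameters")
print(f"factorized table: {factorized:,} parameters")
print(f"reduction:        {1 - factorized / full:.1%}")
```

The saving grows as the rank shrinks relative to the hidden size, which is why factorization matters most for models with a large vocabulary and a small body.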

	## 📝 Upcoming Releases & Roadmap

	We’re just getting started with the Nandi series 🚀

	- Nandi-Mini-150M (Base): available now
	- Nandi-Mini-150M (Instruct): available now
	- Nandi-Mini-500M (Base + Instruct): pre-training in progress
	- Nandi-Mini-1B (Base + Instruct): pre-training in progress

	We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.

	📢 Blogs & technical deep-dives coming soon, where we’ll share:
	- Architecture decisions and design trade-offs
	- Training insights and dataset composition
	- Benchmarks and real-world applications

	Stay tuned!

	This repo contains the base Nandi-Mini-150M model, which has the following features:

	- Type: Causal Language Model
	- Training Stage: Pretraining (from scratch)
	- Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, and factorized embeddings
	- Number of Layers: 16 × 2 (layer sharing; 32 effective layers)
	- Context Length: 2,048 tokens
	- Vocabulary Size: 131,072
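
The layer count above can be read as 16 unique blocks whose weights are reused so that a forward pass runs 32 block applications. The sketch below only illustrates that bookkeeping. The card does not say in which order the shared blocks are visited, so the "run the whole stack twice" schedule and the `shared_layer_schedule` helper are assumptions.

```python
def shared_layer_schedule(num_unique_layers: int, num_passes: int) -> list[int]:
    """Return the index of the weight block used at each effective layer.

    Assumes the whole stack of unique blocks is run `num_passes` times in order.
    """
    return [layer for _ in range(num_passes) for layer in range(num_unique_layers)]


schedule = shared_layer_schedule(num_unique_layers=16, num_passes=2)

print("effective depth:", len(schedule))
print("unique weight blocks:", len(set(schedule)))
```

Under this schedule, depth doubles while the number of stored layer weights stays at 16.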

	## 🌍 Supported Languages

	The model is trained on English and a diverse set of Indic languages, including:

	- Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia

	## Benchmark Results

	## 📊 Benchmark Comparison (~150M Class)

	| Model Name           | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA  | MMLU  | GSM8K | HumanEval | Average |
	|----------------------|----------------|------------|-----------|------------|-------|-------|-------|-----------|---------|
	| Mobile-LLM-125M      | 125            | 1000       | 38.90     | 53.10      | -     | -     | -     | -         | -       |
	| SmolLM-135M-Base     | 135            | 600        | 42.66     | 53.03      | 25.44 | 25.30 | 1.36  | 0.00      | 24.63   |
	| SmolLM2-135M-Base    | 135            | 2000       | 43.13     | 53.27      | 22.09 | 24.09 | 1.74  | 0.00      | 24.05   |
	| Nandi-Mini-150M-Base | 150            | 500        | 37.20     | 52.32      | 28.57 | 28.86 | 2.58  | 4.27      | 25.63   |


	## 📊 Benchmark Comparison with Larger Models (350M–600M Class)

	| Model Name           | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA  | MMLU  | GSM8K | HumanEval | Average |
	|----------------------|----------------|------------|-----------|------------|-------|-------|-------|-----------|---------|
	| Mobile-LLM-360M      | 350            | 1000       | 49.60     | 56.59      | -     | -     | -     | -         | -       |
	| Qwen2-0.5B-Base      | 500            | 12000      | 49.01     | 57.69      | 27.23 | 44.06 | 10.61 | 22.56     | 35.19   |
	| Qwen2.5-0.5B-Base    | 500            | 18000      | 52.16     | 56.82      | 24.10 | 47.41 | 4.77  | 29.87     | 35.86   |
	| Qwen3-0.6B-Base      | 600            | 36000      | 53.77     | 59.19      | 30.80 | 50.34 | 15.31 | 28.04     | 39.58   |
	| SmolLM-360M-Base     | 360            | 600        | 53.33     | 57.22      | 21.20 | 24.92 | 2.19  | 1.21      | 26.68   |
	| SmolLM2-360M-Base    | 360            | 4000       | 56.30     | 59.19      | 25.22 | 25.55 | 2.88  | 0.00      | 28.19   |
	| Nandi-Mini-150M-Base | 150            | 500        | 37.20     | 52.32      | 28.57 | 28.86 | 2.58  | 4.27      | 25.63   |

	### Note
	Mobile-LLM model checkpoints are not publicly available, so their results are taken directly from the original paper. All other models were evaluated with `lm-eval` under a consistent setup. HumanEval and GSM8K are currently evaluated with greedy decoding for all models.
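
The Average column appears to be the unweighted mean of the six benchmark scores. A quick check using the Nandi-Mini-150M-Base row from the tables above reproduces the reported value; this is a sanity-check sketch, not the authors' evaluation script.

```python
from statistics import mean

# Scores copied from the Nandi-Mini-150M-Base row of the benchmark tables.
nandi_scores = {
    "HellaSwag": 37.20,
    "Winogrande": 52.32,
    "GPQA": 28.57,
    "MMLU": 28.86,
    "GSM8K": 2.58,
    "HumanEval": 4.27,
}

# Unweighted mean over the six benchmarks, rounded like the table.
average = round(mean(list(nandi_scores.values())), 2)
print(average)
```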

	## Performance on Fine-tuned Tasks

	#### CrossSum-Hindi (chrF) Results
	We fine-tuned our model and other open-source models on the CrossSum-Hindi task from [Google's IndicGenBench](https://github.com/google-research-datasets/indic-gen-bench/). Nandi-Mini-150M achieved the highest score after fine-tuning (4.37), ahead of the larger Qwen2-0.5B-Base (4.22) and Qwen2.5-0.5B-Base (4.18).

	| Base Model        | Before Fine-tuning | After Fine-tuning |
	|-------------------|--------------------|-------------------|
	| Qwen2-0.5B-Base   | 0.09               | 4.22              |
	| Qwen2.5-0.5B-Base | 0.43               | 4.18              |
	| SmolLM-135M-Base  | 0.09               | 2.55              |
	| SmolLM-360M-Base  | 0.09               | 2.99              |
	| SmolLM2-135M-Base | 0.09               | 2.67              |
	| SmolLM2-360M-Base | 0.12               | 3.51              |
	| Nandi-Mini-150M   | 0.10               | 4.37              |
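
The CrossSum-Hindi scores above are reported as chrF, a character n-gram F-score. The sketch below is a simplified illustration of the idea (precision and recall averaged over character n-gram orders 1 to 6, combined with beta = 2). The `simple_chrf` helper is hypothetical and may not match the exact numbers of an established implementation such as sacreBLEU, which should be used for reported results.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale (illustrative, not sacreBLEU-exact)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(simple_chrf("the cat sat", "the cat sat"))
print(simple_chrf("abc", "xyz"))
```

With beta = 2, recall is weighted more heavily than precision, so a summary that misses reference content is penalized more than one that adds extra text.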


	## Tokenization Fertility Score across Languages

	| Language  | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-1 | Nandi-Mini-150M |
	|-----------|------------|-----------------|----------|-----------------|
	| English   | 1.17       | 1.16            | 1.32     | 1.18            |
	| Bengali   | 8.66       | 7.51            | 1.55     | 1.44            |
	| Gujarati  | 10.47      | 9.37            | 1.55     | 1.53            |
	| Hindi     | 2.71       | 5.14            | 1.25     | 1.32            |
	| Kannada   | 16.43      | 12.96           | 2.10     | 1.90            |
	| Malayalam | 17.77      | 14.56           | 2.49     | 2.05            |
	| Marathi   | 3.73       | 6.70            | 1.55     | 1.55            |
	| Odia      | 19.07      | 15.75           | 2.18     | 2.68            |
	| Punjabi   | 9.23       | 8.66            | 1.47     | 1.42            |
	| Tamil     | 13.56      | 10.93           | 2.06     | 2.05            |
	| Telugu    | 15.40      | 13.38           | 2.09     | 1.77            |
	| Assamese  | 9.26       | 8.13            | 4.31     | 1.51            |
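
Fertility is commonly computed as the average number of tokens a tokenizer emits per whitespace-separated word, so lower values mean more compact encodings. The card does not state the exact formula or corpus used, so the definition below, the `fertility` helper, and the toy tokenizer are assumptions for illustration.

```python
from typing import Callable, Iterable


def fertility(texts: Iterable[str], tokenize: Callable[[str], list[str]]) -> float:
    """Average number of tokens per whitespace-separated word over `texts`."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("texts contain no words")
    return total_tokens / total_words


def toy_tokenize(text: str) -> list[str]:
    """Hypothetical stand-in tokenizer: splits each word into 2-character pieces."""
    return [word[i:i + 2] for word in text.split() for i in range(0, len(word), 2)]


sample = ["small models run fast"]
print(fertility(sample, toy_tokenize))
```

In practice the `tokenize` argument would be a real tokenizer's method, for example `tokenizer.tokenize` from a Transformers tokenizer, applied to a representative corpus for each language.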



	## 🚀 Usage

	```python
	# Install the pinned dependency first (shell): pip install transformers==5.4.0

	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "FrontiersMind/Nandi-Mini-150M"

	# Use a GPU when one is available; otherwise fall back to CPU.
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# trust_remote_code=True lets Transformers run the model code shipped in the repo.
	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    trust_remote_code=True,
	    dtype=torch.bfloat16,
	).to(device).eval()

	prompt = """
	The night was quiet and the streets were empty.
	A single light flickered in the distance. Someone was walking slowly, carrying a small bag. Suddenly,
	"""
	model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

	# Sample a short continuation with low temperature and a mild repetition penalty.
	outputs = model.generate(
	    **model_inputs,
	    max_new_tokens=50,
	    do_sample=True,
	    temperature=0.3,
	    top_k=20,
	    top_p=0.95,
	    repetition_penalty=1.1,
	)

	# Decode the full sequence (prompt plus continuation) back to text.
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)

	print(response)
	```



	## 📬 Feedback & Suggestions

	We’d love to hear your thoughts, feedback, and ideas!

	- Discord: https://discord.gg/ZGdjCdRt
	- Email: support@frontiersmind.ai
	- Official Website: https://www.frontiersmind.ai/
	- LinkedIn: https://www.linkedin.com/company/frontiersmind/
	- X (Twitter): https://x.com/FrontiersMind