| license: apache-2.0 | |
| language: | |
| - en | |
| - hi | |
| - mr | |
| - ta | |
| - te | |
| - kn | |
| - ml | |
| - bn | |
| - pa | |
| - gu | |
| - or | |
| pipeline_tag: text-generation | |
| library_name: transformers | |
| # Nandi-Mini-150M | |
| ## Introduction | |
| Nandi-Mini-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is pre-trained from scratch on **525 billion tokens** and supports **English and 10 Indic languages**. | |
| We do not employ any benchmaxing tricks; the model is designed to be genuinely strong and highly effective for fine-tuning on downstream tasks. | |
| Nandi-Mini-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It is optimized for edge devices, on-prem deployments, and low-latency applications. | |
| Nandi-Mini-150M brings the following key features: | |
| - Strong **multilingual capability** across English and Indic languages | |
| - Efficient design enabling **high performance at small scale (150M parameters)** | |
| - Reduced memory footprint using **factorized embeddings** | |
| - Better parameter efficiency through **layer sharing** | |
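To give a feel for why factorized embeddings reduce the memory footprint, here is a small back-of-the-envelope sketch. Only the vocabulary size (131,072) comes from this model card; `HIDDEN_DIM` and `FACTOR_DIM` are hypothetical values picked for illustration and are not the model's actual dimensions.

```python
# Parameter-count arithmetic for a factorized embedding table.
# VOCAB_SIZE is taken from the model card; HIDDEN_DIM and FACTOR_DIM are
# hypothetical values used only to illustrate the idea.
VOCAB_SIZE = 131_072
HIDDEN_DIM = 768   # assumption, not stated in the model card
FACTOR_DIM = 128   # assumption, not stated in the model card


def full_embedding_params(vocab: int, hidden: int) -> int:
    """A single vocab x hidden lookup table."""
    return vocab * hidden


def factorized_embedding_params(vocab: int, factor: int, hidden: int) -> int:
    """A vocab x factor lookup table followed by a factor x hidden projection."""
    return vocab * factor + factor * hidden


full = full_embedding_params(VOCAB_SIZE, HIDDEN_DIM)
factorized = factorized_embedding_params(VOCAB_SIZE, FACTOR_DIM, HIDDEN_DIM)

print(f"full:       {full:,} parameters")
print(f"factorized: {factorized:,} parameters")
print(f"reduction:  {1 - factorized / full:.1%}")
```

Under these assumed dimensions, a full table alone would take roughly 100M parameters, which shows why a large vocabulary makes factorization attractive at this model size.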
| ## Upcoming Releases & Roadmap | |
| We're just getting started with the Nandi series. | |
| - **Nandi-Mini-150M (Base)**: *Available now* | |
| - **Nandi-Mini-150M (Instruct)**: *Available now* | |
| - **Nandi-Mini-500M (Base + Instruct)**: *Pre-training in progress* | |
| - **Nandi-Mini-1B (Base + Instruct)**: *Pre-training in progress* | |
| We are actively working on expanding the Nandi family to cover a wider range of use cases, from lightweight edge deployments to more capable instruction-tuned systems. | |
| **Blogs & technical deep-dives coming soon**, where we'll share: | |
| - Architecture decisions and design trade-offs | |
| - Training insights and dataset composition | |
| - Benchmarks and real-world applications | |
| Stay tuned! | |
| **This repo contains the base Nandi-Mini-150M model**, which has the following features: | |
| - Type: Causal Language Model | |
| - Training Stage: Pretraining (from scratch) | |
| - Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, **factorized embeddings** | |
| - Number of Layers: 16 × 2 (layer sharing; 32 effective layers) | |
| - Context Length: 2,048 tokens | |
| - Vocabulary Size: 131,072 | |
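The "16 × 2" layer count can be read as 16 unique layers whose weights are reused to reach an effective depth of 32. The sketch below illustrates that idea with toy stand-in layers. It assumes the whole stack is run twice in sequence; the model card does not say how the shared layers are ordered, so the real model may instead repeat each layer back to back. `make_layer` and `forward_shared` are hypothetical helper names.

```python
from typing import Callable, List, Tuple

Layer = Callable[[float], float]


def make_layer(index: int) -> Layer:
    """Toy stand-in for a transformer block with its own 'weight'."""
    weight = 1.0 + 0.01 * index
    return lambda x: x * weight


def forward_shared(x: float, layers: List[Layer], passes: int) -> Tuple[float, int]:
    """Run the same stack of layers `passes` times, reusing the weights.

    Assumption: the full stack is looped; other sharing patterns exist.
    Returns the output and the number of layer applications (effective depth).
    """
    applied = 0
    for _ in range(passes):
        for layer in layers:
            x = layer(x)
            applied += 1
    return x, applied


unique_layers = [make_layer(i) for i in range(16)]
output, effective_depth = forward_shared(1.0, unique_layers, passes=2)

print(f"unique layers:   {len(unique_layers)}")
print(f"effective depth: {effective_depth}")
```

The parameter count scales with the 16 unique layers, while compute per token scales with the 32 layer applications.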
| ## Supported Languages | |
| The model is trained on English and a diverse set of Indic languages, including: | |
| - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia | |
| ## Benchmark Results | |
| ## Benchmark Comparison (~150M Class) | |
| | Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average | |
| |------------------|---------------|------------------|----------|------------|------|------|-------|-----------|---------| | |
| | Mobile-LLM-125M | 125 | 1000 | 38.90 | 53.10 | - | - | - | - | - | | |
| | SmolLM-135M-Base | 135 | 600 | 42.66| 53.03 | 25.44| 25.30| 1.36 | 0.00 | 24.63 | | |
| | SmolLM2-135M-Base| 135 | 2000 | 43.13| 53.27 | 22.09| 24.09| 1.74 | 0.00 | 24.05 | | |
| | **Nandi-Mini-150M-Base** | **150** | **500** | 37.20 | 52.32 | **28.57** | **28.86** | **2.58** | **4.27** | **25.63** | | |
| ## Model Benchmark Comparison With Slightly Bigger Models (350M-600M Class) | |
| | Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average | |
| |---------------------|---------------|------------------|----------|------------|------|------|-------|-----------|---------| | |
| | Mobile-LLM-360M | 350 | 1000 | 49.60 | 56.59 | - | - | - | - | - | | |
| | Qwen-2-0.5-Base | 500 | 12000 | 49.01 | 57.69 | 27.23| 44.06| 10.61 | 22.56 | 35.19 | | |
| | Qwen2.5-0.5B-Base | 500 | 18000 | 52.16 | 56.82 | 24.10| 47.41| 4.77 | 29.87 | 35.86 | | |
| | Qwen3-0.6B-Base | 600 | 36000 | 53.77 | 59.19 | 30.80| 50.34| 15.31 | 28.04 | 39.58 | | |
| | SmolLM-360M-Base | 360 | 600 | 53.33 | 57.22 | 21.20| 24.92| 2.19 | 1.21 | 26.68 | | |
| | SmolLM2-360M-Base | 360 | 4000 | 56.30 | 59.19 | 25.22| 25.55| 2.88 | 0.00 | 28.19 | | |
| | **Nandi-Mini-150M-Base** | **150** | 500 | 37.20| 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 | | |
| ### Note | |
| Mobile-LLM model checkpoints are not publicly available, so their results are taken directly from the original paper. All other models were evaluated with `lm-eval` under a consistent setup. HumanEval and GSM8K are currently evaluated with greedy decoding for all models. | |
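The Average column is not defined explicitly, but it appears to be the unweighted mean of the six benchmark scores. The short check below reproduces the reported averages for the ~150M-class rows; the scores are copied from the table, and the interpretation of the column is an assumption.

```python
from statistics import mean

# Scores copied from the ~150M-class table, in column order:
# HellaSwag, Winogrande, GPQA, MMLU, GSM8K, HumanEval.
scores = {
    "SmolLM-135M-Base": [42.66, 53.03, 25.44, 25.30, 1.36, 0.00],
    "SmolLM2-135M-Base": [43.13, 53.27, 22.09, 24.09, 1.74, 0.00],
    "Nandi-Mini-150M-Base": [37.20, 52.32, 28.57, 28.86, 2.58, 4.27],
}

# Assumption: "Average" is the plain mean of the six scores, rounded to 2 places.
averages = {name: round(mean(values), 2) for name, values in scores.items()}

for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```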
| ## Performance on Fine-tuned Tasks | |
| #### CrossSum-Hindi (CHRF) Results | |
| We fine-tuned our model and other open-source models on the CrossSum-Hindi task from [Google's IndicGenBench](https://github.com/google-research-datasets/indic-gen-bench/). After fine-tuning, Nandi-Mini-150M achieves the highest chrF score (4.37) among the models compared. | |
| | Base Model | Before Finetune | After Finetune | | |
| |------------------------|-----------------|----------------| | |
| | Qwen-2-0.5-Base | 0.09 | 4.22 | | |
| | Qwen2.5-0.5B-Base | 0.43 | 4.18 | | |
| | SmolLM-135M-Base | 0.09 | 2.55 | | |
| | SmolLM-360M-Base | 0.09 | 2.99 | | |
| | SmolLM2-135M-Base | 0.09 | 2.67 | | |
| | SmolLM2-360M-Base | 0.12 | 3.51 | | |
| | Nandi-Mini-150M | 0.10 | **4.37** | |
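For readers unfamiliar with chrF, the metric is a character n-gram F-score. The sketch below is a simplified, sentence-level version meant only to convey the idea; it is not the implementation or configuration used to produce the numbers in the table, which the model card does not specify. `simple_chrf` and `char_ngrams` are hypothetical helper names, and the defaults (n-grams up to 6, beta = 2, whitespace ignored) are assumptions based on common chrF settings.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level character n-gram F-score on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


print(simple_chrf("the cat sat on the mat", "the cat sits on the mat"))
```

Established implementations handle details such as corpus-level aggregation and word n-grams differently, so scores from this sketch are not comparable to the table.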
| ## Tokenization Fertility Score across Languages | |
| | Language | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-1 | Nandi-Mini-150M | | |
| |-----------|------------|-----------------|----------|------------------| | |
| | English | 1.17 | **1.16** | 1.32 | 1.18 | |
| | Bengali | 8.66 | 7.51 | 1.55 | **1.44** | | |
| | Gujarati | 10.47 | 9.37 | 1.55 | **1.53** | | |
| | Hindi | 2.71 | 5.14 | **1.25** | 1.32 | | |
| | Kannada | 16.43 | 12.96 | 2.10 | **1.90** | | |
| | Malayalam | 17.77 | 14.56 | 2.49 | **2.05** | | |
| | Marathi | 3.73 | 6.70 | **1.55** | **1.55** | |
| | Odia | 19.07 | 15.75 | **2.18** | 2.68 | |
| | Punjabi | 9.23 | 8.66 | 1.47 | **1.42** | | |
| | Tamil | 13.56 | 10.93 | 2.06 | **2.05** | | |
| | Telugu | 15.40 | 13.38 | 2.09 | **1.77** | | |
| | Assamese | 9.26 | 8.13 | 4.31 | **1.51** | | |
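Fertility is commonly defined as the average number of tokens a tokenizer produces per word, so lower values mean more compact encodings. The sketch below shows that calculation under this assumed definition; the model card does not state the exact procedure or corpus used for the table. `fertility` and `toy_tokenize` are hypothetical names, and the toy tokenizer simply cuts words into fixed-size chunks.

```python
from typing import Callable, List


def fertility(texts: List[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average tokens per whitespace-separated word over a list of texts."""
    total_tokens = sum(len(tokenize(text)) for text in texts)
    total_words = sum(len(text.split()) for text in texts)
    if total_words == 0:
        raise ValueError("texts contain no words")
    return total_tokens / total_words


def toy_tokenize(text: str, chunk: int = 4) -> List[str]:
    """Toy tokenizer: split each word into pieces of at most `chunk` characters."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return tokens


sample = ["small models can be useful", "tokenization matters"]
print(f"fertility: {fertility(sample, toy_tokenize):.2f}")
```

To measure a real tokenizer, a function such as the `tokenize` method of a loaded Hugging Face tokenizer could be passed in place of `toy_tokenize`.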
| ## Usage | |
| ```python | |
| # Install the dependency first, e.g. in a notebook cell: !pip install transformers==5.4.0 | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| import torch | |
| model_name = "FrontiersMind/Nandi-Mini-150M" | |
| device = "cuda" if torch.cuda.is_available() else "cpu" | |
| tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) | |
| model = AutoModelForCausalLM.from_pretrained( | |
| model_name, | |
| trust_remote_code=True, | |
| dtype=torch.bfloat16 | |
| ).to(device).eval() | |
| prompt = """ | |
| The night was quiet and the streets were empty. | |
| A single light flickered in the distance. Someone was walking slowly, carrying a small bag. Suddenly, | |
| """ | |
| model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device) | |
| outputs = model.generate( | |
| **model_inputs, | |
| max_new_tokens=50, | |
| do_sample=True, | |
| temperature=0.3, | |
| top_k=20, | |
| repetition_penalty=1.1, | |
| top_p=0.95 | |
| ) | |
| response = tokenizer.decode( | |
| outputs[0], | |
| skip_special_tokens=True, | |
| ) | |
| print(response) | |
| ``` | |
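The `temperature`, `top_k`, and `top_p` arguments above shape the distribution the next token is sampled from. The sketch below illustrates their combined effect on a toy set of logits. It is a simplified illustration, not the Transformers implementation: the order of operations and tie handling are assumptions, `repetition_penalty` is not modeled, and `filter_distribution` is a hypothetical helper.

```python
import math
from typing import Dict


def filter_distribution(
    logits: Dict[str, float], temperature: float, top_k: int, top_p: float
) -> Dict[str, float]:
    """Apply temperature, then top-k, then top-p; return renormalized probabilities."""
    # 1. Temperature: values below 1 sharpen the distribution.
    scaled = {token: logit / temperature for token, logit in logits.items()}
    # 2. Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # 3. Softmax over the survivors (shifted by the max for numerical stability).
    max_logit = ranked[0][1]
    exps = [(token, math.exp(logit - max_logit)) for token, logit in ranked]
    norm = sum(value for _, value in exps)
    probs = [(token, value / norm) for token, value in exps]
    # 4. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, prob in probs:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


toy_logits = {"the": 2.0, "a": 1.0, "dog": 0.0, "xyz": -3.0}
sharp = filter_distribution(toy_logits, temperature=0.3, top_k=3, top_p=0.95)
flat = filter_distribution(toy_logits, temperature=1.0, top_k=3, top_p=0.95)
print("temperature=0.3:", sharp)
print("temperature=1.0:", flat)
```

With these toy logits, the low temperature concentrates almost all probability on one token, so sampling behaves close to greedy decoding, while a temperature of 1.0 leaves several candidates in play.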
| ## Feedback & Suggestions | |
| We'd love to hear your thoughts, feedback, and ideas! | |
| - **Discord**: https://discord.gg/ZGdjCdRt | |
| - **Email:** support@frontiersmind.ai | |
| - **Official Website:** https://www.frontiersmind.ai/ | |
| - **LinkedIn:** https://www.linkedin.com/company/frontiersmind/ | |
| - **X (Twitter):** https://x.com/FrontiersMind | |