---
license: apache-2.0
language:
- en
- hi
- mr
- ta
- te
- kn
- ml
- bn
- pa
- gu
- or
pipeline_tag: text-generation
library_name: transformers
---

# Nandi-Mini-150M

## Introduction

Nandi-Mini-150M is a compact multilingual language model built for resource-constrained environments. It is pre-trained from scratch on **525 billion tokens** and supports **English and 10 Indic languages**. We do not use benchmark-specific tuning tricks ("benchmaxing"); the goal is a model that is genuinely strong and an effective base for fine-tuning on downstream tasks.

Nandi-Mini-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It is optimized for edge devices, on-prem deployments, and low-latency applications.

Nandi-Mini-150M brings the following key features:

- Strong **multilingual capability** across English and Indic languages
- Efficient design enabling **high performance at small scale (150M parameters)**
- Reduced memory footprint using **factorized embeddings**
- Better parameter efficiency through **layer sharing**

## Upcoming Releases & Roadmap

We're just getting started with the Nandi series 🚀

- **Nandi-Mini-150M (Base)**: *Available now*
- **Nandi-Mini-150M (Instruct)**: *Available now*
- **Nandi-Mini-500M (Base + Instruct)**: *Pre-training in progress*
- **Nandi-Mini-1B (Base + Instruct)**: *Pre-training in progress*

We are actively expanding the Nandi family to cover a wider range of use cases, from lightweight edge deployments to more capable instruction-tuned systems.

📢 **Blogs & technical deep-dives coming soon**, where we'll share:

- Architecture decisions and design trade-offs
- Training insights and dataset composition
- Benchmarks and real-world applications

Stay tuned!
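The two parameter-saving ideas named in the key features, factorized embeddings and layer sharing, can be illustrated with some back-of-the-envelope arithmetic. The sketch below does not reproduce the model's actual configuration: the vocabulary size (131,072) and the 16 × 2 layer layout come from this card, while `HIDDEN_SIZE`, `EMBED_RANK`, and the order in which shared layers are revisited are assumptions chosen only for illustration.

```python
# Illustrative parameter arithmetic only. HIDDEN_SIZE, EMBED_RANK, and the
# layer schedule are assumed values, not the published model configuration.
from typing import List

VOCAB_SIZE = 131_072   # from the model card
UNIQUE_LAYERS = 16     # from the model card
LAYER_REPEATS = 2      # from the model card (16 x 2 = 32 effective layers)
HIDDEN_SIZE = 768      # assumption
EMBED_RANK = 128       # assumption


def full_embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a standard vocab_size x hidden_size embedding table."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size: int, rank: int, hidden_size: int) -> int:
    """Parameters in a vocab_size x rank table plus a rank x hidden_size projection."""
    return vocab_size * rank + rank * hidden_size


def layer_schedule(unique_layers: int, repeats: int) -> List[int]:
    """One possible order for reusing the layer stack: run the whole stack `repeats` times."""
    return [layer for _ in range(repeats) for layer in range(unique_layers)]


full = full_embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factorized = factorized_embedding_params(VOCAB_SIZE, EMBED_RANK, HIDDEN_SIZE)
schedule = layer_schedule(UNIQUE_LAYERS, LAYER_REPEATS)

print(f"full embedding table:       {full:,} parameters")
print(f"factorized embedding table: {factorized:,} parameters")
print(f"unique layers: {UNIQUE_LAYERS}, effective depth: {len(schedule)}")
```

With a 131,072-entry vocabulary, an unfactorized embedding table at the assumed width would by itself account for a large share of a 150M-parameter budget, which is why factorization matters at this scale. Layer sharing is the complementary trick: depth grows to 32 layer applications while only 16 sets of layer weights are stored.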
**This repo contains the base Nandi-Mini-150M model**, which has the following features:

- Type: Causal Language Model
- Training Stage: Pretraining (from scratch)
- Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, **factorized embeddings**
- Number of Layers: 16 × 2 (layer sharing; 32 effective layers)
- Context Length: 2,048 tokens
- Vocabulary Size: 131,072

## Supported Languages

The model is trained on English and a diverse set of Indic languages, including:

- Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia

## Benchmark Results

### 📊 Benchmark Comparison (~150M Class)

| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|------------|----------------|------------|-----------|------------|------|------|-------|-----------|---------|
| Mobile-LLM-125M | 125 | 1000 | 38.90 | 53.10 | - | - | - | - | - |
| SmolLM-135M-Base | 135 | 600 | 42.66 | 53.03 | 25.44 | 25.30 | 1.36 | 0.00 | 24.63 |
| SmolLM2-135M-Base | 135 | 2000 | 43.13 | 53.27 | 22.09 | 24.09 | 1.74 | 0.00 | 24.05 |
| **Nandi-Mini-150M-Base** | **150** | **500** | 37.20 | 52.32 | **28.57** | **28.86** | **2.58** | **4.27** | **25.63** |

### 📊 Benchmark Comparison With Slightly Bigger Models (350M–600M Class)

| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|------------|----------------|------------|-----------|------------|------|------|-------|-----------|---------|
| Mobile-LLM-360M | 350 | 1000 | 49.60 | 56.59 | - | - | - | - | - |
| Qwen-2-0.5-Base | 500 | 12000 | 49.01 | 57.69 | 27.23 | 44.06 | 10.61 | 22.56 | 35.19 |
| Qwen2.5-0.5B-Base | 500 | 18000 | 52.16 | 56.82 | 24.10 | 47.41 | 4.77 | 29.87 | 35.86 |
| Qwen3-0.6B-Base | 600 | 36000 | 53.77 | 59.19 | 30.80 | 50.34 | 15.31 | 28.04 | 39.58 |
| SmolLM-360M-Base | 360 | 600 | 53.33 | 57.22 | 21.20 | 24.92 | 2.19 | 1.21 | 26.68 |
| SmolLM2-360M-Base | 360 | 4000 | 56.30 | 59.19 | 25.22 | 25.55 | 2.88 | 0.00 | 28.19 |
| **Nandi-Mini-150M-Base** | **150** | 500 | 37.20 | 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |

### Note

Mobile-LLM model checkpoints are not publicly available; their results are taken directly from the original paper. All other models were evaluated with `lm-eval` under a consistent setup. HumanEval and GSM8K are currently evaluated with greedy decoding for all models.

## Performance on Fine-tuned Tasks

### CrossSum-Hindi (chrF) Results

We fine-tuned our model and other open-source models on the CrossSum-Hindi task from [Google's IndicGenBench](https://github.com/google-research-datasets/indic-gen-bench/). Nandi-Mini-150M achieved the highest score after fine-tuning, ahead of every other model in the comparison, including the 500M-parameter Qwen models.

| Base Model | Before Fine-tuning | After Fine-tuning |
|------------|--------------------|-------------------|
| Qwen-2-0.5-Base | 0.09 | 4.22 |
| Qwen2.5-0.5B-Base | 0.43 | 4.18 |
| SmolLM-135M-Base | 0.09 | 2.55 |
| SmolLM-360M-Base | 0.09 | 2.99 |
| SmolLM2-135M-Base | 0.09 | 2.67 |
| SmolLM2-360M-Base | 0.12 | 3.51 |
| Nandi-Mini-150M | 0.10 | **4.37** |

## Tokenization Fertility Score across Languages

Lower fertility means fewer tokens are needed for the same text. Bold marks the lowest value in each row.

| Language | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-1 | Nandi-Mini-150M |
|----------|------------|-----------------|----------|-----------------|
| English | 1.17 | **1.16** | 1.32 | 1.18 |
| Bengali | 8.66 | 7.51 | 1.55 | **1.44** |
| Gujarati | 10.47 | 9.37 | 1.55 | **1.53** |
| Hindi | 2.71 | 5.14 | **1.25** | 1.32 |
| Kannada | 16.43 | 12.96 | 2.10 | **1.90** |
| Malayalam | 17.77 | 14.56 | 2.49 | **2.05** |
| Marathi | 3.73 | 6.70 | **1.55** | **1.55** |
| Odia | 19.07 | 15.75 | **2.18** | 2.68 |
| Punjabi | 9.23 | 8.66 | 1.47 | **1.42** |
| Tamil | 13.56 | 10.93 | 2.06 | **2.05** |
| Telugu | 15.40 | 13.38 | 2.09 | **1.77** |
| Assamese | 9.26 | 8.13 | 4.31 | **1.51** |

## 🚀 Usage

```python
# Install the pinned dependency first (in a shell, or prefix with "!" in a notebook):
#   pip install "transformers==5.4.0"
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "FrontiersMind/Nandi-mini-150M"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True lets transformers run code shipped in the model repo.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    dtype=torch.bfloat16,
).to(device).eval()

# This is a base model, so the prompt is text to be continued, not an instruction.
prompt = """
The night was quiet and the streets were empty.
A single light flickered in the distance.
Someone was walking slowly, carrying a small bag.
Suddenly,
"""

model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Sample a short continuation with low temperature and a mild repetition penalty.
outputs = model.generate(
    **model_inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.3,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.1,
)

# The decoded text contains the prompt followed by the generated continuation.
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## 📬 Feedback & Suggestions

We'd love to hear your thoughts, feedback, and ideas!

- **Discord:** https://discord.gg/ZGdjCdRt
- **Email:** support@frontiersmind.ai
- **Official Website:** https://www.frontiersmind.ai/
- **LinkedIn:** https://www.linkedin.com/company/frontiersmind/
- **X (Twitter):** https://x.com/FrontiersMind
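## Measuring Tokenization Fertility

The fertility table earlier in this card does not state how the scores were computed. A common definition is the average number of tokens produced per whitespace-separated word, and the sketch below assumes that definition. The `fertility` helper and the `toy_piece_tokenizer` stand-in are hypothetical names introduced only for illustration; they are not part of this repository.

```python
# Minimal fertility sketch, assuming fertility = total tokens / total words,
# with words obtained by whitespace splitting. Not the authors' evaluation code.
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word over a corpus."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return total_tokens / total_words


def toy_piece_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: cuts every word into 2-character pieces."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return pieces


sample_corpus = ["abcd ef", "ghijkl"]
print(fertility(sample_corpus, toy_piece_tokenizer))
```

To apply the same helper to a real tokenizer, pass a callable that maps a string to its token list, for example the `tokenize` method of a tokenizer loaded with `AutoTokenizer.from_pretrained`. Scores depend on the evaluation corpus and on how words are counted, so values obtained this way may not match the table exactly.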