vishesh-t27 committed 9 days ago

Commit

ac5d535

verified ·

1 Parent(s): 0e55d2a

updated Readme.md

Files changed (1)

README.md +210 -1

@@ -1,3 +1,212 @@
 ---
-license: mit
 ---

 ---
+license: apache-2.0
+language:
+- en
+- hi
+- mr
+- ta
+- te
+- kn
+- ml
+- bn
+- pa
+- gu
+- or
+pipeline_tag: text-generation
+library_name: transformers
 ---
+# Nandi-Mini-500M-Early-Checkpoint
+## Introduction
+Nandi-Mini-500M-Early-Checkpoint is an early-stage checkpoint from the upcoming **Nandi-Mini-500M** model family — a compact multilingual language model focused on strong efficiency, deployment flexibility, and Indic language support.
+The model is being trained completely from scratch and is designed to deliver strong performance at low compute and memory budgets. This checkpoint is shared to provide an early look into the model’s scaling behavior and training progress.
+Unlike many small-scale models optimized primarily for benchmark performance, Nandi-Mini is being built with practical downstream usability in mind — including fine-tuning, edge deployment, and enterprise inference workloads.
+The broader Nandi family focuses on:
+- Efficient multilingual modeling across English and Indic languages
+- High performance per parameter
+- Edge and on-prem deployment readiness
+- Low-latency inference
+- Strong tokenizer efficiency for Indic scripts
+This release is an **early checkpoint** and not the final converged model. Performance is expected to improve further with continued training and scaling.
+📢 We will soon share detailed technical blogs covering:
+- Architecture design choices
+- Training setup and scaling insights
+- Tokenization strategy
+- Dataset composition
+- Benchmark evaluations
+- Deployment optimizations
+Stay tuned!
+---
+## Model Overview
+**Repository:** `FrontiersMind/Nandi-mini-500M-Early-Checkpoint`
+### Model Details
+- Type: Causal Language Model
+- Training Stage: Early Pretraining Checkpoint
+- Parameters: ~500M
+- Architecture: Transformer decoder
+- Positional Encoding: RoPE
+- Normalization: RMSNorm + QK Norm
+- Activation: SwiGLU
+- Attention: GQA + Shared KV
+- Embeddings: Tied embeddings with factorized design
+- Context Length: 2,048 tokens
+- Vocabulary Size: 131,072
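The README does not say how the tied embeddings are factorized. As a rough illustration only, the sketch below assumes a low-rank split of the embedding matrix into a `vocab x rank` table and a `rank x hidden` projection. `HIDDEN_DIM` and `RANK` are hypothetical values chosen for the arithmetic; only the vocabulary size comes from the model details above.

```python
# Illustrative parameter arithmetic for a factorized embedding matrix.
# VOCAB_SIZE is from the model details; HIDDEN_DIM and RANK are hypothetical.

VOCAB_SIZE = 131_072  # stated in the model details
HIDDEN_DIM = 1_024    # hypothetical hidden size, not stated in the README
RANK = 256            # hypothetical factorization rank, not stated in the README


def full_embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Parameters in a single vocab_size x hidden_dim embedding matrix."""
    return vocab_size * hidden_dim


def factorized_embedding_params(vocab_size: int, hidden_dim: int, rank: int) -> int:
    """Parameters when the matrix is split into (vocab x rank) and (rank x hidden)."""
    return vocab_size * rank + rank * hidden_dim


full = full_embedding_params(VOCAB_SIZE, HIDDEN_DIM)
factorized = factorized_embedding_params(VOCAB_SIZE, HIDDEN_DIM, RANK)

print(f"full:       {full / 1e6:.1f}M parameters")
print(f"factorized: {factorized / 1e6:.1f}M parameters")
print(f"saving:     {1 - factorized / full:.1%}")
```

With a vocabulary this large, the embedding table can be a sizeable share of a ~500M-parameter budget, which is presumably why tying and factorizing it is attractive. The actual dimensions used by the model may differ.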
+### Architectural Highlights
+Nandi-Mini-500M introduces several efficiency-focused architectural optimizations designed for compact yet capable language models.
+#### Shared KV (Shared Key-Value Vectors)
+One of the core ideas explored in Nandi-Mini is **Shared KV**, an efficiency-oriented attention mechanism in which the Key and Value representations partially share a learned vector space.
+This approach is designed to:
+- Reduce memory overhead during inference
+- Improve parameter efficiency
+- Lower KV-cache footprint for long-context generation
+- Enable faster deployment on resource-constrained hardware
+- Maintain strong quality despite smaller compute budgets
+Shared KV is part of our broader effort toward building deployable foundation models optimized for:
+- Edge devices
+- On-premise AI systems
+- Low-latency enterprise inference
+- Efficient multilingual serving
+This is still an active research and optimization area within the Nandi model family, and we plan to share deeper technical details in upcoming engineering blogs.
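To make the KV-cache point above concrete, here is a back-of-the-envelope sketch comparing cache sizes for standard multi-head attention, GQA, and a fully shared K/V tensor. The layer count, head counts, and head dimension are hypothetical; only the 2,048-token context comes from the model details. Because the README says Key and Value are only partially shared, the last figure should be read as an upper bound on the saving, not as the model's actual footprint.

```python
# Rough KV-cache size estimate. All architecture numbers except the context
# length are hypothetical and are NOT taken from the Nandi-Mini configuration.

def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,    # bfloat16
    tensors_per_token: int = 2,  # 2 = separate K and V, 1 = one shared tensor
) -> int:
    """Bytes needed to cache keys/values for one sequence."""
    return (
        num_layers * num_kv_heads * head_dim
        * seq_len * bytes_per_value * tensors_per_token
    )


LAYERS, QUERY_HEADS, KV_HEADS, HEAD_DIM = 24, 16, 4, 64  # hypothetical
SEQ_LEN = 2_048  # context length from the model details

mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)
shared = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN, tensors_per_token=1)

for label, size in [("MHA", mha), ("GQA", gqa), ("GQA + fully shared KV", shared)]:
    print(f"{label:<22} {size / 2**20:.0f} MiB per sequence")
```

The cache grows linearly with sequence length and batch size, so any reduction per token carries through directly to concurrent serving capacity.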
+---
+## 🌍 Supported Languages
+The model is trained on English and multiple Indic languages, including:
+- Hindi
+- Bengali
+- Tamil
+- Telugu
+- Marathi
+- Gujarati
+- Kannada
+- Malayalam
+- Punjabi
+- Odia
+---
+# 📊 Benchmark Results
+## General Benchmarks
+| Model | Training Budget (Trillion Tokens) | HellaSwag | WinoGrande | OBQA | PIQA | GPQA | ARC-e | ARC-c | MMLU | Average |
+|---|---|---|---|---|---|---|---|---|---|---|
+| MobiLlama-0.5B-Base | 1.3 | 39.65 | 53.67 | 30.60 | 70.35 | 24.33 | 52.82 | 23.63 | 24.18 | 39.90 |
+| Qwen-2-0.5B-Base | 12 | 49.01 | 57.69 | 33.20 | 68.98 | 27.23 | 54.79 | 25.42 | 44.06 | 45.05 |
+| Qwen2.5-0.5B-Base | 18 | 52.16 | 56.82 | 35.40 | 70.29 | 24.10 | 64.64 | 29.86 | 47.41 | 47.59 |
+| Qwen3-0.6B-Base | 36 | 53.77 | 59.19 | 34.40 | 70.29 | 30.80 | 65.44 | 33.78 | 50.34 | 49.75 |
+| Qwen3.5-0.8B-Base | 36 | 54.87 | 60.54 | 35.80 | 70.02 | 31.25 | 70.50 | 38.23 | 52.73 | 51.74 |
+| SmolLM-360M-Base | 0.6 | 53.33 | 57.22 | 37.60 | 70.56 | 21.20 | 70.24 | 33.27 | 24.92 | 46.04 |
+| SmolLM2-360M-Base | 4 | 56.30 | 59.19 | 37.60 | 71.81 | 25.22 | 67.88 | 36.68 | 25.55 | 47.53 |
+| **Nandi-Mini-500M-Early-Checkpoint** | **0.5** | **44.86** | **54.77** | **34.80** | **68.60** | **26.33** | **64.73** | **29.70** | **29.01** | **44.10** |
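The Average column appears to be the unweighted mean of the eight benchmark scores in each row. The sketch below recomputes it from the values in the table as a sanity check; the tolerance allows for rounding to two decimals.

```python
# Recompute the "Average" column as the unweighted mean of the eight scores.
# Each entry: (HellaSwag, WinoGrande, OBQA, PIQA, GPQA, ARC-e, ARC-c, MMLU), reported average.
SCORES = {
    "MobiLlama-0.5B-Base": ([39.65, 53.67, 30.60, 70.35, 24.33, 52.82, 23.63, 24.18], 39.90),
    "Qwen-2-0.5B-Base": ([49.01, 57.69, 33.20, 68.98, 27.23, 54.79, 25.42, 44.06], 45.05),
    "Qwen2.5-0.5B-Base": ([52.16, 56.82, 35.40, 70.29, 24.10, 64.64, 29.86, 47.41], 47.59),
    "Qwen3-0.6B-Base": ([53.77, 59.19, 34.40, 70.29, 30.80, 65.44, 33.78, 50.34], 49.75),
    "Qwen3.5-0.8B-Base": ([54.87, 60.54, 35.80, 70.02, 31.25, 70.50, 38.23, 52.73], 51.74),
    "SmolLM-360M-Base": ([53.33, 57.22, 37.60, 70.56, 21.20, 70.24, 33.27, 24.92], 46.04),
    "SmolLM2-360M-Base": ([56.30, 59.19, 37.60, 71.81, 25.22, 67.88, 36.68, 25.55], 47.53),
    "Nandi-Mini-500M-Early-Checkpoint": ([44.86, 54.77, 34.80, 68.60, 26.33, 64.73, 29.70, 29.01], 44.10),
}

TOLERANCE = 0.006  # slightly more than half a unit in the second decimal place


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


for name, (scores, reported) in SCORES.items():
    recomputed = mean(scores)
    status = "ok" if abs(recomputed - reported) < TOLERANCE else "MISMATCH"
    print(f"{name:<34} reported={reported:.2f} recomputed={recomputed:.3f} {status}")
```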
+---
+## Tokenization Fertility Score Across Languages
+| Language  | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-1 | Nandi-Mini-500M |
+|-----------|------------|-----------------|----------|------------------|
+| English   | 1.17 | **1.16** | 1.32 | 1.18 |
+| Bengali   | 8.66 | 7.51 | 1.55 | **1.44** |
+| Gujarati  | 10.47 | 9.37 | 1.55 | **1.53** |
+| Hindi     | 2.71 | 5.14 | **1.25** | 1.32 |
+| Kannada   | 16.43 | 12.96 | 2.10 | **1.90** |
+| Malayalam | 17.77 | 14.56 | 2.49 | **2.05** |
+| Marathi   | 3.73 | 6.70 | **1.55** | **1.55** |
+| Odia      | 19.07 | 15.75 | **2.18** | 2.68 |
+| Punjabi   | 9.23 | 8.66 | 1.47 | **1.42** |
+| Tamil     | 13.56 | 10.93 | 2.06 | **2.05** |
+| Telugu    | 15.40 | 13.38 | 2.09 | **1.77** |
+| Assamese  | 9.26 | 8.13 | 4.31 | **1.51** |
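The README does not state how the fertility scores were measured. A common definition, assumed here, is the average number of tokens produced per whitespace-separated word. The sketch below implements that definition; `toy_pair_tokenizer` is a hypothetical stand-in so the example runs without downloading any model.

```python
from typing import Callable, Iterable, List


def fertility(tokenize: Callable[[str], List[str]], texts: Iterable[str]) -> float:
    """Average tokens per whitespace-separated word over a corpus (assumed definition)."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return total_tokens / total_words


def toy_pair_tokenizer(text: str) -> List[str]:
    """Hypothetical tokenizer: splits every word into chunks of up to 2 characters."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return tokens


sample = ["tokenizers split text", "into small pieces"]
toy_score = fertility(toy_pair_tokenizer, sample)
word_level_score = fertility(str.split, sample)

print(f"toy tokenizer fertility:  {toy_score:.2f}")
print(f"word-level fertility:     {word_level_score:.2f}")
```

To measure a real tokenizer, the `tokenize` method of a Hugging Face tokenizer can be passed in place of the toy function. Results depend on the evaluation corpus, so they will not necessarily reproduce the table.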
+### Why Fertility Matters
+Lower fertility scores indicate more efficient tokenization, meaning fewer tokens are needed to represent text in a language.
+This leads to:
+- Better context utilization
+- Lower inference cost
+- Reduced latency
+- Improved multilingual efficiency
+Nandi-Mini’s tokenizer is heavily optimized for Indic languages and demonstrates strong compression efficiency across several scripts.
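As a worked example of the context-utilization point, the sketch below estimates how many Hindi words fit into a fixed 2,048-token budget at each tokenizer's reported fertility. Applying the same budget to every tokenizer is an assumption made for a like-for-like comparison; the other models have their own context lengths.

```python
# Approximate words that fit in a fixed token budget, given tokens-per-word fertility.
CONTEXT_LENGTH = 2_048  # Nandi-Mini context length from the model details

# Hindi fertility scores copied from the table above.
HINDI_FERTILITY = {
    "SmolLM3-3B": 2.71,
    "Qwen3-0.6B-Base": 5.14,
    "Sarvam-1": 1.25,
    "Nandi-Mini-500M": 1.32,
}


def words_in_context(context_tokens: int, fertility: float) -> int:
    """Whole words that fit when each word costs `fertility` tokens on average."""
    return int(context_tokens / fertility)


for name, score in HINDI_FERTILITY.items():
    words = words_in_context(CONTEXT_LENGTH, score)
    print(f"{name:<18} ~{words} Hindi words per {CONTEXT_LENGTH} tokens")
```

The same ratio applies to cost and latency: at a fixed price or speed per token, a tokenizer with lower fertility processes proportionally more text.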
+---
+# 🚀 Usage
+```python
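+# Install dependencies first (from a shell): pip install transformers torch accelerate
+# accelerate is required for device_map="auto" below.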
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+model_name = "FrontiersMind/Nandi-mini-500M-Early-Checkpoint"
+tokenizer = AutoTokenizer.from_pretrained(
+    model_name,
+    trust_remote_code=True
+)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    trust_remote_code=True,
+    device_map="auto",
+    torch_dtype=torch.bfloat16
+).eval()
+prompt = """
+The night was quiet and the streets were empty.
+A single light flickered in the distance.
+Someone was walking slowly, carrying a small bag. Suddenly,
+"""
+model_inputs = tokenizer(
+    [prompt],
+    return_tensors="pt"
+).to(model.device)
+outputs = model.generate(
+    **model_inputs,
+    max_new_tokens=64,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.95,
+    repetition_penalty=1.1
+)
+response = tokenizer.decode(
+    outputs[0],
+    skip_special_tokens=True
+)
+print(response)