Instructions to use FrontiersMind/Nandi-Mini-600M-Early-Checkpoint with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use FrontiersMind/Nandi-Mini-600M-Early-Checkpoint with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FrontiersMind/Nandi-Mini-600M-Early-Checkpoint", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

# `dtype` is the argument name in recent transformers releases; older releases call it `torch_dtype`.
model = AutoModelForCausalLM.from_pretrained("FrontiersMind/Nandi-Mini-600M-Early-Checkpoint", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use FrontiersMind/Nandi-Mini-600M-Early-Checkpoint with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server. The repo ships custom modeling code (see trust_remote_code=True
# in the Transformers snippet), so vLLM will likely need --trust-remote-code as well:
vllm serve "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker
```bash
docker model run hf.co/FrontiersMind/Nandi-Mini-600M-Early-Checkpoint
```
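The curl call in the vLLM section can also be issued from Python. The sketch below uses only the standard library (`json`, `urllib.request`) to build the same POST to the OpenAI-compatible `/v1/completions` route. The helper name `build_completion_request` is hypothetical, and the default base URL assumes a local server on vLLM's default port 8000, as in the curl example; pass `base_url="http://localhost:30000"` for the SGLang server below. Nothing is sent until you call `urllib.request.urlopen` on the result.

```python
import json
import urllib.request

# Assumption: a local `vllm serve` instance on its default port, as in the curl example.
BASE_URL = "http://localhost:8000"
MODEL_ID = "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint"


def build_completion_request(prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request("Once upon a time,")
    # Sending needs a running server; uncomment once the server is up:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect and reuse across the vLLM and SGLang endpoints, which differ only in port.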
- SGLang
How to use FrontiersMind/Nandi-Mini-600M-Early-Checkpoint with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server. The repo ships custom modeling code (see trust_remote_code=True
# in the Transformers snippet), so SGLang will likely need --trust-remote-code as well:
python3 -m sglang.launch_server \
    --model-path "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint" \
    --host 0.0.0.0 \
    --port 30000 \
    --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images
```bash
# --trust-remote-code is likely needed because the repo ships custom modeling code.
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint" \
        --host 0.0.0.0 \
        --port 30000 \
        --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

- Docker Model Runner
How to use FrontiersMind/Nandi-Mini-600M-Early-Checkpoint with Docker Model Runner:
```bash
docker model run hf.co/FrontiersMind/Nandi-Mini-600M-Early-Checkpoint
```
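The vLLM and SGLang curl calls above return an OpenAI-style completions JSON body, in which the generated text sits under `choices[0].text`. A minimal parsing sketch, assuming that response shape; `extract_completion` is a hypothetical helper, and the sample body below is hand-written for illustration, not real output from this model.

```python
import json
from typing import Optional, Tuple


def extract_completion(response_body: str) -> Tuple[str, Optional[str]]:
    """Return (text, finish_reason) for the first choice of an OpenAI-style
    /v1/completions response body (assumed shape: {"choices": [{"text": ...}]})."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    # finish_reason is typically "stop" or "length" (generation hit max_tokens).
    return first["text"], first.get("finish_reason")


if __name__ == "__main__":
    # Hand-written sample body for illustration only, not real model output.
    sample = '{"choices": [{"index": 0, "text": " there lived a king.", "finish_reason": "length"}]}'
    text, reason = extract_completion(sample)
    print(repr(text), reason)
```

Checking `finish_reason` is a cheap way to tell whether a completion stopped naturally or was cut off by the `max_tokens` limit (512 in the examples above).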
Update README.md
README.md
```diff
@@ -49,24 +49,6 @@ Stay tuned!
 
 ---
 
-## Model Overview
-
-**Repository:** `FrontiersMind/Nandi-mini-500M-Early-Checkpoint`
-
-### Model Details
-
-- Type: Causal Language Model
-- Training Stage: Early Pretraining Checkpoint
-- Parameters: ~500M
-- Architecture: Transformer decoder
-- Positional Encoding: RoPE
-- Normalization: RMSNorm + QK Norm
-- Activation: SwiGLU
-- Attention: GQA + Shared KV
-- Embeddings: Tied embeddings with factorized design
-- Context Length: 2,048 tokens
-- Vocabulary Size: 131,072
-
 
 ### Architectural Highlights
 
@@ -112,20 +94,21 @@ This remains an active research area within the Nandi model family, and we plan
 
 ---
 
-## 🌍 Supported Languages
 
-The model is trained on English and multiple Indic languages, including:
+### Model Details
+
+- Type: Causal Language Model
+- Training Stage: Early Pretraining Checkpoint
+- Parameters: ~500M
+- Architecture: Transformer decoder
+- Positional Encoding: RoPE
+- Normalization: RMSNorm + QK Norm
+- Activation: SwiGLU
+- Attention: GQA + Shared KV
+- Embeddings: Tied embeddings with factorized design
+- Context Length: 2,048 tokens
+- Vocabulary Size: 131,072
 
-- Hindi
-- Bengali
-- Tamil
-- Telugu
-- Marathi
-- Gujarati
-- Kannada
-- Malayalam
-- Punjabi
-- Odia
 
 ---
 
@@ -133,7 +116,7 @@ The model is trained on English and multiple Indic languages, including:
 
 ## General Benchmarks
 
-| Model |
+| Model | Trained Tokens | HellaSwag | WinoGrande | OBQA | PIQA | GPQA | ARC-e | ARC-c | MMLU | Average |
 |---|---|---|---|---|---|---|---|---|---|---|
 | MobiLlama-0.5B-Base | 1.3 | 39.65 | 53.67 | 30.60 | 70.35 | 24.33 | 52.82 | 23.63 | 24.18 | 39.90 |
 | Qwen-2-0.5B-Base | 12 | 49.01 | 57.69 | 33.20 | 68.98 | 27.23 | 54.79 | 25.42 | 44.06 | 45.05 |
@@ -142,7 +125,7 @@ The model is trained on English and multiple Indic languages, including:
 | Qwen3.5-0.8B-Base | 36 | 54.87 | 60.54 | 35.80 | 70.02 | 31.25 | 70.50 | 38.23 | 52.73 | 51.74 |
 | SmolLM-360M-Base | 0.6 | 53.33 | 57.22 | 37.60 | 70.56 | 21.20 | 70.24 | 33.27 | 24.92 | 46.04 |
 | SmolLM2-360M-Base | 4 | 56.30 | 59.19 | 37.60 | 71.81 | 25.22 | 67.88 | 36.68 | 25.55 | 47.53 |
-| **Nandi-Mini-
+| **Nandi-Mini-600M-Early-Checkpoint-Base** | **0.2** | 44.86 | 54.77 | 34.80 | 68.60 | 26.33 | 64.73 | 29.70 | 29.01 | 44.10 |
 
 
 ---
@@ -164,20 +147,14 @@ The model is trained on English and multiple Indic languages, including:
 | Telugu | 15.40 | 13.38 | 2.09 | **1.77** |
 | Assamese | 9.26 | 8.13 | 4.31 | **1.51** |
 
-
-
-Lower fertility scores indicate more efficient tokenization, meaning fewer tokens are needed to represent text in a language.
+---
 
-This leads to:
 
-
-- Lower inference cost
-- Reduced latency
-- Improved multilingual efficiency
+## 🌍 Supported Languages
 
-
+The model is trained on English and a diverse set of Indic languages, including:
 
--
+- Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
 
 # 🚀 Usage
 
```
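The fertility paragraph removed in this diff says that lower fertility means fewer tokens are needed to represent text in a language. The README does not spell out its formula; a common definition, assumed here, is the average number of tokens per whitespace-separated word. A minimal sketch under that assumption, with toy tokenizers standing in for a real one (in practice you might pass the `tokenize` method of a tokenizer loaded with `AutoTokenizer.from_pretrained(..., trust_remote_code=True)`). The names `fertility`, `char_tokenize`, and `word_tokenize` are hypothetical.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word across `texts`."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words to score")
    return total_tokens / total_words


def char_tokenize(text: str) -> List[str]:
    # Toy stand-in tokenizer: every non-space character (code point) is one token.
    return [ch for ch in text if not ch.isspace()]


def word_tokenize(text: str) -> List[str]:
    # Toy stand-in tokenizer: one token per word, the ideal fertility of 1.0.
    return text.split()


if __name__ == "__main__":
    corpus = ["hello world", "ab"]
    print(fertility(corpus, word_tokenize))  # 1.0
    print(fertility(corpus, char_tokenize))  # 4.0 (12 characters / 3 words)
```

As a rough worked example under the same tokens-per-word reading: for the Telugu row's first and last columns (15.40 vs 1.77), 1,000 words would cost about 15,400 versus 1,770 tokens, and the 2,048-token context window would hold roughly 133 versus 1,157 words.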