vishesh-t27 committed on
Commit 4331f85 (verified) · 1 parent: 39d4144

updated Readme.md

Files changed (1): README.md (+36, -10)
@@ -16,14 +16,14 @@ pipeline_tag: text-generation
  library_name: transformers
  ---

- # Nandi-150M

  ## Introduction

- Nandi-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is trained from scratch on **525 billion tokens** and supports **English and 10 Indic languages**.

- Nandi-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It is optimized for edge devices, on-prem deployments, and low-latency applications, making it ideal for resource-constrained environments.
- Nandi-150M brings the following key features:

  - Strong **multilingual capability** across English and Indic languages
  - Efficient design enabling **high performance at small scale (150M parameters)**
@@ -34,10 +34,10 @@ Nandi-150M brings the following key features:

  We’re just getting started with the Nandi series 🚀

- - **Nandi-150M (Base)** — *Available now*
- - **Nandi-150M (Instruct)** — Coming soon (open-sourced)
- - **Nandi-500M (Base + Instruct)** — Planned next
- - **Nandi-1B (Base + Instruct)** — Final milestone in the current roadmap

  We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.
 
@@ -48,7 +48,7 @@ We are actively working on expanding the Nandi family to cover a wider range of

  Stay tuned!

- **This repo contains the base Nandi-150M model**, which has the following features:

  - Type: Causal Language Model
  - Training Stage: Pretraining (from scratch)
@@ -61,7 +61,33 @@ Stay tuned!

  The model is trained on English and a diverse set of Indic languages, including:

- - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia

  ## 🚀 Usage
 
  library_name: transformers
  ---

+ # Nandi-Mini-150M

  ## Introduction

+ Nandi-Mini-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is trained from scratch on **525 billion tokens** and supports **English and 10 Indic languages**.

+ Nandi-Mini-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It targets edge devices, on-prem deployments, and low-latency applications.
+ Nandi-Mini-150M brings the following key features:

  - Strong **multilingual capability** across English and Indic languages
  - Efficient design enabling **high performance at small scale (150M parameters)**
 

  We’re just getting started with the Nandi series 🚀

+ - **Nandi-Mini-150M (Base)** — *Available now*
+ - **Nandi-Mini-150M (Instruct)** — Coming soon (open-sourced)
+ - **Nandi-Mini-500M (Base + Instruct)** — Planned next
+ - **Nandi-Mini-1B (Base + Instruct)** — Final milestone in the current roadmap

  We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.
 

  Stay tuned!

+ **This repo contains the base Nandi-Mini-150M model**, which has the following features:

  - Type: Causal Language Model
  - Training Stage: Pretraining (from scratch)
 

  The model is trained on English and a diverse set of Indic languages, including:

+ - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
+
+ ## Benchmark Results
+
+ ### 📊 Benchmark Comparison (Nandi-Mini-150M Focus)
+
+ | Model Name | Parameters (M) | Token Budget (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
+ |------------|----------------|------------------|-----------|------------|------|------|-------|-----------|---------|
+ | Mobile-LLM-125M | 125 | 1000 | 38.90 | 53.10 | - | - | - | - | - |
+ | SmolLM-135M-Base | 135 | 600 | 42.66 | 53.03 | 25.44 | 25.30 | 1.36 | 0.00 | 24.63 |
+ | SmolLM2-135M-Base | 135 | 2000 | 43.13 | 53.27 | 22.09 | 24.09 | 1.74 | 0.00 | 24.05 |
+ | **Nandi-Mini-150M-Base** | **150** | **500** | 37.20 | 52.32 | **28.57** | **28.86** | **2.58** | **4.27** | **25.63** |
+
+
+ ### 📊 Comparison with Larger Models (350M–600M Class)
+
+ | Model Name | Parameters (M) | Token Budget (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
+ |------------|----------------|------------------|-----------|------------|------|------|-------|-----------|---------|
+ | Mobile-LLM-350M | 350 | 1000 | 49.60 | 56.59 | - | - | - | - | - |
+ | Qwen2-0.5B-Base | 500 | 12000 | 49.01 | 57.69 | 27.23 | 44.06 | 10.61 | 22.56 | 35.19 |
+ | Qwen2.5-0.5B-Base | 500 | 18000 | 52.16 | 56.82 | 24.10 | 47.41 | 4.77 | 29.87 | 35.86 |
+ | Qwen3-0.6B-Base | 600 | 36000 | 53.77 | 59.19 | 30.80 | 50.34 | 15.31 | 28.04 | 39.58 |
+ | SmolLM-360M-Base | 360 | 600 | 53.33 | 57.22 | 21.20 | 24.92 | 2.19 | 1.21 | 26.68 |
+ | SmolLM2-360M-Base | 360 | 4000 | 56.30 | 59.19 | 25.22 | 25.55 | 2.88 | 0.00 | 28.19 |
+ | **Nandi-Mini-150M-Base** | **150** | 500 | 37.20 | 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |
+
+
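The Average column in both tables appears to be the unweighted mean of the six benchmark scores, rounded half-up to two decimals. This is inferred from the numbers and is not stated in the README. Rows with missing scores, such as the Mobile-LLM baselines, have no average. The minimal Python sketch below recomputes the column from the reported scores under that assumption. The names `ROWS` and `average` are hypothetical helpers for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the two benchmark tables, in column order:
# HellaSwag, Winogrande, GPQA, MMLU, GSM8K, HumanEval.
# The second tuple element is the reported "Average" for that row.
# Strings keep the exact published digits for Decimal.
ROWS = {
    "SmolLM-135M-Base": (["42.66", "53.03", "25.44", "25.30", "1.36", "0.00"], "24.63"),
    "SmolLM2-135M-Base": (["43.13", "53.27", "22.09", "24.09", "1.74", "0.00"], "24.05"),
    "Nandi-Mini-150M-Base": (["37.20", "52.32", "28.57", "28.86", "2.58", "4.27"], "25.63"),
    "Qwen2-0.5B-Base": (["49.01", "57.69", "27.23", "44.06", "10.61", "22.56"], "35.19"),
    "Qwen2.5-0.5B-Base": (["52.16", "56.82", "24.10", "47.41", "4.77", "29.87"], "35.86"),
    "Qwen3-0.6B-Base": (["53.77", "59.19", "30.80", "50.34", "15.31", "28.04"], "39.58"),
    "SmolLM-360M-Base": (["53.33", "57.22", "21.20", "24.92", "2.19", "1.21"], "26.68"),
    "SmolLM2-360M-Base": (["56.30", "59.19", "25.22", "25.55", "2.88", "0.00"], "28.19"),
}


def average(scores: list[str]) -> Decimal:
    """Unweighted mean of the scores, rounded half-up to two decimals (assumed rule)."""
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Compare each recomputed average against the value reported in the tables.
for name, (scores, reported) in ROWS.items():
    computed = average(scores)
    status = "matches" if computed == Decimal(reported) else "differs"
    print(f"{name:<22} computed={computed} reported={reported} ({status})")
```

`Decimal` is used instead of `float` because two rows land exactly on a rounding boundary. 215.13 / 6 = 35.855 and 237.45 / 6 = 39.575, and the reported 35.86 and 39.58 are consistent with round-half-up on those exact values.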

  ## 🚀 Usage