Commit 39d4144 by vishesh-t27 · verified · 1 parent: aa593c0

updated readme.md

Files changed (1)
  1. README.md +26 -26
README.md CHANGED
@@ -20,10 +20,9 @@ library_name: transformers
20
 
21
  ## Introduction
22
 
23
- Nandi-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is trained from scratch on **475 billion tokens** and supports **English and 10 Indic languages**.
24
-
25
- Nandi-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It is well-suited for **on-prem deployments**, **low-latency applications**, and **edge use cases**.
26
 
 
27
  Nandi-150M brings the following key features:
28
 
29
  - Strong **multilingual capability** across English and Indic languages
@@ -31,18 +30,36 @@ Nandi-150M brings the following key features:
31
  - Reduced memory footprint using **factorized embeddings**
32
  - Better parameter efficiency through **layer sharing**
33
 
34
  **This repo contains the base Nandi-150M model**, which has the following features:
35
 
36
  - Type: Causal Language Model
37
  - Training Stage: Pretraining (from scratch)
38
- - Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings
39
- - Number of Layers: 16*2
40
  - Context Length: 2,048 tokens
41
  - Vocabulary Size: 131,072
42
 
43
  ## 🌍 Supported Languages
44
 
45
- The model is trained on English and a diverse set of Indic languages, including (but not limited to):
46
 
47
  - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
48
 
@@ -71,7 +88,8 @@ outputs = model.generate(
71
  do_sample=True,
72
  temperature=0.3,
73
  top_k=20,
74
- top_p=0.95
 
75
  )
76
 
77
  response = tokenizer.decode(
@@ -80,22 +98,4 @@ response = tokenizer.decode(
80
  )
81
 
82
  print(response)
83
- ```
84
-
85
- ## 📝 Upcoming Releases & Roadmap
86
-
87
- We’re just getting started with the Nandi series 🚀
88
-
89
- - **Nandi-150M (Base)** — *Available now*
90
- - **Nandi-150M (Instruct)** — Coming soon (open-sourced)
91
- - **Nandi-500M (Base + Instruct)** — Planned next
92
- - **Nandi-1B (Base + Instruct)** — Final milestone in the current roadmap
93
-
94
- We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.
95
-
96
- 📢 **Blogs & technical deep-dives coming soon**, where we’ll share:
97
- - Architecture decisions and design trade-offs
98
- - Training insights and dataset composition
99
- - Benchmarks and real-world applications
100
-
101
- Stay tuned!
 
20
 
21
  ## Introduction
22
 
23
+ Nandi-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is trained from scratch on **525 billion tokens** and supports **English and 10 Indic languages**.
 
 
24
 
25
+ Nandi-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It targets edge devices, on-prem deployments, and low-latency applications.
26
  Nandi-150M brings the following key features:
27
 
28
  - Strong **multilingual capability** across English and Indic languages
 
30
  - Reduced memory footprint using **factorized embeddings**
31
  - Better parameter efficiency through **layer sharing**
32
 
33
+ ## 📝 Upcoming Releases & Roadmap
34
+
35
+ We’re just getting started with the Nandi series 🚀
36
+
37
+ - **Nandi-150M (Base)** — *Available now*
38
+ - **Nandi-150M (Instruct)** — Coming soon (open-sourced)
39
+ - **Nandi-500M (Base + Instruct)** — Planned next
40
+ - **Nandi-1B (Base + Instruct)** — Final milestone in the current roadmap
41
+
42
+ We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.
43
+
44
+ 📢 **Blogs & technical deep-dives coming soon**, where we’ll share:
45
+ - Architecture decisions and design trade-offs
46
+ - Training insights and dataset composition
47
+ - Benchmarks and real-world applications
48
+
49
+ Stay tuned!
50
+
51
  **This repo contains the base Nandi-150M model**, which has the following features:
52
 
53
  - Type: Causal Language Model
54
  - Training Stage: Pretraining (from scratch)
55
+ - Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, **factorized embeddings**
56
+ - Number of Layers: 16*2 (layer sharing, effective layers = 32)
57
  - Context Length: 2,048 tokens
58
  - Vocabulary Size: 131,072
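
The feature list above names factorized embeddings and a 16*2 layer layout with 32 effective layers. The sketch below only illustrates why those two choices reduce stored parameters. `HIDDEN_SIZE`, `FACTOR_RANK`, and the repeat-the-whole-stack sharing order are assumptions made for the example; the README does not state them.

```python
# Values taken from the README.
VOCAB_SIZE = 131_072
UNIQUE_LAYERS = 16   # layers actually stored
PASSES = 2           # "16*2": each stored layer is used twice

# Hypothetical values, chosen only for illustration.
HIDDEN_SIZE = 512
FACTOR_RANK = 128


def full_embedding_params(vocab: int, hidden: int) -> int:
    """Parameters in a single vocab x hidden embedding table."""
    return vocab * hidden


def factorized_embedding_params(vocab: int, rank: int, hidden: int) -> int:
    """Parameters in a vocab x rank table followed by a rank x hidden projection."""
    return vocab * rank + rank * hidden


def layer_schedule(unique_layers: int, passes: int) -> list[int]:
    """Indices of stored layers in execution order, assuming the whole
    stack is repeated (one possible sharing pattern, not confirmed)."""
    return [i for _ in range(passes) for i in range(unique_layers)]


full = full_embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factorized = factorized_embedding_params(VOCAB_SIZE, FACTOR_RANK, HIDDEN_SIZE)
schedule = layer_schedule(UNIQUE_LAYERS, PASSES)

print(f"full embedding params:       {full:,}")
print(f"factorized embedding params: {factorized:,}")
print(f"effective depth: {len(schedule)} using {len(set(schedule))} stored layers")
```

With these assumed sizes the factorized table is roughly a quarter the size of the full one; the real saving depends on the model's actual dimensions.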
59
 
60
  ## 🌍 Supported Languages
61
 
62
+ The model is trained on English and a diverse set of Indic languages, including:
63
 
64
  - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
65
 
 
88
  do_sample=True,
89
  temperature=0.3,
90
  top_k=20,
91
+ top_p=0.95,
92
+ repetition_penalty=1.105
93
  )
94
 
95
  response = tokenizer.decode(
 
98
  )
99
 
100
  print(response)
101
+ ```
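
The generation call above now passes `repetition_penalty=1.105`. As a rough illustration of what that setting does, the sketch below applies the CTRL-style rule used by the repetition-penalty logits processor in `transformers`: scores of tokens already present in the sequence are divided by the penalty when positive and multiplied by it when negative. This is a simplified pure-Python version operating on a plain list; the function name `apply_repetition_penalty` is hypothetical.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of `logits` with already-seen tokens made less likely.

    logits: list of float scores, one per vocabulary entry
    seen_token_ids: token ids already present in the sequence
    penalty: values above 1.0 discourage repetition; 1.0 is a no-op
    """
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Both branches push the score toward "less likely".
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


# Toy vocabulary of three tokens; tokens 0 and 1 were already generated.
example = apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0)
print(example)
```

A value such as 1.105 applies only a mild push away from repeated tokens, which tends to matter more for small base models that can loop on a phrase.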