vishesh-t27 committed on
Commit 94a8490 · verified · 1 Parent(s): 523221e

updated readme.md

Files changed (1)
  1. README.md +22 -21
README.md CHANGED
@@ -28,41 +28,23 @@ Nandi-150M brings the following key features:
28
 
29
  - Strong **multilingual capability** across English and Indic languages
30
  - Efficient design enabling **high performance at small scale (150M parameters)**
31
- - Improved training stability via **layer rescaling and z-loss regularization**
32
  - Reduced memory footprint using **factorized embeddings**
33
  - Better parameter efficiency through **layer sharing**
34
- - **Grouped Query Attention (GQA)** for faster inference
35
- - **RoPE-based positional encoding** for improved sequence modeling
36
 
37
  **This repo contains the base Nandi-150M model**, which has the following features:
38
 
39
  - Type: Causal Language Model
40
  - Training Stage: Pretraining (from scratch)
41
  - Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings
42
- - Number of Parameters: ~150M
43
- - Number of Layers: 16
44
- - Number of Attention Heads: 16 (Q) / 4 (KV)
45
  - Context Length: 2,048 tokens
46
  - Vocabulary Size: 131,072
47
- - Embedding: Factorized (rank = 196)
48
- - Precision: bfloat16
49
 
50
  ## 🌍 Supported Languages
51
 
52
  The model is trained on English and a diverse set of Indic languages, including (but not limited to):
53
 
54
- - Hindi
55
- - Bengali
56
- - Tamil
57
- - Telugu
58
- - Marathi
59
- - Gujarati
60
- - Kannada
61
- - Malayalam
62
- - Punjabi
63
- - Odia
64
-
65
- ---
66
 
67
  ## 🚀 Usage
68
 
@@ -78,4 +60,23 @@ prompt = "Explain transformer in simple words."
78
  inputs = tokenizer(prompt, return_tensors="pt")
79
 
80
  outputs = model.generate(**inputs, max_new_tokens=100)
81
- print(tokenizer.decode(outputs[0]))
 
28
 
29
  - Strong **multilingual capability** across English and Indic languages
30
  - Efficient design enabling **high performance at small scale (150M parameters)**
 
31
  - Reduced memory footprint using **factorized embeddings**
32
  - Better parameter efficiency through **layer sharing**
 
 
33
 
34
  **This repo contains the base Nandi-150M model**, which has the following features:
35
 
36
  - Type: Causal Language Model
37
  - Training Stage: Pretraining (from scratch)
38
  - Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings
39
+ - Number of Layers: 16*2
 
 
40
  - Context Length: 2,048 tokens
41
  - Vocabulary Size: 131,072
 
 
42
 
43
  ## 🌍 Supported Languages
44
 
45
  The model is trained on English and a diverse set of Indic languages, including (but not limited to):
46
 
47
+ - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
 
48
 
49
  ## 🚀 Usage
50
 
 
60
  inputs = tokenizer(prompt, return_tensors="pt")
61
 
62
  outputs = model.generate(**inputs, max_new_tokens=100)
63
+ print(tokenizer.decode(outputs[0]))
64
+ ```
65
+
66
+ ## 📝 Upcoming Releases & Roadmap
67
+
68
+ We’re just getting started with the Nandi series 🚀
69
+
70
+ - **Nandi-150M (Base)** — *Available now*
71
+ - **Nandi-150M (Instruct)** — Coming soon (open-sourced)
72
+ - **Nandi-500M (Base + Instruct)** — Planned next
73
+ - **Nandi-1B (Base + Instruct)** — Final milestone in the current roadmap
74
+
75
+ We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.
76
+
77
+ 📢 **Blogs & technical deep-dives coming soon**, where we’ll share:
78
+ - Architecture decisions and design trade-offs
79
+ - Training insights and dataset composition
80
+ - Benchmarks and real-world applications
81
+
82
+ Stay tuned!
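
The README credits **factorized embeddings** with a reduced memory footprint, and the line this commit removes listed a rank of 196 next to a vocabulary of 131,072. As a rough illustration of why that helps, the sketch below compares the parameter count of a full embedding table with a rank-factorized one. The hidden size of 768 is a hypothetical value chosen for the arithmetic; the diff does not state it, and the function name `embedding_params` is likewise illustrative, not part of any Nandi code.

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden_size: int, rank: Optional[int] = None) -> int:
    """Count embedding parameters, with or without a rank-r factorization.

    Full table:        vocab_size * hidden_size
    Factorized table:  vocab_size * rank + rank * hidden_size
    """
    if rank is None:
        return vocab_size * hidden_size
    return vocab_size * rank + rank * hidden_size


VOCAB_SIZE = 131_072  # from the README
RANK = 196            # from the README line removed in this commit
HIDDEN_SIZE = 768     # ASSUMPTION: not stated anywhere in the diff

full = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factorized = embedding_params(VOCAB_SIZE, HIDDEN_SIZE, RANK)

print(f"full table:       {full:,} parameters")
print(f"factorized table: {factorized:,} parameters")
print(f"reduction:        {full / factorized:.1f}x")
```

Under that assumed hidden size, the factorized table needs roughly a quarter of the parameters of the full one. The actual saving depends on the model's real hidden size.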