vishesh-t27 committed on
Commit a522301 · verified · 1 Parent(s): ede3ae3

Update README.md

Files changed (1)
  1. README.md +19 -11
README.md CHANGED
@@ -20,7 +20,8 @@ library_name: transformers
 
  ## Introduction
 
- Nandi-Mini-500M-Early-Checkpoint is an early-stage checkpoint from the upcoming **Nandi-Mini-500M** model family, a compact multilingual language model focused on strong efficiency, deployment flexibility, and Indic language support.
+
+ Nandi-Mini-600M-Early-Checkpoint is an early-stage checkpoint from the upcoming **Nandi-Mini-600M** model family, a compact multilingual language model focused on strong efficiency, deployment flexibility, and Indic language support.
 
  The model is being trained completely from scratch and is designed to deliver strong performance at low compute and memory budgets. This checkpoint is shared to provide an early look into the model’s scaling behavior and training progress.
 
@@ -99,7 +100,7 @@ This remains an active research area within the Nandi model family, and we plan
 
  - Type: Causal Language Model
  - Training Stage: Early Pretraining Checkpoint
- - Parameters: ~500M
+ - Parameters: ~600M
  - Architecture: Transformer decoder
  - Positional Encoding: RoPE
  - Normalization: RMSNorm + QK Norm
@@ -159,12 +160,12 @@ The model is trained on English and a diverse set of Indic languages, including:
  # 🚀 Usage
 
  ```python
- !pip install transformers
+ !pip install transformers=='5.4.0'
 
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch
 
- model_name = "FrontiersMind/Nandi-mini-500M-Early-Checkpoint"
+ model_name = "FrontiersMind/Nandi-Mini-600M-Early-Checkpoint"
 
  tokenizer = AutoTokenizer.from_pretrained(
      model_name,
@@ -178,6 +179,10 @@ model = AutoModelForCausalLM.from_pretrained(
      torch_dtype=torch.bfloat16
  ).eval()
 
+
+ model.config.kv_cache_mode = "shared"
+ # model.config.kv_cache_mode = "vanilla"
+
  prompt = """
  The night was quiet and the streets were empty.
  A single light flickered in the distance.
@@ -190,13 +195,16 @@ model_inputs = tokenizer(
  ).to(model.device)
 
  outputs = model.generate(
-     **model_inputs,
-     max_new_tokens=64,
-     do_sample=True,
-     temperature=0.7,
-     top_p=0.95,
-     repetition_penalty=1.1
- )
+     **model_inputs,
+     max_new_tokens=50,
+     do_sample=False,  # greedy decoding; the sampling parameters below apply only when do_sample=True
+     temperature=0.3,
+     top_k=20,
+     top_p=0.95,
+     repetition_penalty=1.1,
+     pad_token_id=tokenizer.eos_token_id,
+     use_cache=True,  # Enable KV cache
+ )
 
  response = tokenizer.decode(
      outputs[0],