Update README.md
---
metrics:
- name: arc:easy
  value: 27.36
---

# Qwen3 16M Model with Falcon-H1-0.5B-Instruct Tokenizer

## Model Description

This is a 16M-parameter Qwen3 model architecture combined with the Falcon-H1-0.5B-Instruct tokenizer (32K vocabulary).

- **Architecture**: Qwen3 (Grouped Query Attention, RMS Normalization, Q/K Normalization, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocab)
- **Parameters**: 11,014,272
- **Precision**: BF16
- **Format**: SafeTensors
- **Vocabulary Size**: 32768
- **Use Case**: Desktop applications, balanced performance
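The parameter count above can be reproduced from the configuration listed below. This is a back-of-the-envelope sketch, assuming tied input/output embeddings and per-layer RMSNorm weights (the card does not state these details, but with them the arithmetic comes out exactly):

```python
# Rough parameter count (assumes tied embeddings -- an inference, not stated in the card).
vocab, hidden, n_heads, n_kv, n_layers, inter, head_dim = 32768, 128, 16, 4, 8, 512, 128

embed = vocab * hidden                               # token embeddings (lm_head tied)
attn = hidden * n_heads * head_dim * 2               # q_proj + o_proj
attn += hidden * n_kv * head_dim * 2                 # k_proj + v_proj (GQA: 4 KV heads)
norms = 2 * hidden + 2 * head_dim                    # input/post-attn norms + Q/K norms
mlp = 3 * hidden * inter                             # gate, up, down projections
total = embed + n_layers * (attn + norms + mlp) + hidden  # + final RMSNorm
print(f"{total:,}")  # 11,014,272
```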

## Configuration

- vocab_size: 32768
- hidden_size: 128
- num_attention_heads: 16
- num_key_value_heads: 4
- num_hidden_layers: 8
- intermediate_size: 512
- head_dim: 128
- max_position_embeddings: 8192
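If you need to rebuild this configuration programmatically, a sketch using `transformers.Qwen3Config` (available in recent transformers releases); `tie_word_embeddings=True` is an assumption inferred from the parameter count, not stated in the card:

```python
from transformers import Qwen3Config

# Field names follow the standard Qwen3 config; tie_word_embeddings is assumed.
config = Qwen3Config(
    vocab_size=32768,
    hidden_size=128,
    num_attention_heads=16,
    num_key_value_heads=4,
    num_hidden_layers=8,
    intermediate_size=512,
    head_dim=128,
    max_position_embeddings=8192,
    tie_word_embeddings=True,
)
```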

## Special Tokens

- BOS: `<|begin_of_text|>` (id: 17)
- EOS: `<|end_of_text|>` (id: 11)
- PAD: `<|pad|>` (id: 0)
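To illustrate the PAD token's role: a minimal, framework-free sketch of right-padding a batch with id 0 and building the matching attention mask (the example IDs are made up; in practice `tokenizer(..., padding=True)` does this for you):

```python
PAD_ID = 0  # <|pad|>

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad variable-length ID sequences and build an attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask

ids, mask = pad_batch([[17, 205, 301], [17, 99]])  # 17 = BOS; other IDs are arbitrary
print(ids)   # [[17, 205, 301], [17, 99, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```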

## Usage

```python
import torch
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/16m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/16m-falcon-tokenizer")

# Generate text
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Batch processing (start small)
texts = ["Hello", "How are you", "Good morning"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
```

## Important Notes

- Model uses the Qwen3 architecture with the Falcon tokenizer (32K vocabulary)
- All token IDs must be < 32768 to avoid CUDA indexing errors
- Start with small batch sizes (1-4) and increase gradually
- Use proper padding to prevent dimension mismatches
- Model is initialized with random weights and requires fine-tuning before use
- Compatible with Qwen3 APIs but uses the Falcon vocabulary
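The vocabulary-bound note above can be enforced with a cheap guard before any forward pass. A sketch in plain Python (with tensors, `input_ids.max().item() < 32768` serves the same purpose):

```python
VOCAB_SIZE = 32768

def validate_token_ids(batch, vocab_size=VOCAB_SIZE):
    """Fail fast instead of hitting an opaque CUDA indexing error in the embedding."""
    bad = [tok for seq in batch for tok in seq if not 0 <= tok < vocab_size]
    if bad:
        raise ValueError(f"{len(bad)} token id(s) outside [0, {vocab_size}): {bad[:5]}")

validate_token_ids([[17, 11, 0], [123, 32767]])  # in range, passes silently
```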