C10X committed · Commit 83aab6c · verified · 1 parent: 0e6b02d

Update README.md

Files changed (1): README.md (+60 −54)
README.md CHANGED
---
metrics:
  - name: arc:easy
    value: 27.36
---

# Qwen3 16M Model with Falcon-H1-0.5B-Instruct Tokenizer

## Model Description
This is a small Qwen3-architecture model combined with the Falcon-H1-0.5B-Instruct tokenizer (32K vocabulary).

- **Architecture**: Qwen3 (Grouped Query Attention, RMS Normalization, Q/K Normalization, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocab)
- **Parameters**: 11,014,272
- **Precision**: BF16
- **Format**: SafeTensors
- **Vocabulary Size**: 32768
- **Use Case**: Desktop applications, balanced performance
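
With 16 query heads but only 4 key/value heads, the grouped-query attention above shares each KV head across 4 query heads, shrinking the KV cache fourfold. A back-of-the-envelope sketch (assuming BF16, 2 bytes per value, and the full 8192-token context):

```python
# KV-cache size estimate from the configuration below (BF16 = 2 bytes/value).
num_layers = 8
num_kv_heads = 4      # GQA: 16 query heads share 4 key/value heads
head_dim = 128
max_seq_len = 8192
bytes_per_value = 2   # BF16

# Keys AND values are cached per layer, per KV head, per position (hence the 2 *).
kv_cache_bytes = 2 * num_layers * num_kv_heads * head_dim * max_seq_len * bytes_per_value
print(kv_cache_bytes // (1024 * 1024), "MiB per sequence at full context")  # 128 MiB
# Without GQA (16 KV heads), the same cache would be 4x larger: 512 MiB.
```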

## Configuration
- vocab_size: 32768
- hidden_size: 128
- num_attention_heads: 16
- num_key_value_heads: 4
- num_hidden_layers: 8
- intermediate_size: 512
- head_dim: 128
- max_position_embeddings: 8192
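
As a sanity check, this configuration reproduces the stated parameter count exactly under two assumptions: the input embedding is tied with the output head (no separate lm_head matrix), and Q/K normalization is a per-head RMSNorm of size head_dim:

```python
# Parameter count implied by the configuration above.
# Assumes tied embeddings and RMSNorm weights only (no biases), per Qwen3 convention.
vocab, hidden, heads, kv_heads = 32768, 128, 16, 4
layers, inter, head_dim = 8, 512, 128

embed = vocab * hidden                       # token embeddings (tied with lm_head)
attn = (hidden * heads * head_dim            # q_proj
        + 2 * hidden * kv_heads * head_dim   # k_proj, v_proj
        + heads * head_dim * hidden          # o_proj
        + 2 * head_dim)                      # q_norm, k_norm (per-head RMSNorm)
mlp = 3 * hidden * inter                     # gate_proj, up_proj, down_proj
norms = 2 * hidden                           # input + post-attention RMSNorm
total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
print(total)  # 11014272 — matches the count stated above
```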

## Special Tokens
- BOS: `<|begin_of_text|>` (id: 17)
- EOS: `<|end_of_text|>` (id: 11)
- PAD: `<|pad|>` (id: 0)

## Usage
```python
import torch
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/16m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/16m-falcon-tokenizer")

# Generate text
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Batch processing (start small)
texts = ["Hello", "How are you", "Good morning"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
```

## Important Notes
- Model uses the Qwen3 architecture with the Falcon tokenizer (32K vocabulary)
- All token IDs must be < 32768 to avoid CUDA indexing errors
- Start with small batch sizes (1-4) and increase gradually
- Use proper padding to prevent dimension mismatches
- Model is initialized with random weights and requires fine-tuning before use
- Compatible with Qwen3 APIs but uses the Falcon vocabulary
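
Because an out-of-range token ID only surfaces on the GPU as an opaque device-side assert, a cheap host-side check can catch bad IDs first. The helper below is a hypothetical sketch, not part of this repo:

```python
# Hypothetical guard: validate token IDs on the host before they reach the GPU,
# where an out-of-range ID would otherwise trigger an opaque CUDA assert.
VOCAB_SIZE = 32768

def check_token_ids(ids, vocab_size=VOCAB_SIZE):
    """Return the list of out-of-range IDs (an empty list means safe)."""
    return [t for t in ids if not (0 <= t < vocab_size)]

print(check_token_ids([17, 11, 0, 32767]))  # [] — all valid
print(check_token_ids([17, 32768, -1]))     # [32768, -1] — would crash on GPU
```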