C10X committed · Commit 83aab6c · verified · 1 parent: 0e6b02d

Update README.md

Files changed (1): README.md (+60 −54)
README.md CHANGED
---
metrics:
  - name: arc:easy
    value: 27.36
---

# Qwen3 16M Model with Falcon-H1-0.5B-Instruct Tokenizer

## Model Description
This is a small Qwen3-architecture model combined with the Falcon-H1-0.5B-Instruct tokenizer (32K vocabulary).

- **Architecture**: Qwen3 (Grouped Query Attention, RMS Normalization, Q/K Normalization, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocab)
- **Parameters**: 11,014,272
- **Precision**: BF16
- **Format**: SafeTensors
- **Vocabulary Size**: 32768
- **Use Case**: Desktop applications, balanced performance
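
With 16 query heads but only 4 key/value heads, the grouped-query attention above shares each KV head across 4 query heads, shrinking the KV cache fourfold. A back-of-the-envelope sketch (assuming BF16, 2 bytes per value, and the full 8192-token context):

```python
# KV-cache size estimate from the configuration below (BF16 = 2 bytes/value).
num_layers = 8
num_kv_heads = 4      # GQA: 16 query heads share 4 key/value heads
head_dim = 128
max_seq_len = 8192
bytes_per_value = 2   # BF16

# Keys AND values are cached per layer, per KV head, per position (hence the 2 *).
kv_cache_bytes = 2 * num_layers * num_kv_heads * head_dim * max_seq_len * bytes_per_value
print(kv_cache_bytes // (1024 * 1024), "MiB per sequence at full context")  # 128 MiB
# Without GQA (16 KV heads), the same cache would be 4x larger: 512 MiB.
```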

## Configuration
- vocab_size: 32768
- hidden_size: 128
- num_attention_heads: 16
- num_key_value_heads: 4
- num_hidden_layers: 8
- intermediate_size: 512
- head_dim: 128
- max_position_embeddings: 8192
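
As a sanity check, this configuration reproduces the stated parameter count exactly under two assumptions: the input embedding is tied with the output head (no separate lm_head matrix), and Q/K normalization is a per-head RMSNorm of size head_dim:

```python
# Parameter count implied by the configuration above.
# Assumes tied embeddings and RMSNorm weights only (no biases), per Qwen3 convention.
vocab, hidden, heads, kv_heads = 32768, 128, 16, 4
layers, inter, head_dim = 8, 512, 128

embed = vocab * hidden                       # token embeddings (tied with lm_head)
attn = (hidden * heads * head_dim            # q_proj
        + 2 * hidden * kv_heads * head_dim   # k_proj, v_proj
        + heads * head_dim * hidden          # o_proj
        + 2 * head_dim)                      # q_norm, k_norm (per-head RMSNorm)
mlp = 3 * hidden * inter                     # gate_proj, up_proj, down_proj
norms = 2 * hidden                           # input + post-attention RMSNorm
total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
print(total)  # 11014272 — matches the count stated above
```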

## Special Tokens
- BOS: `<|begin_of_text|>` (id: 17)
- EOS: `<|end_of_text|>` (id: 11)
- PAD: `<|pad|>` (id: 0)

## Usage
```python
import torch
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/16m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/16m-falcon-tokenizer")

# Generate text
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Batch processing (start small)
texts = ["Hello", "How are you", "Good morning"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
```

## Important Notes
- Model uses the Qwen3 architecture with the Falcon tokenizer (32K vocabulary)
- All token IDs must be < 32768 to avoid CUDA indexing errors
- Start with small batch sizes (1-4) and increase gradually
- Use proper padding to prevent dimension mismatches
- Model is initialized with random weights and requires fine-tuning before use
- Compatible with Qwen3 APIs but uses the Falcon vocabulary
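
Because an out-of-range token ID only surfaces on the GPU as an opaque device-side assert, a cheap host-side check can catch bad IDs first. The helper below is a hypothetical sketch, not part of this repo:

```python
# Hypothetical guard: validate token IDs on the host before they reach the GPU,
# where an out-of-range ID would otherwise trigger an opaque CUDA assert.
VOCAB_SIZE = 32768

def check_token_ids(ids, vocab_size=VOCAB_SIZE):
    """Return the list of out-of-range IDs (an empty list means safe)."""
    return [t for t in ids if not (0 <= t < vocab_size)]

print(check_token_ids([17, 11, 0, 32767]))  # [] — all valid
print(check_token_ids([17, 32768, -1]))     # [32768, -1] — would crash on GPU
```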