CompactAI committed on
Commit 9280349 · verified · 1 Parent(s): 139f5e2

Create README.md

Files changed (1): README.md added (+90 −0)
---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- glint
- compactai
---
Note: You must use the custom Python script to run this model properly. You can download it from [here](https://huggingface.co/spaces/CompactAI-O/Homepage) by opening the downloads option and scrolling down.
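The Space that hosts the script can also be fetched programmatically with `huggingface_hub`. A minimal sketch, assuming only that the runner script ships among the Space's files (the exact filename is not specified in this card):

```python
# Minimal sketch: fetch the files of the CompactAI-O/Homepage Space,
# which hosts the custom runner script. Requires `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

# Downloads the Space's files to the local cache and returns the path.
# The runner script's filename is not specified here, so inspect the
# downloaded directory to locate it.
local_dir = snapshot_download(repo_id="CompactAI-O/Homepage", repo_type="space")
print(f"Space files downloaded to: {local_dir}")
```
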
# Glint-1

> **⚠️ IMPORTANT NOTICE**
> 1. **This model is experimental.** Glint-1 is a 1M parameter research model designed for architectural experimentation.
> 2. **Performance characteristics:** The model exhibits behavioral patterns comparable to ~2M parameter models despite its compact size.
> 3. **Not production-ready:** This release demonstrates functional capability, not optimal performance.

## Overview

Glint-1 is an ultra-compact language model developed by CompactAI following our rebrand initiative. This 1M parameter model demonstrates that efficient architectural design can yield behavioral characteristics typically associated with larger models (~2M parameters).

This release includes both **Pretrained Weights** (base language modeling) and **Instruction-Tuned Weights** (fine-tuned for conversational tasks).

## Model Specifications

| Parameter | Value |
| :--- | :--- |
| **Architecture** | Transformer Decoder |
| **Parameters** | ~1M |
| **Effective Behavior** | ~2M parameter equivalent |
| **Context Length** | 2,048 tokens |
| **Vocabulary** | Standard |
| **Normalization** | RMSNorm |
| **Activation** | SwiGLU |

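This card does not publish the implementation, but the table pins down the two non-standard components. A minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block as they are conventionally defined (all dimensions are illustrative, not Glint-1's actual ones):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales by 1/RMS(x), no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: (SiLU(x W_gate) * x W_up) W_down."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```
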
## Benchmarks

Glint-1 has been evaluated on standard language modeling and reasoning benchmarks:

### BLiMP Benchmark
Grammaticality minimal pairs across 67 paradigms. Accuracy is the percentage of pairs in which the grammatical sentence receives lower perplexity than the ungrammatical one.

![BLiMP Benchmark](benchmarks/benchmark_blimp.png)

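The scoring rule can be sketched directly: compute per-token perplexity for both sentences of a minimal pair and count the pair correct when the grammatical one scores lower. A hedged sketch assuming a standard causal LM with a Hugging Face-style interface (the actual evaluation harness is not published here):

```python
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text: str) -> float:
    """Per-token perplexity of `text` under a causal LM: exp(mean NLL)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # Passing labels=ids makes the model return mean cross-entropy over tokens.
    loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def blimp_pair_correct(model, tokenizer, good: str, bad: str) -> bool:
    """A pair counts as correct when the grammatical sentence is less surprising."""
    return perplexity(model, tokenizer, good) < perplexity(model, tokenizer, bad)
```
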
### ARC-Easy Benchmark
Multiple-choice science QA (~2.4K questions) using perplexity-based answer selection.

![ARC-Easy Benchmark](benchmarks/benchmark_arc_easy.png)

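Answer selection follows the same machinery: score the question concatenated with each candidate answer and pick the lowest-perplexity sequence. A sketch reusing the `perplexity` helper from the BLiMP example above (the prompt format is an assumption; the card does not specify it):

```python
def arc_select(model, tokenizer, question: str, choices: list[str]) -> int:
    """Return the index of the answer whose full sequence scores lowest perplexity."""
    scores = [perplexity(model, tokenizer, f"{question} {c}") for c in choices]
    return min(range(len(scores)), key=scores.__getitem__)
```
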
### WikiText-2 Benchmark
Language modeling perplexity on the WikiText-2 test split (Wikipedia text). Lower is better.

![WikiText-2 Benchmark](benchmarks/benchmark_wikitext2.png)

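Corpus-level perplexity is the exponential of the average per-token negative log-likelihood over the test split. A sketch of the aggregation step, with illustrative numbers rather than Glint-1's actual results:

```python
import math

def corpus_perplexity(total_nll: float, total_tokens: int) -> float:
    """Perplexity over a corpus: exp of the mean per-token negative log-likelihood."""
    return math.exp(total_nll / total_tokens)

# Illustrative only: 1.2M tokens with a summed NLL of 6.0M nats -> exp(5.0) ~ 148.4
print(corpus_perplexity(6_000_000, 1_200_000))
```
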
## Training Details

| Parameter | Value |
| :--- | :--- |
| **Batch Size** | 48 |
| **Learning Rate** | 8e-4 (pretrain), 2e-4 (SFT) |
| **Warmup** | 300 steps |
| **Weight Decay** | 0.02 |
| **Max Grad Norm** | 1.0 |

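These values map onto a conventional PyTorch training setup. A minimal sketch wiring the listed hyperparameters into AdamW with linear warmup and gradient clipping (the optimizer choice and the post-warmup schedule are assumptions; the card only lists the values above):

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def make_optimizer(model, lr: float = 8e-4, warmup_steps: int = 300):
    """Values from the table: 8e-4 pretrain LR (2e-4 for SFT), 300 warmup
    steps, 0.02 weight decay. AdamW and the schedule shape are assumed."""
    opt = AdamW(model.parameters(), lr=lr, weight_decay=0.02)
    # Linear warmup to the peak LR, then hold (post-warmup decay is unstated).
    sched = LambdaLR(opt, lambda step: min(1.0, (step + 1) / warmup_steps))
    return opt, sched

def train_step(model, batch, opt, sched):
    """One step with the table's max grad norm of 1.0 (HF-style LM interface assumed)."""
    loss = model(batch["input_ids"], labels=batch["input_ids"]).loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
    sched.step()
    opt.zero_grad()
    return loss.item()
```
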
## Limitations

- **Repetition:** May exhibit repetitive generation patterns
- **Knowledge:** Limited world knowledge due to parameter constraints
- **Reliability:** Not suitable for production applications or critical tasks
- **Purpose:** Intended for research, educational purposes, and architectural benchmarking

## Usage

This model is released for research purposes. While functional, users should not expect state-of-the-art performance. The model demonstrates that compact architectures can achieve reasonable behavioral characteristics, making it suitable for:

- Architectural research
- Edge deployment experiments
- Educational purposes
- Baseline comparisons

---

*Generated by CompactAI for research purposes. Use responsibly.*