---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- glint
- compactai
---

Note: This model requires a custom Python script to run properly. You can download it from [here](https://huggingface.co/spaces/CompactAI-O/Homepage) by opening the downloads option and scrolling down.

# Glint-1

> **⚠️ IMPORTANT NOTICE**
> 1. **This model is experimental.** Glint-1 is a 1M-parameter research model designed for architectural experimentation.
> 2. **Performance characteristics:** The model exhibits behavioral patterns comparable to ~2M-parameter models despite its compact size.
> 3. **Not production-ready:** This release demonstrates functional capability, not optimal performance.

## Overview

Glint-1 is an ultra-compact language model developed by CompactAI following our rebrand initiative. This 1M-parameter model demonstrates that efficient architectural design can yield behavioral characteristics typically associated with larger models (~2M parameters).

This release includes both **Pretrained Weights** (base language modeling) and **Instruction-Tuned Weights** (fine-tuned for conversational tasks).

## Model Specifications

| Parameter | Value |
| :--- | :--- |
| **Architecture** | Transformer Decoder |
| **Parameters** | ~1M |
| **Effective Behavior** | ~2M parameter equivalent |
| **Context Length** | 2,048 tokens |
| **Vocabulary** | Standard |
| **Normalization** | RMSNorm |
| **Activation** | SwiGLU |

## Benchmarks

Glint-1 has been evaluated on standard language modeling and reasoning benchmarks.

### BLiMP Benchmark

Grammaticality minimal pairs across 67 paradigms. Accuracy is the percentage of pairs in which the grammatical sentence receives lower perplexity than its ungrammatical counterpart.
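The perplexity comparison above can be sketched as follows. This is only an illustrative stand-in: `perplexity` and `blimp_accuracy` are hypothetical helpers, and the per-token log-probabilities are made-up placeholders, not Glint-1 outputs.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log)."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

def blimp_accuracy(pairs):
    """Fraction of minimal pairs where the grammatical sentence
    receives lower perplexity than its ungrammatical counterpart."""
    correct = sum(
        1 for good, bad in pairs
        if perplexity(good) < perplexity(bad)
    )
    return correct / len(pairs)

# Hypothetical per-token log-probs for two minimal pairs
pairs = [
    ([-1.0, -1.2, -0.8], [-2.1, -2.5, -1.9]),  # grammatical scored better
    ([-1.5, -1.1], [-1.4, -1.0]),              # grammatical scored worse
]
print(blimp_accuracy(pairs))  # 0.5
```

Because the comparison is pairwise, no absolute perplexity threshold is needed; only the relative ordering within each minimal pair matters.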
![BLiMP Benchmark](benchmarks/benchmark_blimp.png)

### ARC-Easy Benchmark

Multiple-choice science QA (~2.4K questions) using perplexity-based answer selection.

![ARC-Easy Benchmark](benchmarks/benchmark_arc_easy.png)

### WikiText-2 Benchmark

Language modeling perplexity on the Wikipedia test split. Lower is better.

![WikiText-2 Benchmark](benchmarks/benchmark_wikitext2.png)

## Training Details

| Parameter | Value |
| :--- | :--- |
| **Batch Size** | 48 |
| **Learning Rate** | 8e-4 (pretrain), 2e-4 (SFT) |
| **Warmup** | 300 steps |
| **Weight Decay** | 0.02 |
| **Max Grad Norm** | 1.0 |

## Limitations

- **Repetition:** May exhibit repetitive generation patterns
- **Knowledge:** Limited world knowledge due to parameter constraints
- **Reliability:** Not suitable for production applications or critical tasks
- **Purpose:** Intended for research, educational purposes, and architectural benchmarking

## Usage

This model is released for research purposes. While functional, users should not expect state-of-the-art performance. The model demonstrates that compact architectures can achieve reasonable behavioral characteristics, making it suitable for:

- Architectural research
- Edge deployment experiments
- Educational purposes
- Baseline comparisons

---

*Generated by CompactAI for research purposes. Use responsibly.*
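As a supplement, the perplexity-based answer selection described for the ARC-Easy evaluation can be sketched like this. All names and numbers here are hypothetical illustrations, not part of the actual evaluation harness.

```python
import math

def sequence_perplexity(token_logprobs):
    """Length-normalized perplexity from per-token log-probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def select_answer(choice_logprobs):
    """Pick the candidate answer whose continuation the model finds
    least surprising, i.e. the one with the lowest perplexity."""
    ppls = [sequence_perplexity(lp) for lp in choice_logprobs]
    return min(range(len(ppls)), key=ppls.__getitem__)

# Hypothetical per-token log-probs for four candidate answers
choices = [
    [-2.0, -1.8],        # A
    [-0.9, -1.1, -1.0],  # B (lowest perplexity)
    [-1.6, -1.7],        # C
    [-2.2, -2.4, -2.0],  # D
]
print("ABCD"[select_answer(choices)])  # B
```

Length normalization matters here: without it, shorter answers would be systematically favored because their total negative log-likelihood is smaller.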