CompactAI-O
/

Glint-1

+---
+license: apache-2.0
+datasets:
+- HuggingFaceFW/fineweb-edu
+- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
+- tatsu-lab/alpaca
+- databricks/databricks-dolly-15k
+- TeichAI/Step-3.5-Flash-2600x
+- TeichAI/convo-v1
+language:
+- en
+tags:
+- small
+- glint
+- compactai
+---
+Note: You must use the custom python script to run this model properly, you can download it from [here](https://huggingface.co/spaces/CompactAI-O/Homepage) by going into the downloads option and scrolling down.
+# Glint-1
+> **⚠️ IMPORTANT NOTICE**
+> 1. **This model is experimental.** Glint-1 is a 1M parameter research model designed for architectural experimentation.
+> 2. **Performance characteristics:** The model exhibits behavioral patterns comparable to ~2M parameter models despite its compact size.
+> 3. **Not production-ready:** This release demonstrates functional capability, not optimal performance.
+## Overview
+Glint-1 is an ultra-compact language model developed by CompactAI following our rebrand initiative. This 1M parameter model demonstrates that efficient architectural design can yield behavioral characteristics typically associated with larger models (~2M parameters).
+This release includes both **Pretrained Weights** (base language modeling) and **Instruction-Tuned Weights** (fine-tuned for conversational tasks).
+## Model Specifications
+| Parameter | Value |
+| :--- | :--- |
+| **Architecture** | Transformer Decoder |
+| **Parameters** | ~1M |
+| **Effective Behavior** | ~2M parameter equivalent |
+| **Context Length** | 2,048 tokens |
+| **Vocabulary** | Standard |
+| **Normalization** | RMSNorm |
+| **Activation** | SwiGLU |
+## Benchmarks
+Glint-1 has been evaluated on standard language modeling and reasoning benchmarks:
+### BLiMP Benchmark
+Grammaticality minimal pairs across 67 paradigms. Accuracy measured as % grammatical < ungrammatical perplexity.
+![BLiMP Benchmark](benchmarks/benchmark_blimp.png)
+### ARC-Easy Benchmark
+Multiple-choice science QA (~2.4K questions) using perplexity-based answer selection.
+![ARC-Easy Benchmark](benchmarks/benchmark_arc_easy.png)
+### WikiText-2 Benchmark
+Language modeling perplexity on Wikipedia test split. Lower is better.
+![WikiText-2 Benchmark](benchmarks/benchmark_wikitext2.png)
+## Training Details
+| Parameter | Value |
+| :--- | :--- |
+| **Batch Size** | 48 |
+| **Learning Rate** | 8e-4 (pretrain), 2e-4 (SFT) |
+| **Warmup** | 300 steps |
+| **Weight Decay** | 0.02 |
+| **Max Grad Norm** | 1.0 |
+## Limitations
+- **Repetition:** May exhibit repetitive generation patterns
+- **Knowledge:** Limited world knowledge due to parameter constraints
+- **Reliability:** Not suitable for production applications or critical tasks
+- **Purpose:** Intended for research, educational purposes, and architectural benchmarking
+## Usage
+This model is released for research purposes. While functional, users should not expect state-of-the-art performance. The model demonstrates that compact architectures can achieve reasonable behavioral characteristics, making it suitable for:
+- Architectural research
+- Edge deployment experiments
+- Educational purposes
+- Baseline comparisons
+---
+*Generated by CompactAI for research purposes. Use responsibly.*