	---
	license: apache-2.0
	datasets:
	- HuggingFaceFW/fineweb-edu
	- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
	- tatsu-lab/alpaca
	- databricks/databricks-dolly-15k
	- TeichAI/Step-3.5-Flash-2600x
	- TeichAI/convo-v1
	language:
	- en
	tags:
	- small
	- glint
	- compactai
	---
	Note: This model requires CompactAI's custom Python script to run properly. To get it, open the Downloads section of the [CompactAI homepage](https://huggingface.co/spaces/CompactAI-O/Homepage) and scroll down.
	# Glint-1

	> ⚠️ IMPORTANT NOTICE
	> 1. Experimental: Glint-1 is a 1M-parameter research model designed for architectural experimentation.
	> 2. Performance characteristics: Despite its compact size, the model exhibits behavior comparable to that of ~2M-parameter models.
	> 3. Not production-ready: This release demonstrates functional capability, not optimal performance.

	## Overview

	Glint-1 is an ultra-compact language model developed by CompactAI following our rebrand. This 1M-parameter model demonstrates that efficient architectural design can yield behavioral characteristics typically associated with larger (~2M-parameter) models.

	This release includes both pretrained weights (base language modeling) and instruction-tuned weights (fine-tuned for conversational tasks).

	## Model Specifications

	| Specification | Value |
	| :--- | :--- |
	| Architecture | Transformer Decoder |
	| Parameters | ~1M |
	| Effective Behavior | ~2M-parameter equivalent |
	| Context Length | 2,048 tokens |
	| Vocabulary | Standard |
	| Normalization | RMSNorm |
	| Activation | SwiGLU |
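The table names RMSNorm and SwiGLU but not how they are combined. Below is a minimal pure-Python sketch of the standard formulations, assuming a pre-norm residual feed-forward sublayer as in other RMSNorm/SwiGLU decoders. It is not Glint-1's code: the helper names (`rms_norm`, `swiglu_ffn`, `ffn_sublayer`) and the toy dimensions are hypothetical.

```python
import math
import random


def rms_norm(x, weight, eps=1e-6):
    """Scale x by 1/RMS(x), then by a learned per-channel gain (no mean subtraction, no bias)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]


def matvec(w, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(z):
    """Numerically stable SiLU (Swish): z * sigmoid(z)."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down @ (SiLU(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def ffn_sublayer(x, norm_weight, w_gate, w_up, w_down):
    """Pre-norm residual sublayer (assumed placement): x + FFN(RMSNorm(x))."""
    y = swiglu_ffn(rms_norm(x, norm_weight), w_gate, w_up, w_down)
    return [a + b for a, b in zip(x, y)]


if __name__ == "__main__":
    random.seed(0)
    d_model, d_hidden = 8, 16  # toy sizes, not Glint-1's real configuration

    def rand_matrix(rows, cols):
        return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    x = [random.gauss(0.0, 1.0) for _ in range(d_model)]
    out = ffn_sublayer(
        x,
        [1.0] * d_model,
        rand_matrix(d_hidden, d_model),  # W_gate
        rand_matrix(d_hidden, d_model),  # W_up
        rand_matrix(d_model, d_hidden),  # W_down
    )
    print(len(out))  # 8
```

SwiGLU uses three weight matrices instead of the usual two, which is why decoders that adopt it often shrink the hidden width to keep the parameter count roughly fixed.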

	## Benchmarks

	Glint-1 has been evaluated on standard language modeling and reasoning benchmarks:

	### BLiMP Benchmark
	Grammaticality judgments on minimal pairs across 67 paradigms. Accuracy is the percentage of pairs for which the model assigns lower perplexity to the grammatical sentence than to its ungrammatical counterpart.

	![BLiMP Benchmark](benchmarks/benchmark_blimp.png)
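The BLiMP scoring rule can be sketched as follows. This is a hypothetical illustration, not CompactAI's evaluation harness: it assumes you already have per-token log-probabilities (natural log) for each sentence from the model, and the numbers in the example are made up.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities: exp(-mean log p)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def blimp_accuracy(pairs):
    """Percentage of minimal pairs in which the grammatical sentence has lower perplexity.

    `pairs` holds (grammatical_logprobs, ungrammatical_logprobs) tuples, each a list
    of per-token log-probabilities from the model under evaluation.
    """
    correct = sum(1 for good, bad in pairs if perplexity(good) < perplexity(bad))
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    # Made-up log-probabilities for two minimal pairs (illustrative only).
    pairs = [
        ([-1.0, -2.0], [-1.5, -2.5]),  # grammatical sentence wins
        ([-3.0, -1.0], [-1.0, -1.0]),  # ungrammatical sentence wins
    ]
    print(blimp_accuracy(pairs))  # 50.0
```

A common alternative compares total sentence log-probabilities instead of perplexities; the two rules agree whenever both sentences in a pair have the same number of tokens.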

	### ARC-Easy Benchmark
	Multiple-choice science QA (~2.4K questions). Answers are selected by perplexity: the model picks the answer choice to which it assigns the lowest perplexity.

	![ARC-Easy Benchmark](benchmarks/benchmark_arc_easy.png)
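A minimal sketch of perplexity-based answer selection, assuming each choice is scored by its per-token log-probabilities conditioned on the question text (the card does not describe the prompt format). The function names and example numbers are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities: exp(-mean log p)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_answer(choice_logprobs):
    """Return the index of the answer choice with the lowest perplexity.

    `choice_logprobs[i]` lists the log-probabilities of choice i's tokens,
    scored by the model given the question text (assumed prompt setup).
    """
    ppls = [perplexity(lp) for lp in choice_logprobs]
    return min(range(len(ppls)), key=lambda i: ppls[i])


def arc_accuracy(questions):
    """`questions` holds (choice_logprobs, gold_index) tuples; returns percent correct."""
    correct = sum(1 for choices, gold in questions if select_answer(choices) == gold)
    return 100.0 * correct / len(questions)


if __name__ == "__main__":
    # Choice 0 has the higher total log-prob (-1.0 vs -1.2),
    # but choice 1 has the lower per-token perplexity.
    print(select_answer([[-1.0], [-0.6, -0.6]]))  # 1
```

Normalizing by length, as perplexity does, keeps the scorer from systematically favoring shorter answers, which a raw total log-probability would do.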

	### WikiText-2 Benchmark
	Language-modeling perplexity on the WikiText-2 test split (Wikipedia articles). Lower is better.

	![WikiText-2 Benchmark](benchmarks/benchmark_wikitext2.png)
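Corpus-level perplexity is usually computed by summing token negative log-likelihoods over the whole test split before exponentiating. The sketch below assumes the split is tokenized and cut into non-overlapping windows no longer than the 2,048-token context; the card does not state its windowing, and a sliding window with overlap would give each scored token more context and typically a lower perplexity. The helper names are hypothetical.

```python
import math

CONTEXT_LENGTH = 2048  # Glint-1 context length, from the Model Specifications table


def chunk_tokens(token_ids, window=CONTEXT_LENGTH):
    """Split a token stream into consecutive, non-overlapping windows of at most `window` tokens."""
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]


def corpus_perplexity(chunk_nlls):
    """Token-weighted perplexity over the whole split.

    `chunk_nlls` holds one list of per-token negative log-likelihoods (natural log)
    per window. Summing over all tokens before exponentiating weights every token
    equally; averaging per-window perplexities would not.
    """
    total_nll = sum(sum(chunk) for chunk in chunk_nlls)
    total_tokens = sum(len(chunk) for chunk in chunk_nlls)
    return math.exp(total_nll / total_tokens)


if __name__ == "__main__":
    print([len(c) for c in chunk_tokens(list(range(5000)))])  # [2048, 2048, 904]
    print(corpus_perplexity([[1.0, 1.0], [1.0]]))  # 2.718281828459045
```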

	## Training Details

	| Parameter | Value |
	| :--- | :--- |
	| Batch Size | 48 |
	| Learning Rate | 8e-4 (pretrain), 2e-4 (SFT) |
	| Warmup | 300 steps |
	| Weight Decay | 0.02 |
	| Max Grad Norm | 1.0 |
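The learning-rate and clipping settings in the table can be wired together roughly as follows. Only the values come from the table; the decay after warmup (cosine to zero here), `total_steps`, and the helper names are assumptions, and the card does not name the optimizer that consumes the 0.02 weight decay. The clipping helper follows the same rule as PyTorch's `torch.nn.utils.clip_grad_norm_`, scaling by `max_norm / (norm + 1e-6)` capped at 1.

```python
import math

# Values taken from the Training Details table.
PEAK_LR = {"pretrain": 8e-4, "sft": 2e-4}
WARMUP_STEPS = 300
MAX_GRAD_NORM = 1.0


def learning_rate(step, total_steps, phase="pretrain", warmup=WARMUP_STEPS):
    """Linear warmup to the peak LR, then cosine decay to zero.

    Only the peak LRs and the warmup length come from the card; the cosine
    decay and `total_steps` are assumptions for illustration.
    """
    peak = PEAK_LR[phase]
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


def clip_grad_norm(grads, max_norm=MAX_GRAD_NORM):
    """Scale gradients so their global L2 norm is at most `max_norm`.

    `grads` is a list of per-parameter gradient vectors. Returns the clipped
    gradients and the global norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [[g * scale for g in vec] for vec in grads], total_norm


if __name__ == "__main__":
    for step in (0, 299, 300, 5000, 10000):
        print(step, learning_rate(step, total_steps=10000))
    clipped, norm = clip_grad_norm([[3.0], [4.0]])
    print(norm, clipped)  # norm is 5.0; clipped is roughly [[0.6], [0.8]]
```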

	## Limitations

	- Repetition: May exhibit repetitive generation patterns
	- Knowledge: Limited world knowledge due to parameter constraints
	- Reliability: Not suitable for production applications or critical tasks
	- Purpose: Intended for research, education, and architectural benchmarking

	## Usage

	This model is released for research purposes. It is functional, but users should not expect state-of-the-art performance. The model demonstrates that compact architectures can achieve reasonable behavioral characteristics, making it suitable for:

	- Architectural research
	- Edge deployment experiments
	- Educational purposes
	- Baseline comparisons

	---

	Generated by CompactAI for research purposes. Use responsibly.