fraQtl/TinyLlama-1.1B-optimized
KV cache compression, inference optimization, model compression
fraQtl compresses the KV cache of transformers without breaking attention.
We don't compress blindly. Two principles guide fraQtl (see the sketch after this list):

- Compression preserves decisions, not just values.
- Compression can improve models.
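This card doesn't publish the algorithm, so the following is a minimal illustrative sketch, not fraQtl's implementation: it quantizes cached keys to k = 32 levels (the k = 32 from the results table below) and measures whether each query's strongest attention position survives compression. The quantizer, shapes, and k are all assumptions for illustration.

```python
import torch

def quantize(x: torch.Tensor, k: int = 32) -> torch.Tensor:
    # Uniform per-channel quantization to k levels -- a hypothetical stand-in
    # for whatever codebook fraQtl actually uses.
    lo = x.amin(dim=-2, keepdim=True)
    hi = x.amax(dim=-2, keepdim=True)
    scale = (hi - lo).clamp_min(1e-8) / (k - 1)
    return ((x - lo) / scale).round() * scale + lo

def decisions_preserved(q: torch.Tensor, keys: torch.Tensor, k: int = 32) -> float:
    # "Preserve decisions, not just values": check that each query's argmax
    # attention position is unchanged after the cache is compressed.
    before = (q @ keys.transpose(-1, -2)).argmax(dim=-1)
    after = (q @ quantize(keys, k).transpose(-1, -2)).argmax(dim=-1)
    return (before == after).float().mean().item()

q = torch.randn(8, 16, 64)      # (heads, queries, head_dim)
keys = torch.randn(8, 128, 64)  # (heads, cached positions, head_dim)
print(f"attention argmax preserved at k=32: {decisions_preserved(q, keys):.1%}")
```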
| Model | Params | Architecture | PPL Delta (k=32) | Compression ratio |
|---|---|---|---|---|
| Mistral 7B | 7B | GQA-8 | +0.007 | 5× |
| Llama 3.2 3B | 3B | GQA-3 | +0.011 | 5× |
| Llama-2 7B | 7B | MHA-32 | +0.007 | 5× |
| Qwen 2.5 3B | 3B | GQA-2 | +0.010 | 5× |
| Llama 3.1 8B | 8B | GQA-8 | +0.025 | 5× |
| Llama-2 13B | 13B | MHA-40 | +0.005 | 5× |
| Llama 3.1 70B | 70B | GQA-8 | +0.019 | 5× |
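One plausible reading of the PPL Delta column is compressed-model perplexity minus baseline perplexity on the same held-out text. A minimal sketch of that measurement follows; the model ids and `heldout.txt` dataset are placeholders, not necessarily what fraQtl used.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str) -> float:
    # Mean cross-entropy per token, exponentiated -- standard LM perplexity.
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

text = open("heldout.txt").read()                      # placeholder corpus
base = perplexity("meta-llama/Llama-2-7b-hf", text)    # baseline
comp = perplexity("fraQtl/MODEL", text)                # compressed checkpoint
print(f"PPL delta: {comp - base:+.3f}")
```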
How fraQtl differs from other compression approaches:
| Method | Behavior |
|---|---|
| Quantization | adds noise |
| Low-rank | removes information |
| fraQtl | preserves signal |
Load it like any Hugging Face causal LM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("fraQtl/TinyLlama-1.1B-optimized")
model = AutoModelForCausalLM.from_pretrained("fraQtl/TinyLlama-1.1B-optimized")
```
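A quick generation check, assuming the checkpoint behaves like a standard causal LM; the prompt and settings are illustrative only:

```python
from transformers import pipeline

# Illustrative smoke test, not a tuned configuration.
pipe = pipeline("text-generation", model="fraQtl/TinyLlama-1.1B-optimized")
print(pipe("KV cache compression works by", max_new_tokens=40)[0]["generated_text"])
```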
Compression that understands attention.