Mistral 7B – fraQtl KV Cache Optimized

KV cache optimized with fraQtl: 3.5x less KV cache memory during inference.

Note: The model file size is the same as the original (~14GB). The optimization modifies V projection weights so that at inference time, the KV cache uses 3.5x less GPU memory. The savings happen at runtime, not at download.

| Metric | Value |
|---|---|
| Original | mistralai/Mistral-7B-v0.1 |
| File size | Same as original (~14GB) |
| KV cache memory | 3.5x less at runtime |
| PPL before | 10.4690 |
| PPL after | 10.6908 |
| PPL delta | +0.222 (weight-level) |
| Config | k=64, INT3 |
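As a rough illustration of what the runtime saving means in practice, here is a back-of-envelope sizing sketch using Mistral-7B-v0.1's published config (32 layers, 8 KV heads via grouped-query attention, head dim 128, 32k context). The assumption that the 3.5x factor applies to the combined K+V cache is ours, not stated in this card:

```python
# Back-of-envelope KV cache sizing for Mistral-7B-v0.1 (GQA: 8 KV heads).
n_layers, n_kv_heads, head_dim = 32, 8, 128
bytes_fp16 = 2

# Per token: one K and one V tensor per layer, each (n_kv_heads * head_dim).
bytes_per_token = n_layers * 2 * n_kv_heads * head_dim * bytes_fp16  # 131072 = 128 KiB

seq_len = 32768  # full context window
full_gib = bytes_per_token * seq_len / 2**30          # 4.0 GiB at fp16
compressed_gib = full_gib / 3.5                        # with the claimed 3.5x saving
print(f"{full_gib:.2f} GiB -> {compressed_gib:.2f} GiB")
```

At a full 32k context this works out to roughly 4 GiB of KV cache at fp16, or about 1.14 GiB with the claimed 3.5x reduction.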

How It Works

The model weights are rotated into an eigenbasis that separates important V-cache directions from noise. At inference, the KV cache concentrates information in fewer dimensions, using 3.5x less memory.
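The eigenbasis idea can be illustrated with a small NumPy sketch. This is not fraQtl's actual algorithm; the synthetic data and variable names are ours. It shows that when the vectors have low-rank structure, rotating into the covariance eigenbasis and keeping only the top k=64 of 128 coordinates loses very little:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 128, 64  # head dim and kept components (mirrors the k=64 config)

# Synthetic "value" vectors: rank-k signal plus small isotropic noise.
basis = rng.normal(size=(d, k))
V = rng.normal(size=(10000, k)) @ basis.T + 0.01 * rng.normal(size=(10000, d))

# Eigenbasis of the covariance, columns sorted by descending eigenvalue.
cov = V.T @ V / len(V)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
R = eigvecs[:, np.argsort(eigvals)[::-1]]

# Rotate, keep only the top-k coordinates, rotate back.
V_rot = V @ R
V_approx = V_rot[:, :k] @ R[:, :k].T

rel_err = np.linalg.norm(V - V_approx) / np.linalg.norm(V)
print(rel_err)  # small: most energy lives in the top-k directions
```

Storing only the k rotated coordinates (plus quantizing them, e.g. to INT3) is what shrinks the cache; the rotation itself is folded into the V projection weights ahead of time, which is why the checkpoint size is unchanged.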

Our runtime compression (the full product) achieves a +0.01 PPL delta on the same model. Contact us for integration.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("fraQtl/Mistral-7B-compressed")
tokenizer = AutoTokenizer.from_pretrained("fraQtl/Mistral-7B-compressed")
# KV cache uses 3.5x less memory during inference.

Generation Samples

Prompt: Explain how photosynthesis works in simple terms:

Output: Photosynthesis is the process by which plants use energy from sunlight to make their own food. Plants need carbon dioxide, water, and light to make their own food...

Prompt: The three most important breakthroughs in physics during the 20th century were

Output: The three most important breakthroughs in physics during the 20th century were the theory of relativity, quantum mechanics, and string theory...

Runtime Compression (the full product)

| Method | PPL delta | How |
|---|---|---|
| This download (weight-level) | +0.222 | Modified weights; download and use |
| Runtime cache compression | +0.01 | fraQtl applied during inference |

Runtime compression's PPL delta is over 20x smaller (+0.01 vs +0.222). Available for production deployment.


fraqtl.ai | contact@fraqtl.ai | Patent pending. Paper: arXiv:2604.11501
