# Sky v2.0-6.5B — CREST Adaptive-Depth Language Model

Built by 0labs — Atharvsinh Jadav, Gujarat, India.

*One researcher. One GPU. Adaptive-depth LLMs.*
## What is This Model?
Sky v2.0-6.5B is a 6.47-billion-parameter language model built on CREST (Cognitively Recurrent Estimation of Step Termination), a proprietary adaptive-depth architecture developed by 0labs.
This is the K=2 variant of the full Sky v2.0-11B model. It retains the first two computational steps from the original four-step CREST architecture, preserving adaptive depth while being significantly lighter.
## How CREST Works
In a standard transformer, every token gets the exact same computation — the word "the" gets the same processing as a critical step in a mathematical proof. CREST changes this.
Each transformer layer's Feed-Forward Network (FFN) is replaced with K independent MLPs and a learned halting gate. The model decides, per-token, per-layer, how many computational steps to take:
- Easy token ("the", "is") → 1 step (fast, minimal computation)
- Hard token (algorithm logic) → 2 steps (deeper reasoning)
The halting gate is a learned sigmoid function that outputs a probability of stopping after each step. The final output is a weighted combination of all intermediate states.
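A minimal PyTorch sketch of this mechanism is shown below. CREST itself is proprietary and the real implementation ships with the model repository, so everything here (the class name `CRESTBlockSketch`, the residual update, the FFN width, and the exact gate parameterization) is an assumption meant only to make the description concrete.

```python
import torch
import torch.nn as nn

class CRESTBlockSketch(nn.Module):
    """Illustrative adaptive-depth block with K independent MLPs.

    NOT the 0labs implementation (CREST is proprietary); a minimal
    sketch of the mechanism described above, nothing more.
    """

    def __init__(self, hidden_dim: int, ffn_dim: int, k: int = 2):
        super().__init__()
        # K independent MLPs with completely separate weights.
        self.mlps = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim),
                nn.GELU(),
                nn.Linear(ffn_dim, hidden_dim),
            )
            for _ in range(k)
        )
        # One sigmoid halting gate per non-final step.
        self.halt_gates = nn.ModuleList(nn.Linear(hidden_dim, 1) for _ in range(k - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden), the output of the attention sublayer.
        states, weights = [], []
        remaining = torch.ones_like(x[..., :1])  # probability mass not yet halted
        state = x
        for step, mlp in enumerate(self.mlps):
            state = state + mlp(state)  # residual step update (an assumption)
            states.append(state)
            if step < len(self.mlps) - 1:
                halt = torch.sigmoid(self.halt_gates[step](state))  # P(stop after this step)
                weights.append(remaining * halt)
                remaining = remaining * (1 - halt)
            else:
                weights.append(remaining)  # final step absorbs the leftover mass
        # Convex combination of intermediate states; per-token weights sum to 1.
        return sum(w * s for w, s in zip(weights, states))

block = CRESTBlockSketch(hidden_dim=2560, ffn_dim=10240)  # hypothetical FFN width
out = block(torch.randn(1, 8, 2560))  # -> shape (1, 8, 2560)
```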
## Model Details
| Property | Value |
|---|---|
| Architecture | CREST (0labs proprietary) |
| Total Parameters | 6.47B |
| CREST Steps (K) | 2 |
| Hidden Dimension | 2,560 |
| Layers | 32 |
| Attention Heads | 32 |
| Context Window | 32,768 tokens |
| Precision | BFloat16 |
| Vocabulary Size | 248,320 |
| Base Model | Derived from Sky v2.0-11B (K=4) |
| License | Apache 2.0 |
## What's Included vs. the Full 11B
| Feature | Sky v2.0-11B | Sky v2.0-6.5B (this model) |
|---|---|---|
| CREST Step 1 (base knowledge) | ✅ | ✅ |
| CREST Step 2 (verification) | ✅ | ✅ |
| CREST Step 3 (deep reasoning) | ✅ | ❌ |
| CREST Step 4 (complex logic) | ✅ | ❌ |
| Halting gates | ✅ | ✅ |
| Adaptive depth | ✅ | ✅ |
| Parameters | 11.2B | 6.47B |
Steps 1 and 2 handle ~95% of all tokens in typical usage. This model retains the vast majority of the full model's capability at ~58% of the parameter count.
## Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0labs-in/Sky-v2.0-6.5B",
    trust_remote_code=True,  # Required — CREST is a custom architecture
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("0labs-in/Sky-v2.0-6.5B")

prompt = "Write a Python function to check if a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
> **Note:** `trust_remote_code=True` is required because CREST uses custom modeling code (`modeling_sky_crest.py`, `crest_block.py`, `configuration_sky_crest.py`) that is included in this repository.
## Hardware Requirements
| Setup | VRAM Required | Works? |
|---|---|---|
| NVIDIA RTX 4090 (24GB) | ~13GB BF16 | ✅ |
| NVIDIA RTX 3090 (24GB) | ~13GB BF16 | ✅ |
| NVIDIA A100 (40/80GB) | ~13GB BF16 | ✅ |
| AMD MI300X (192GB) | ~13GB BF16 | ✅ |
| Apple M1/M2/M3 (16GB+) | ~13GB | ✅ (with MPS) |
| Google Colab Free (T4 15GB) | ~13GB BF16 | ⚠️ Tight |
| Any GPU with `load_in_4bit=True` | ~4GB | ✅ |
For GPUs with less than 16GB VRAM, load in 4-bit:
```python
model = AutoModelForCausalLM.from_pretrained(
    "0labs-in/Sky-v2.0-6.5B",
    trust_remote_code=True,
    load_in_4bit=True,  # requires the bitsandbytes package
    device_map="auto",
)
```
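Recent transformers releases prefer an explicit quantization config over the bare `load_in_4bit` flag; if the flag raises a deprecation warning, the equivalent form (still requiring `bitsandbytes`) is:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "0labs-in/Sky-v2.0-6.5B",
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in BF16, matching the model's native precision
    ),
    device_map="auto",
)
```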
## Architecture: CREST in Detail
### The CREST Block (replacing the standard FFN)
```
Input from Attention
        ↓
   ┌──────────┐
   │   MLP₁   │  ← Step 1: Original pretrained knowledge
   └──────────┘
        ↓
   Halting Gate: "Is this token done?"
        │
        ├── If h₁ ≈ 1.0 → HALT (output = state₁)
        │
        ↓
   ┌──────────┐
   │   MLP₂   │  ← Step 2: Verification & deeper processing
   └──────────┘
        ↓
   Final output = p₁·state₁ + p₂·state₂
```
### Key Properties
- Independent parameters: MLP₁ and MLP₂ have completely separate weights, allowing specialization
- Learned halting: The sigmoid gate learns which tokens need more computation
- Residual gates: Scalar values controlling each step's contribution strength
- Convex combination: The per-token step weights always sum to 1.0 (worked through below)
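For K=2 this is easy to verify. With halting probability h₁ after step 1, a natural parameterization (an assumption, since the exact gating formula is proprietary) is p₁ = h₁ and p₂ = 1 − h₁, so p₁ + p₂ = 1 and the output p₁·state₁ + p₂·state₂ is always a convex combination of the two intermediate states. A token with h₁ ≈ 1.0 gets weight p₂ ≈ 0 on step 2, which is exactly the early-halt case in the diagram above.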
## Model Family
| Model | Parameters | CREST K | Best For |
|---|---|---|---|
| Sky v2.0-11B | 11.2B | 4 | Maximum capability |
| Sky v2.0-6.5B | 6.47B | 2 | Balanced performance |
| Sky v2.0-11B-INT4 | 11.2B | 4 | Full CREST, memory-efficient |
| Sky v2.0-Lite | 6.47B | 2 | Laptops & edge devices |
## Citation
```bibtex
@article{jadav2026crest,
  title={CREST: Cognitively Recurrent Estimation of Step Termination for Adaptive-Depth Language Modeling},
  author={Jadav, Atharvsinh},
  year={2026},
  url={https://huggingface.co/0labs-in/Sky-v2.0-11B}
}
```
## About 0labs
0labs is an independent AI research lab founded by Atharvsinh Jadav in Gujarat, India. We build adaptive-depth LLMs that think harder on hard problems — trained on a single GPU.
- 🌐 Website: 0labs.in
- 🤗 HuggingFace: huggingface.co/0labs-in