🇮🇳 Sky v2.0 — 11B

Introducing CREST: A New Architecture for Adaptive-Depth Reasoning

Independently developed by a solo researcher from India


Atharvsinh Jadav · Founder, 0labs · Gujarat, India

🌐 Website · 🤗 Hugging Face

One person. One architecture. A new way for AI to think.


The Story

Sky v2.0 is not built by a billion-dollar lab. It is not backed by hundreds of researchers.

It is built by one person: Atharvsinh Jadav, a solo AI researcher from Gujarat, India, who asked a simple question:

"Why does every language model use the same amount of computation for every token, regardless of how hard the problem is?"

That question led to CREST — a fundamentally new neural network architecture that gives language models the ability to think deeper when problems are harder, and save resources when they are easy. Just like the human brain.

Sky v2.0 is the world's first language model built on this architecture. It has 11 billion parameters, scores 100% on a 25-problem code evaluation suite, and represents a step forward in how AI systems reason.

This is what one person from India can build. 🇮🇳


Key Features

  • Novel CREST Architecture — per-layer adaptive depth with 4 independently parameterized MLP steps and learned halting gates
  • 11B Parameters — expanded from a 4B base through CREST architectural augmentation
  • 100% Code Eval — perfect score across 25 verified coding benchmarks
  • 32K Context — supports 32,768 token context window
  • Apache 2.0 — free for commercial and research use

CREST Architecture

CREST: Cognitively Recurrent Estimation of Step Termination

A novel architecture that replaces the fixed-depth FFN in each transformer layer with an adaptive-depth recurrent block. The model learns to decide, for every token at every layer, how many computational steps are needed.


How It Works

A standard transformer spends the same amount of computation on every token:

Standard Transformer:  Token → Attention → MLP → Output    (fixed depth, always same cost)

CREST makes computation proportional to difficulty:

CREST — Easy token:    Token → Attention → [MLP₁ → HALT]               → Output   (1 step)
CREST — Hard token:    Token → Attention → [MLP₁ → MLP₂ → MLP₃ → MLP₄] → Output   (4 steps)
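
To put a number on the savings: the Benchmarks section below reports 1.0–2.0 average adaptive steps. Taking 1.5 as an illustrative midpoint, MLP compute per layer falls to 1.5/4 of the fixed four-step cost:

```python
# Back-of-envelope: relative MLP compute vs. always running all 4 steps.
max_steps = 4
avg_steps = 1.5  # illustrative midpoint of the reported 1.0-2.0 average

print(f"Relative MLP cost: {avg_steps / max_steps:.1%}")  # -> 37.5%
```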

Each CREST block contains:

| Component | Description |
|---|---|
| 4 independent MLPs | Separate parameters per step. Step 1 = pretrained weights; steps 2–4 = added depth. |
| Halting gate | Learned sigmoid h(t) = σ(W·x + b) that decides stop or continue. |
| Residual gates | Learned scalars controlling the contribution of each step. |
| Ponder cost | Regularization term L = λ·Σ N(t) penalizing unnecessary computation. |
| Weighted output | Output = Σ p(t)·x(t), probability-weighted across active steps. |
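
The pieces in the table compose into one forward pass. Below is a minimal, illustrative PyTorch sketch of a CREST-style block; the class name, gate placement, and MLP shapes are assumptions made for illustration, not the released modeling code.

```python
import torch
import torch.nn as nn


class CRESTBlock(nn.Module):
    """Illustrative adaptive-depth block: up to `max_steps` independently
    parameterized MLPs, a learned halting gate, and a ponder-cost signal."""

    def __init__(self, hidden_size: int, max_steps: int = 4, ffn_mult: int = 4):
        super().__init__()
        self.max_steps = max_steps
        # Independent MLP per step; step 1 would hold the pretrained FFN weights.
        self.mlps = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_mult * hidden_size),
                nn.SiLU(),
                nn.Linear(ffn_mult * hidden_size, hidden_size),
            )
            for _ in range(max_steps)
        ])
        self.halt = nn.Linear(hidden_size, 1)                  # h(t) = sigmoid(W·x + b)
        self.res_gates = nn.Parameter(torch.ones(max_steps))   # learned residual scalars

    def forward(self, x: torch.Tensor):
        """x: (batch, seq, hidden) -> (output, expected_steps)."""
        p_running = torch.ones(x.shape[:-1], device=x.device)  # P(not yet halted)
        output = torch.zeros_like(x)
        expected_steps = torch.zeros_like(p_running)           # N(t) for the ponder cost
        state = x
        for step, mlp in enumerate(self.mlps):
            state = state + self.res_gates[step] * mlp(state)
            if step == self.max_steps - 1:
                p_halt = torch.ones_like(p_running)            # forced halt on the last step
            else:
                p_halt = torch.sigmoid(self.halt(state)).squeeze(-1)
            p_here = p_running * p_halt                        # P(halting exactly at this step)
            output = output + p_here.unsqueeze(-1) * state     # Output = Σ p(t)·x(t)
            expected_steps = expected_steps + p_running        # E[N] = Σ_s P(reach step s)
            p_running = p_running * (1.0 - p_halt)
        return output, expected_steps
```

During training, the ponder cost would be folded into the objective as something like `loss = task_loss + lam * expected_steps.mean()`, matching L = λ·Σ N(t) up to the choice of reduction.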

Why CREST is Different

| Architecture | Year | Developer | Approach | CREST difference |
|---|---|---|---|---|
| Standard Transformer | 2017 | Vaswani et al. | Fixed depth | CREST adapts depth per token |
| Universal Transformer | 2018 | Google | Shared weights across steps | CREST has independent parameters per step |
| PonderNet | 2021 | DeepMind | Whole-model halting | CREST halts per layer (more granular) |
| Chain of Thought | 2022 | Prompting technique | Text-level | CREST is architectural, not prompting |
| Mixture of Experts | 2022 | Google / Mistral | Width routing | CREST adds depth, not width |

CREST is the first architecture to combine per-layer halting, independent parameters per step, and retrofit compatibility with pretrained models: step 1 of each block reuses the original pretrained FFN weights, as sketched below.
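
As a sketch of what that retrofit could look like, assuming the hypothetical `CRESTBlock` above and a transformer layer whose FFN lives at `layer.mlp` (both names are illustrative, not the released code):

```python
import copy

def retrofit_layer(layer, crest_block):
    """Copy a pretrained FFN into CREST step 1 (illustrative only).

    Assumes `layer.mlp` is shape-compatible with `crest_block.mlps[0]`;
    steps 2-4 keep their fresh initialization and are trained afterwards.
    """
    crest_block.mlps[0] = copy.deepcopy(layer.mlp)
    layer.mlp = crest_block  # the CREST block now replaces the fixed-depth FFN
    return layer
```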

Model Overview

| Property | Value |
|---|---|
| Name | Sky v2.0 |
| Total parameters | 11.00B |
| Architecture | CREST (Adaptive-Depth Transformer) |
| CREST steps | 4 per layer (adaptive) |
| Layers | 32 |
| Hidden size | 2,560 |
| Attention heads | 32 |
| Context length | 32,768 tokens |
| Precision | BFloat16 |
| Training hardware | AMD Instinct MI300X (205 GB) |
| License | Apache 2.0 |

Benchmarks

| Benchmark | Score |
|---|---|
| Code Eval (25 verified problems) | 100% (25/25) |
| Identity Consistency | 100% |
| Mathematical Reasoning | Correct |
| CREST Adaptive Steps | 1.0–2.0 average |

Full Code Eval Results

All problems were executed with test cases — not just syntax checked.

| # | Problem | Status | # | Problem | Status |
|---|---|---|---|---|---|
| 1 | Factorial | ✅ | 14 | Char Frequency | ✅ |
| 2 | Fibonacci | ✅ | 15 | Matrix Transpose | ✅ |
| 3 | Palindrome | ✅ | 16 | Power Function | ✅ |
| 4 | String Reverse | ✅ | 17 | List Intersection | ✅ |
| 5 | Max Element | ✅ | 18 | Rotate List | ✅ |
| 6 | Count Vowels | ✅ | 19 | Sum Digits | ✅ |
| 7 | Prime Check | ✅ | 20 | Longest Word | ✅ |
| 8 | Flatten List | ✅ | 21 | Caesar Cipher | ✅ |
| 9 | Remove Duplicates | ✅ | 22 | Two Sum | ✅ |
| 10 | GCD | ✅ | 23 | Valid Parentheses | ✅ |
| 11 | Binary Search | ✅ | 24 | Title Case | ✅ |
| 12 | Merge Sorted | ✅ | 25 | Chunk List | ✅ |
| 13 | Longest Substring | ✅ | | **Total** | **25/25** |
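
Execution-based checking means each generated solution is run against assertions rather than merely parsed. The exact problems and test cases used here are not published, so the harness and asserts below are an illustrative sketch:

```python
def check_solution(code: str, tests: list[str]) -> bool:
    """Exec the model's generated code, then run each assert-style test."""
    namespace: dict = {}
    try:
        exec(code, namespace)      # define the generated function(s)
        for test in tests:
            exec(test, namespace)  # each test raises AssertionError on failure
        return True
    except Exception:
        return False

# Illustrative check for problem 1 (Factorial):
print(check_solution(
    "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)",
    ["assert factorial(5) == 120", "assert factorial(0) == 1"],
))  # True
```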

Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "0labs-in/Sky-v2.0-11B"

# trust_remote_code is needed to load the custom CREST modeling code.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a function to find the longest common subsequence."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
```

Hardware Requirements

| Precision | VRAM required |
|---|---|
| BFloat16 (recommended) | ~24 GB |
| 8-bit quantized | ~12 GB |
| 4-bit quantized | ~7 GB |
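
For the quantized footprints above, the model can be loaded through the bitsandbytes integration in `transformers`. A sketch for the 4-bit case (requires the `bitsandbytes` package; actual memory use also depends on context length and batch size):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "0labs-in/Sky-v2.0-11B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```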

The Sky Model Family

| Model | Parameters | Architecture | Status |
|---|---|---|---|
| Sky v1.0 | 14B | Standard Transformer | Released |
| Sky v1.3-CSD | 14B | Cognitive Scaffolding Decay | Released |
| Sky v2.0 | 11B | CREST (Adaptive Depth) | Latest |

About 0labs

0labs is an independent AI research lab from Gujarat, India, founded by Atharvsinh Jadav. We focus on building AI systems that are efficient, transparent, and capable.

Research

| Paper | Area |
|---|---|
| CREST | Adaptive computational depth for transformers |
| CSD | Identity persistence without system prompts |
| COBRA | Boundary-aware reasoning architecture |

Citation

```bibtex
@misc{jadav2026crest,
  title     = {CREST: Cognitively Recurrent Estimation of Step Termination
               for Adaptive-Depth Language Models},
  author    = {Jadav, Atharvsinh},
  year      = {2026},
  publisher = {0labs},
  url       = {https://huggingface.co/0labs-in/Sky-v2.0-11B}
}
```

Made in India 🇮🇳 · Apache 2.0 · Free for everyone

Built with ❤️ by Atharvsinh Jadav · 0labs · Gujarat, India


Attribution

Base weights: Qwen3.5-4B (Apache 2.0). Architecture and training by 0labs.
