Atharvsinh committed on
Commit 8e13b7a · verified · 1 Parent(s): 46cc6c9

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +204 -29
README.md CHANGED
@@ -1,57 +1,232 @@
  ---
  license: apache-2.0
- language: en
  tags:
  - 0labs
  - sky
  - crest
  - adaptive-depth
  ---

- # Sky v2.0 - 6.5B (CREST K=2)

- **Built by 0labs - Atharvsinh Jadav, Gujarat, India**

- ## Architecture: CREST (Cognitively Recurrent Estimation of Step Termination)

- This model uses the **CREST adaptive-depth architecture** - our proprietary
- architecture where each FFN layer has 2 independent MLPs and a learned halting
- gate. The model decides per-token, per-layer how many computational steps to take.

- - Easy tokens ("the", "is") → 1 step (fast)
- - Hard tokens (algorithm logic, math) → 2 steps (deeper thinking)

- ## Specs

  | Property | Value |
  |---|---|
- | Architecture | CREST (0labs proprietary) |
- | Parameters | 6.5B |
- | CREST Steps (K) | 2 |
- | Base Model Size | 4B |
- | Hidden Dimension | 2,560 |
- | Layers | 32 |
- | Attention Heads | 32 |
- | Context Window | 32,768 tokens |
- | Quantization | INT4 |
-
- ## Usage

  ```python
  import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer

  model = AutoModelForCausalLM.from_pretrained(
-     "0labs-in/Sky-v2.0",
-     trust_remote_code=True,  # Required for CREST architecture
      torch_dtype=torch.bfloat16,
      device_map="auto",
  )
- tokenizer = AutoTokenizer.from_pretrained("0labs-in/Sky-v2.0")
  ```

- ## 0labs

- Independent AI research lab. One researcher. One GPU. Adaptive-depth LLMs.
- - Website: https://0labs.in
- - HuggingFace: https://huggingface.co/0labs-in
 
  ---
  license: apache-2.0
+ language:
+ - en
  tags:
  - 0labs
  - sky
  - crest
  - adaptive-depth
+ - llm
+ - lightweight
+ - edge-ai
+ library_name: transformers
+ pipeline_tag: text-generation
+ base_model: 0labs-in/Sky-v2.0-11B
  ---

+ # Sky v2.0-Lite: Lightweight CREST for Everyone

+ <p align="center">
+ <strong>Built by <a href="https://0labs.in">0labs</a> - Atharvsinh Jadav, Gujarat, India</strong><br>
+ <em>Adaptive-depth AI sized for consumer laptops.</em>
+ </p>

+ ---
+
+ ## What is This Model?
+
+ Sky v2.0-Lite is the **smallest and fastest** member of the Sky v2.0 family. It combines **K=2 step reduction** (keeping only CREST Steps 1 and 2) with optional **4-bit quantization at load time**, so a CREST adaptive-depth model can run on hardware as modest as an **8GB laptop GPU** or a **free-tier Google Colab T4**.

+ Despite being the lightest variant, it keeps **adaptive depth**: the model still decides per token how much computation to use. It is not a standard transformer; it is a full CREST model.

+ ### Who is This For?

+ - 🎓 **Students** exploring adaptive-depth LLMs on a budget
+ - 💻 **Developers** building AI-powered applications on consumer hardware
+ - 📱 **Edge/on-device** deployment where every GB matters
+ - 🧪 **Researchers** who want to experiment with CREST without cloud GPUs
+
+ ---
+
+ ## Model Details

  | Property | Value |
  |---|---|
+ | **Architecture** | CREST (0labs proprietary) |
+ | **Total Parameters** | 6.47B |
+ | **CREST Steps (K)** | 2 |
+ | **Hidden Dimension** | 2,560 |
+ | **Layers** | 32 |
+ | **Attention Heads** | 32 |
+ | **Context Window** | 32,768 tokens |
+ | **Disk Size** | ~13GB (BFloat16) |
+ | **VRAM (BF16)** | ~13GB |
+ | **VRAM (INT4)** | ~4GB |
+ | **Vocabulary Size** | 248,320 |
+ | **License** | Apache 2.0 |
+
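The disk and VRAM figures in the table follow roughly from the parameter count. The back-of-the-envelope sketch below multiplies the 6.47B parameters by the storage width per weight. The helper name `weight_memory_gb` is illustrative, and the gap between the 4-bit result and the ~4GB listed above is assumed (not measured) to come from quantization constants, activations, and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 6.47e9  # "Total Parameters" from the table above

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # BF16 stores 16 bits per weight
int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)   # 4-bit formats store about 4 bits per weight

print(f"BF16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

This gives about 12.9 GB for BF16 and 3.2 GB for 4-bit weights, which lines up with the ~13GB and ~4GB rows once runtime overhead is added.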
+ ---
+
+ ## Quick Start
+
+ ### Recommended: 4-Bit Loading (~4GB VRAM) ⭐
+
+ This is the recommended way to run Sky v2.0-Lite on consumer hardware:

  ```python
  import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # 4-bit quantization config: uses only ~4GB VRAM
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_quant_type="nf4",
+ )

  model = AutoModelForCausalLM.from_pretrained(
+     "0labs-in/Sky-v2.0-Lite",
+     trust_remote_code=True,  # Required: CREST is a custom architecture
+     quantization_config=quantization_config,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("0labs-in/Sky-v2.0-Lite")
+
+ # Generate a response
+ prompt = "Explain quantum computing in simple terms."
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=512,
+         temperature=0.7,
+         top_p=0.9,
+         do_sample=True,
+     )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ### BF16 Loading (unquantized, needs ~13GB VRAM)
+
+ ```python
+ # Assumes `torch` and `AutoModelForCausalLM` are imported as in the 4-bit example
+ model = AutoModelForCausalLM.from_pretrained(
+     "0labs-in/Sky-v2.0-Lite",
+     trust_remote_code=True,
      torch_dtype=torch.bfloat16,
      device_map="auto",
  )
  ```

+ > **Note:** `trust_remote_code=True` is required because CREST uses custom architecture code included in this repository.
+
+ ---
+
+ ## Hardware Compatibility
+
+ | Device | Loading Mode | VRAM Used | Works? |
+ |---|---|---|---|
+ | **Laptop RTX 4060 (8GB)** | **INT4** | **~4GB** | **✅** |
+ | **Google Colab T4 (15GB)** | **INT4** | **~4GB** | **✅** |
+ | **Apple M1/M2 (8GB)** | **INT4** | **~4GB** | **✅** |
+ | MacBook Pro M3 (18GB) | BF16 | ~13GB | ✅ |
+ | RTX 3090 (24GB) | BF16 | ~13GB | ✅ |
+ | RTX 4090 (24GB) | BF16 | ~13GB | ✅ |
+ | A100 (40/80GB) | BF16 | ~13GB | ✅ |
+ | MI300X (192GB) | BF16 | ~13GB | ✅ |
+
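The table above can be turned into a simple rule of thumb for choosing a loading mode. The sketch below is one way to do it: `pick_loading_mode` is a hypothetical helper, the 13GB and 4GB figures come from the table, and the 3GB headroom is an assumption chosen so that the 15GB Colab T4 lands on INT4, as the table lists it.

```python
BF16_GB = 13.0      # "~13GB" from the table above
INT4_GB = 4.0       # "~4GB" from the table above
HEADROOM_GB = 3.0   # assumed margin for activations and KV cache (not from the source)


def pick_loading_mode(total_memory_gb: float) -> str:
    """Return "bf16", "int4", or "too_small" for a device with the given memory."""
    if total_memory_gb >= BF16_GB + HEADROOM_GB:
        return "bf16"
    if total_memory_gb >= INT4_GB + HEADROOM_GB:
        return "int4"
    return "too_small"


devices = [("Laptop RTX 4060", 8), ("Colab T4", 15), ("MacBook Pro M3", 18), ("RTX 3090", 24)]
for name, memory_gb in devices:
    print(f"{name}: {pick_loading_mode(memory_gb)}")
```

On a CUDA machine, the memory figure could be read with `torch.cuda.get_device_properties(0).total_memory / 1e9` before calling the helper.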
+ ### Google Colab (Free Tier)
+
+ ```python
+ # Run this in the first cell of a Colab notebook
+ !pip install -q transformers accelerate bitsandbytes safetensors sentencepiece
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "0labs-in/Sky-v2.0-Lite",
+     trust_remote_code=True,
+     quantization_config=BitsAndBytesConfig(
+         load_in_4bit=True,
+         bnb_4bit_compute_dtype=torch.bfloat16,
+         bnb_4bit_quant_type="nf4",
+     ),
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("0labs-in/Sky-v2.0-Lite")
+
+ # You're ready to go!
+ ```
+
+ ---
+
+ ## Architecture: CREST K=2
+
+ ```
+ Input from Attention
+      ↓
+ ┌──────────┐
+ │   MLP₁   │ ← Step 1: Core knowledge (always runs)
+ └──────────┘
+      ↓
+ Halting Gate: "Is this token done?"
+      │
+      ├── h₁ ≈ 1.0 → HALT (easy tokens: "the", "is", "and")
+      │   output = state₁
+      │
+      ↓ h₁ < threshold
+ ┌──────────┐
+ │   MLP₂   │ ← Step 2: Deeper reasoning (only when needed)
+ └──────────┘
+      ↓
+ Output = p₁ · state₁ + p₂ · state₂
+ ```
+
+ **What this means in practice:**
+ - Simple/common tokens → processed by just MLP₁ → **fast**
+ - Complex/rare tokens → processed by MLP₁ then MLP₂ → **more accurate**
+ - The model **automatically decides** which path to take for each token
+
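The control flow in the diagram can be sketched in a few lines of plain Python. This is an illustrative sketch, not the repository's implementation: the function name `crest_k2_step`, the 0.99 threshold, feeding `state1` into the second MLP, and the ACT-style weights `p1 = h1`, `p2 = 1 - h1` are all assumptions, since the README does not specify them. The toy lambdas stand in for the learned MLPs and halting gate.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def crest_k2_step(
    x: Vector,
    mlp1: Callable[[Vector], Vector],
    mlp2: Callable[[Vector], Vector],
    halting_gate: Callable[[Vector], float],
    threshold: float = 0.99,  # assumed cutoff; the diagram only says "h1 ~ 1.0"
) -> Tuple[Vector, int]:
    """Run one K=2 adaptive-depth block on a single token vector.

    Returns the output vector and the number of MLP steps used.
    """
    state1 = mlp1(x)               # Step 1 always runs
    h1 = halting_gate(state1)      # halting probability in [0, 1]
    if h1 >= threshold:
        return state1, 1           # easy token: halt after step 1

    state2 = mlp2(state1)          # assumption: step 2 refines state1
    p1, p2 = h1, 1.0 - h1          # assumption: ACT-style remainder weights
    output = [p1 * a + p2 * b for a, b in zip(state1, state2)]
    return output, 2


# Toy stand-ins for the learned components (hypothetical, for illustration only)
add_one = lambda v: [a + 1.0 for a in v]
double = lambda v: [2.0 * a for a in v]

easy = crest_k2_step([1.0, 2.0], add_one, double, halting_gate=lambda s: 1.0)
hard = crest_k2_step([1.0, 2.0], add_one, double, halting_gate=lambda s: 0.25)
print(easy)  # ([2.0, 3.0], 1)
print(hard)  # ([3.5, 5.25], 2)
```

The easy token returns after one step with `state1` unchanged, while the hard token blends both states, matching the two paths in the diagram.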
+ ---
+
+ ## Comparison: Lite vs Other Variants
+
+ | Feature | Sky v2.0-11B | Sky v2.0-6.5B | Sky v2.0-11B-INT4 | **Sky v2.0-Lite** |
+ |---|---|---|---|---|
+ | Parameters | 11.2B | 6.47B | 11.2B | **6.47B** |
+ | CREST Steps | 4 | 2 | 4 | **2** |
+ | Min VRAM (INT4) | ~6GB | ~4GB | ~6GB | **~4GB** |
+ | Adaptive Depth | ✅ | ✅ | ✅ | **✅** |
+ | Best For | Max quality | Balance | Full CREST | **Everyone** |
+
+ ---
+
+ ## Model Family
+
+ | Model | Parameters | CREST K | Best For |
+ |---|---|---|---|
+ | [Sky v2.0-11B](https://huggingface.co/0labs-in/Sky-v2.0-11B) | 11.2B | 4 | Maximum capability |
+ | [Sky v2.0-6.5B](https://huggingface.co/0labs-in/Sky-v2.0-6.5B) | 6.47B | 2 | Balanced performance |
+ | [Sky v2.0-11B-INT4](https://huggingface.co/0labs-in/Sky-v2.0-11B-INT4) | 11.2B | 4 | Full CREST, memory-efficient |
+ | **[Sky v2.0-Lite](https://huggingface.co/0labs-in/Sky-v2.0-Lite)** | **6.47B** | **2** | **Laptops, Colab, edge devices** |
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @article{jadav2026crest,
+   title={CREST: Cognitively Recurrent Estimation of Step Termination for Adaptive-Depth Language Modeling},
+   author={Jadav, Atharvsinh},
+   year={2026},
+   url={https://huggingface.co/0labs-in/Sky-v2.0-11B}
+ }
+ ```
+
+ ---
+
+ ## About 0labs
+
+ **0labs** is an independent AI research lab founded by Atharvsinh Jadav in Gujarat, India. We build adaptive-depth LLMs that think harder on hard problems, trained on a single GPU.
+
+ - 🌐 Website: [0labs.in](https://0labs.in)
+ - 🤗 HuggingFace: [huggingface.co/0labs-in](https://huggingface.co/0labs-in)
+
+ ---

+ <p align="center">
+ <em>AI should be accessible to everyone. Sky v2.0-Lite makes adaptive-depth intelligence available on consumer hardware.</em>
+ </p>