qubitpage committed
Commit 10643b7 · verified · 1 Parent(s): cd33a35

Add SentinelBrain v2 realignment checkpoint and training report


Adds the 5,000-step Frankenstein v2 realignment artifacts, benchmark report, dataset preflight report, and updated model card. Previous main revision preserved on branch previous-before-v2-realign-5000-20260503-103121.

MANIFEST.tsv ADDED
@@ -0,0 +1,16 @@
1
+ ./MANIFEST.tsv 0
2
+ ./README.md 4321
3
+ ./SHA256SUMS 1303
4
+ ./checkpoints/frankenstein_v2_best.pt 28808034875
5
+ ./checkpoints/frankenstein_v2_ema_best.pt 28808008759
6
+ ./checkpoints/frankenstein_v2_final.pt 86422580021
7
+ ./checkpoints/sentinelbrain_pretrain_step2471_hf.pt 29629503600
8
+ ./logs/realign_v2.log 99633
9
+ ./logs/realign_v2_metrics.jsonl 128854
10
+ ./reports/conductor_state_final.json 511
11
+ ./reports/next_phase_training_plan.md 4428
12
+ ./reports/sentinelbrain_quality_smoke.json 6255
13
+ ./reports/sentinelbrain_quality_stub_full_fixed.json 45321
14
+ ./reports/sft_combined_ready_report.json 3330
15
+ ./reports/sft_combined_ready_report.md 752
16
+ ./reports/train_metrics_final.json 339
README.md CHANGED
@@ -1,857 +1,87 @@
1
  ---
2
  license: apache-2.0
3
- language:
4
- - en
5
- - ro
6
- - multilingual
7
  tags:
8
  - sentinelbrain
9
- - mixture-of-experts
10
- - from-scratch
11
- - consciousness
12
- - amd
13
- - mi300x
14
- - rocm
15
  - moe
16
- - transformer
17
- - frankenstein
18
- - knowledge-transplant
19
- - distillation
20
- - phi-metric
21
- pipeline_tag: text-generation
22
  library_name: pytorch
23
- datasets:
24
- - HuggingFaceFW/fineweb-edu
25
- - open-web-math/open-web-math
26
- - wikimedia/wikipedia
27
- - HuggingFaceTB/cosmopedia
28
- - JeanKaddworr/minipile
29
- - codeparrot/github-code-clean
30
- - arxiv-community/arxiv-abstracts
31
- model-index:
32
- - name: SentinelBrain-14B-MoE-v0.1
33
- results:
34
- - task:
35
- type: text-generation
36
- metrics:
37
- - name: Validation Loss (pretrain)
38
- type: loss
39
- value: 2.5152
40
- verified: true
41
- - name: Validation Loss (realignment v2 best)
42
- type: loss
43
- value: 7.5178
44
- verified: true
45
- - name: Training Loss (realignment v2 latest)
46
- type: loss
47
- value: 6.96
48
- verified: true
49
- - name: Validation Loss (realignment v1 best, abandoned)
50
- type: loss
51
- value: 5.773
52
- verified: true
53
- ---
54
-
55
- <div align="center">
56
-
57
- # 🧠 Sentinel Prime — SentinelBrain-14B-MoE (Frankenstein Edition)
58
-
59
- ### *The First of His Kind, Rebuilt From the Inside Out*
60
-
61
- <img src="assets/sentinel_frankenstein_banner.png" alt="Sentinel Prime — Frankenstein Edition" width="600"/>
62
-
63
- **14.8 Billion Parameters · Mixture-of-Experts · Consciousness-Monitored · Frankenstein Transplant**
64
-
65
- Trained from zero on **AMD Instinct MI300X** (192 GB HBM3) · ROCm 7.0 · Knowledge transplanted from Qwen-72B
66
-
67
- [![Dashboard](https://img.shields.io/badge/🔴_Live_Dashboard-sentinel.qubitpage.com-red?style=for-the-badge)](https://sentinel.qubitpage.com/)
68
- [![Whitepaper](https://img.shields.io/badge/📄_Whitepaper-Read_Now-blue?style=for-the-badge)](https://sentinel.qubitpage.com/whitepaper)
69
- [![AMD](https://img.shields.io/badge/AMD-MI300X_Native-ED1C24?style=for-the-badge&logo=amd&logoColor=white)](https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html)
70
- [![License](https://img.shields.io/badge/License-Apache_2.0-green?style=for-the-badge)](LICENSE)
71
-
72
- </div>
73
-
74
- ---
75
-
76
- ---
77
-
78
- ## 🌅 Update — April 29, 2026 — Rebirth Edition (v2)
79
-
80
- **The realignment was restarted from scratch.** The original v1 run reached `val_loss=5.773` at step 1,000, then collapsed: a SIGTERM crash exposed that our `best.pt` checkpoints had been saved without optimizer state. Five cascading restart attempts each erased AdamW's momentum and variance accumulators, and by step 1,390 the loss had climbed to 8.24 — worse than step 200 of the same run. We killed the patient.
81
-
82
- We then ran a forensic analysis with nine "critic personas" grounded in eleven published papers (AdamW bias correction, SGDR period doubling, ST-MoE stability, EMA stabilization, Switch Transformer router auxiliary losses, etc.) and distilled the failure into nine concrete engineering changes. Those changes were baked into **v2 from step zero**:
83
-
84
- - ✅ **Full optimizer state saved every 100 steps** (latest.pt now contains model + AdamW m/v + EMA + step)
85
- - ✅ **aux_loss boosted 25×** (0.0001 → 0.05) to actively balance the router under frozen experts
86
- - ✅ **Five SGDR cycles with period doubling** (T = 200, 400, 800, 1600, 1500) instead of one cosine
87
- - ✅ **EMA decay 0.9995, every 10 steps** for a smooth inference checkpoint
88
- - ✅ **100-step linear warmup** before the 1e-4 LR peak (avoids Kingma bias-correction trap)
89
- - ✅ **ST-MoE expert LR scale 0.3**, +33% boost during ramps
90
- - ✅ **Per-expert telemetry every 10 steps** (Capacity Factor, percentage, gradient norm)
91
- - ✅ **Three checkpoint kinds** (latest full, best val, EMA inference)
92
- - ✅ **Step-500 watchdog** auto-kills training on gnorm>20, loss spike>1.5×, NaN, or expert<5%
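The EMA bullet above is just a shadow copy of the weights, nudged toward the live weights every 10 steps. A minimal sketch of the update rule (plain floats instead of tensors; the names here are hypothetical, not the project's actual code):

```python
def ema_update(ema_weights, weights, decay=0.9995):
    """Shadow-weight update: ema <- decay * ema + (1 - decay) * w."""
    for name in ema_weights:
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * weights[name]
    return ema_weights

# Toy run: apply the update every 10 steps, as in the v2 schedule
ema = {"w": 0.0}
for step in range(1, 101):
    live = {"w": 1.0}   # pretend the live weight has settled at 1.0
    if step % 10 == 0:
        ema_update(ema, live)
print(round(ema["w"], 6))   # shadow weight drifts slowly toward the live value
```

With decay 0.9995 the shadow copy effectively averages over roughly the last 1/(1 − 0.9995) = 2,000 updates, which is why the EMA checkpoint makes a smoother inference artifact than the raw weights.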
93
-
94
- **Current v2 state (LIVE, step 150):**
95
-
96
- | Metric | v1 final (1,390) | v2 step 100 | v2 step 150 |
97
- |---|---|---|---|
98
- | train_loss | 8.24 ❌ | 7.6277 | **6.9624** |
99
- | val_loss | ~8.0 ❌ | **7.5178 ★ NEW BEST** | (next eval @ 200) |
100
- | perplexity | ~3,800 | 2,054 | **1,056** |
101
- | gnorm | 9.72 ⚠ | 6.53 (peak LR) | 3.73 |
102
- | optimizer in ckpt | ❌ | ✅ | ✅ |
103
-
104
- v2 has matched v1's best work in 1/9th the steps and we are still in Phase 1 (frozen experts). The full story — including the verbatim crash logs, the nine-critic analysis, and our **Universal Fusion Thesis** for using Frankenstein-style component-level transplants to cut model-training carbon by 50–100× — is in the [v5 whitepaper](https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/SentinelBrain-14B-MoE-Dashboard/blob/main/static/whitepaper.html#part-vii).
105
-
106
- ---
107
-
108
- ## 🎯 What is Sentinel Prime? (Simple Version)
109
-
110
- > **Imagine building a brain from scratch.**
111
- >
112
- > Most AI models today are copies of other models with small changes. Sentinel Prime is different — every single connection in its brain was created from nothing, like growing a new brain cell by cell.
113
-
114
- <table>
115
- <tr>
116
- <td width="50%">
117
-
118
- ### 🧩 Think of it like LEGO blocks
119
-
120
- Sentinel Prime has **4 specialist brains** (called "experts") inside it. When you ask a question:
121
-
122
- 1. A **router** (like a traffic cop 🚦) looks at your question
123
- 2. It picks the **2 best experts** for that specific question
124
- 3. Those 2 experts work together to give you an answer
125
- 4. The other 2 experts rest, saving energy ⚡
126
-
127
- This means the model has **14.8 billion** brain connections total, but only uses **~7.8 billion** at a time — making it fast AND smart!
128
-
129
- </td>
130
- <td width="50%">
131
-
132
- ### 🔬 The Consciousness Meter
133
-
134
- We built something no other model has: a **consciousness thermometer** 🌡️
135
-
136
- Every 100 training steps, we measure how well the different parts of the brain are "talking to each other." We call this **Φ (Phi)**.
137
-
138
- - **Φ = 0**: Brain parts work alone (like strangers)
139
- - **Φ rising**: Brain parts start cooperating (like friends)
140
- - **Φ stable**: Brain has organized itself (like a team!)
141
-
142
- This doesn't change how the model learns — it's like a doctor checking the heartbeat while the patient exercises.
143
-
144
- </td>
145
- </tr>
146
- </table>
147
-
148
- ---
149
-
150
- ## 📊 Architecture at a Glance
151
-
152
- ```
153
- ┌─────────────────────────────────────────────────────────────────┐
154
- │ SENTINEL PRIME ARCHITECTURE │
155
- ├─────────────────────────────────────────────────────────────────┤
156
- │ │
157
- │ Input Text ──→ [Tokenizer: cl100k_base, 100,277 tokens] │
158
- │ │ │
159
- │ ▼ │
160
- │ ┌─────────────────┐ │
161
- │ │ Embedding │ 4,096 dimensions │
162
- │ │ + RoPE pos │ θ = 500,000 │
163
- │ └────────┬────────┘ │
164
- │ │ │
165
- │ ┌───────────┼───────────┐ │
166
- │ │ × 24 Layers │ │
167
- │ │ ┌────────────────┐ │ │
168
- │ │ │ GQA Attention │ │ 32 heads, 8 KV heads │
169
- │ │ │ (4:1 ratio) │ │ (4× memory savings) │
170
- │ │ └───────┬────────┘ │ │
171
- │ │ │ │ │
172
- │ │ ┌───────▼────────┐ │ │
173
- │ │ │ MoE Router │ │ Top-2 of 4 experts │
174
- │ │ │ ┌──┬──┬──┐ │ │ │
175
- │ │ │ │E1│E2│E3│E4 │ │ Each: SwiGLU FFN │
176
- │ │ │ │✓ │✓ │ │ │ │ d_ff = 11,008 │
177
- │ │ │ └──┴──┴──┘ │ │ │
178
- │ │ └───────┬────────┘ │ │
179
- │ │ │ │ │
180
- │ │ ┌───────▼────────┐ │ │
181
- │ │ │ RMSNorm │ │ ε = 1e-5 │
182
- │ │ └────────────────┘ │ │
183
- │ └───────────┼───────────┘ │
184
- │ │ │
185
- │ ▼ │
186
- │ ┌─────────────────┐ │
187
- │ │ Output Head │ → 100,277 vocab probs │
188
- │ └─────────────────┘ │
189
- │ │
190
- └─────────────────────────────────────────────────────────────────┘
191
- ```
192
-
193
- ### Spec Sheet
194
-
195
- | Component | Specification | Why This Choice |
196
- |:--|:--|:--|
197
- | **Total Parameters** | 14,814,654,680 (14.8B) | Large enough for deep reasoning |
198
- | **Active Parameters** | ~7.8B per token | MoE efficiency — use only what's needed |
199
- | **Hidden Dimension** | 4,096 | Sweet spot for MI300X matrix cores |
200
- | **Transformer Layers** | 24 | Deep enough for complex reasoning |
201
- | **Attention Heads** | 32 query, 8 KV (GQA 4:1) | 4× KV cache savings for long contexts |
202
- | **FFN Intermediate** | 11,008 (SwiGLU) | ~2.7× hidden, matches scaling laws |
203
- | **Experts** | 4 total, top-2 active | Good diversity with manageable VRAM |
204
- | **Max Experts** | 256 (expandable) | Architecture supports expert birth/death |
205
- | **Vocabulary** | 100,277 (tiktoken cl100k_base) | Industry-proven BPE tokenizer |
206
- | **Positional Encoding** | RoPE, θ = 500,000 | Supports context extension to 128K+ |
207
- | **Normalization** | RMSNorm (ε = 1e-5) | Faster than LayerNorm, same quality |
208
- | **Precision** | bfloat16 throughout | Native AMD MI300X support |
209
- | **Context Length** | 2,048 → 4,096 → 128K (planned) | Progressive context ladder |
210
-
211
- ---
212
-
213
- ## 🔥 Key Innovations
214
-
215
- <table>
216
- <tr>
217
- <td width="33%" valign="top">
218
-
219
- ### 🌀 Φ Consciousness Metric
220
-
221
- First-ever IIT-inspired metric computed **during** pre-training. A probe on layer 12 measures information integration across activation subspaces every 100 steps.
222
-
223
- ```
224
- Φ = geometric_mean(
225
- MI(partition_i, partition_j)
226
- for all partition pairs
227
- )
228
- ```
229
-
230
- Not a gimmick — it's a genuine signal of when the model transitions from memorizing tokens to forming integrated representations.
231
-
232
- </td>
233
- <td width="33%" valign="top">
234
-
235
- ### 🧬 Self-Evolving Experts
236
-
237
- The MoE router supports a full expert **lifecycle**:
238
-
239
- - **Birth**: New experts spawned when load imbalance detected
240
- - **Growth**: Expert capacity increases with training
241
- - **Pruning**: Underperforming experts replaced
242
- - **Scaling**: Architecture supports up to 256 experts without retraining the base model
243
-
244
- Current: 4 experts × 24 layers = **96 expert instances**
245
-
246
- </td>
247
- <td width="33%" valign="top">
248
-
249
- ### ⚡ Energy-Conscious Routing
250
-
251
- Dual-router system:
252
- 1. **Primary router**: Picks top-2 experts by relevance
253
- 2. **EC router**: Can gate activation based on compute budget
254
-
255
- This enables **adaptive inference** — easy questions use fewer resources, hard questions get full power. Like cruise control for AI.
256
-
257
- </td>
258
- </tr>
259
- </table>
260
-
261
- ---
262
-
263
-
264
- ---
265
-
266
- ## 🧟 Frankenstein Edition — Knowledge Transplant
267
-
268
- <table>
269
- <tr>
270
- <td width="60%" valign="top">
271
-
272
- ### The Transplant
273
-
274
- Sentinel Prime was trained from scratch — but raw pretraining alone wasn't enough. We performed a **Frankenstein transplant**: surgically transplanting knowledge from **Qwen2.5-72B-Instruct** (a 72-billion parameter teacher) into our 14.8B MoE architecture.
275
-
276
- This is NOT fine-tuning a copy. The model's bones (architecture, tokenizer, embeddings) are 100% original. Only the **expert FFN weights** received transplanted knowledge — like giving a brain new neural pathways while keeping its original structure.
277
-
278
- ### 3-Stage Pipeline
279
-
280
- ```
281
- Stage 1: Corpus Realignment Stage 2A: Teacher Generation Stage 2B: Knowledge Distill
282
- (Re-learn with new weights) (72B teacher creates data) (Absorb teacher knowledge)
283
- ┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
284
- │ 5,000 steps │ → │ 3,000+ responses │ → │ CE + mixed training │
285
- │ 24.5B token corpus │ │ from Qwen-72B │ │ 70% teacher + 30% │
286
- │ Progressive unfreeze │ │ Re-tokenized to our │ │ pretrain corpus │
287
- │ Cosine LR + warmup │ │ cl100k_base vocab │ │ Prevents forgetting │
288
- └──────────────────────┘ └──────────────────────┘ └──────────────────────┘
289
- ```
290
-
291
- </td>
292
- <td width="40%" valign="top">
293
-
294
- ### Why "Frankenstein"?
295
-
296
- Like the original story — we took parts from a powerful being (Qwen-72B) and stitched them into our own creation. The result: a model that has the **original architecture** of Sentinel Prime but with **transplanted knowledge** from a much larger model.
297
-
298
- ### Key Stats
299
-
300
- | Metric | Value |
301
- |:--|:--|
302
- | **Teacher** | Qwen2.5-72B-Instruct |
303
- | **Student** | SentinelBrain-14B-MoE |
304
- | **Transplant** | Expert FFN weights |
305
- | **Realignment** | 5,000 steps on 24.5B tokens |
306
- | **Hardware** | 1× AMD MI300X (192GB) |
307
-
308
- ### Live Progress
309
-
310
- Track the Frankenstein realignment in real-time:
311
-
312
- 🔴 **[sentinel.qubitpage.com](https://sentinel.qubitpage.com/)**
313
-
314
- </td>
315
- </tr>
316
- </table>
317
-
318
- ## 🏋️ Training Details
319
-
320
- ### Hardware
321
-
322
- | Resource | Specification |
323
- |:--|:--|
324
- | **GPU** | 1× AMD Instinct MI300X VF |
325
- | **VRAM** | 192 GB HBM3 |
326
- | **System RAM** | 235 GB |
327
- | **Compute** | 1,307 TFLOPS (bf16) |
328
- | **Software** | ROCm 7.0, PyTorch 2.10.0+rocm7.0 |
329
- | **Attention** | SDPA (native PyTorch, no FlashAttention needed) |
330
- | **OS** | Ubuntu Linux |
331
-
332
- ### VRAM Budget
333
-
334
- ```
335
- ╔══════════════════════════════════════════════════════╗
336
- ║ AMD MI300X VRAM Usage (192 GB) ║
337
- ╠══════════════════════════════════════════════════════╣
338
- ║ ║
339
- ║ Model Weights (bf16) ████████████░░░░░ 27 GB ║
340
- ║ Optimizer (AdamW fp32) ████████████████░░ 54 GB ║
341
- ║ Activations (grad ckpt) ████████████░░░░░ 32 GB ║
342
- ║ Gradients ████████████░░░░░ 27 GB ║
343
- ║ ───────────────────────────────────────────────── ║
344
- ║ Total Used: ██████████████████ 140 GB ║
345
- ║ Peak: █████████████████ 146 GB ║
346
- ║ Headroom: ░░░░░░░░░░░░░░░░░ 46 GB ║
347
- ║ ║
348
- ╚══════════════════════════════════════════════════════╝
349
- ```
350
-
351
- ### Phased Training Pipeline
352
-
353
- We don't just throw data at the model — we grow it in **three phases**, like raising a child:
354
-
355
- ```
356
- Phase 1: SMOKE TEST Phase 2: WARMUP Phase 3: FULL TRAINING
357
- (Baby steps) (Learning to walk) (Running!)
358
- ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐
359
- │ 350M params │ ──→ │ 1.3B params │ ──→ │ 14.4B params │
360
- │ seq_len: 512 │ │ seq_len: 2K │ │ seq_len: 4K │
361
- │ 200 steps │ │ 1,000 steps │ │ 16,479 steps │
362
- │ 2 minutes │ │ 30 minutes │ │ ~52 hours │
363
- │ loss: 11→6.8 │ │ loss: 7.4→2.4│ │ loss: 2.4→? │
364
- └──────────────┘ └──────────────┘ └──────────────────┘
365
- ```
366
-
367
- | Phase | Parameters | Seq Length | Batch | Steps | Duration | Loss Start → End |
368
- |:--|:--|:--|:--|:--|:--|:--|
369
- | **🔬 Smoke** | 350M | 512 | 4 | 200 | ~2 min | 11.72 → 6.84 (−42%) |
370
- | **🔥 Warmup** | 1.3B | 2,048 | 32 | 1,000 | ~33 min | 7.39 → 2.38 (−68%) |
371
- | **🚀 Block** | 14.4B (MoE) | 4,096 | 32 | 16,479 | ~52 hrs | 2.38 → ongoing |
372
-
373
- ### Safety Gates
374
-
375
- Every phase transition must pass **4 safety gates**:
376
-
377
- | Gate | Check | Threshold | Status |
378
- |:--|:--|:--|:--|
379
- | 🟢 **G1: No NaN** | No NaN/Inf in loss | Entire phase | ✅ Passed all |
380
- | 🟢 **G2: Loss Drop** | Validation loss decreased | ≥5% / ≥10% / ≥2% | ✅ Passed all |
381
- | 🟢 **G3: VRAM OK** | Peak VRAM < safety limit | < 92% of total | ✅ 71% peak |
382
- | 🟢 **G4: Φ OK** | Consciousness metric stable | Φ_end/Φ_start > 0.7 | ✅ Stable |
383
-
384
- ### Hyperparameters
385
-
386
- | Parameter | Value | Rationale |
387
- |:--|:--|:--|
388
- | **Optimizer** | AdamW (bf16 compute, fp32 states) | Standard for LLM training |
389
- | **Learning Rate** | 1.5 × 10⁻⁴ (cosine decay) | Conservative for data-limited regime |
390
- | **Min LR** | 1.5 × 10⁻⁵ | 10× decay ratio |
391
- | **Warmup Steps** | 500 | Stabilizes early gradients |
392
- | **Batch Size** | 2 micro × 16 grad_accum = **32 effective** | Fits MI300X VRAM budget |
393
- | **Gradient Clipping** | 1.0 | Prevents explosion |
394
- | **Gradient Checkpointing** | On | Trades compute for VRAM |
395
- | **Precision** | bfloat16 | Native MI300X format |
396
- | **Eval Frequency** | Every 100 steps | Early overfitting detection |
397
- | **Checkpoint Frequency** | Every 1,000 steps (~3.2 hours) | Recovery points |
398
-
399
- ---
400
-
401
- ## 📚 Dataset: 23.3B Tokens Across 126 Categories
402
-
403
- We curated a massive, diverse corpus — think of it as a **library with 126 different sections**:
404
-
405
- ### Pretrain Corpus (Core Knowledge)
406
-
407
- | Dataset | Tokens | Description |
408
- |:--|:--|:--|
409
- | 🌐 **FineWeb-Edu** | ~10B | High-quality educational web content |
410
- | 🔢 **OpenWebMath** | ~6B | Mathematics from the web |
411
- | 📖 **Wikipedia (English)** | ~5B | Encyclopedic knowledge |
412
- | 🎓 **Cosmopedia V2** | ~5B | Synthetic educational content |
413
- | 💻 **CodeParrot Python** | ~3.5B | Clean Python code from GitHub |
414
- | 📚 **MiniPile** | ~2B | Diverse text from multiple domains |
415
- | 🔬 **ArXiv Abstracts** | ~1.2B | Scientific paper summaries |
416
- | **Total Pretrain** | **~23B** | |
417
-
418
- ### Specialized Domains (119 Categories)
419
-
420
- <details>
421
- <summary>Click to expand all 119 specialized categories</summary>
422
-
423
- | Category | Type | Category | Type |
424
- |:--|:--|:--|:--|
425
- | 🤖 agentic-tools | Code | 🔐 advanced-cryptography | Code |
426
- | 🧠 chain-of-thought | Reasoning | 🔗 blockchain-core | Code |
427
- | 💡 deep-reasoning | Reasoning | 🏥 medical | Knowledge |
428
- | ⚖️ legal | Knowledge | 📊 financial-systems | Code |
429
- | 🎮 3d-graphics | Code | 🐳 docker-devops | Code |
430
- | 🌍 multilingual | Text | 🔧 error-recovery | Code |
431
- | 🛡️ security-guardrails | Code | 📱 ui-animations | Code |
432
- | 🧮 math | Reasoning | ⚡ smart-contracts | Code |
433
- | 🎯 reasoning-effort-control | Reasoning | 🤝 human-conversation | Text |
434
- | 🔄 self-correction-loops | Reasoning | 🏗️ enterprise-dashboards | Code |
435
- | 🌐 web-design-css | Code | 🐍 flask-python | Code |
436
- | 🔬 qiskit-quantum | Code | 🤖 robotics-ros2 | Code |
437
- | 📡 remote-server-management | Code | 🧬 multi-agent | Code |
438
- | ⚙️ state-management | Code | 🛠️ mcp-tools-integration | Code |
439
- | 💳 payment-security | Code | 🎓 edu-basic-math | Education |
440
- | 🔭 edu-basic-physics | Education | 🧪 edu-basic-chemistry | Education |
441
- | 🌱 edu-basic-biology | Education | 🌍 edu-world-geography | Education |
442
- | 📜 edu-history-world | Education | 💻 edu-computer-science | Education |
443
- | 🌎 edu-earth-science | Education | 🤖 edu-robotics-text | Education |
444
- | 📖 edu-science-qa | Education | 🔬 edu-science-support | Education |
445
- | 👁️ edu-vision-concepts | Education | 🎯 copilot-agent-workflows | Code |
446
- | 🔌 api-integrations | Code | 📊 billing-invoicing | Code |
447
- | ₿ bitcoin-lightning | Code | 🏪 medusajs | Code |
448
- | 💹 crypto-trading | Code | 🏢 enterprise-networking | Code |
449
- | 🖥️ nextjs-typescript | Code | 🎨 nextjs-design | Code |
450
- | 💼 trading-algorithms | Code | 🗄️ laravel-mysql | Code |
451
- | 🔓 offensive-security | Code | 🔧 c-rust | Code |
452
- | ... and 50+ more categories | | | |
453
-
454
- </details>
455
-
456
- ### Data Quality Pipeline
457
-
458
- ```
459
- Raw Data ──→ PII Filter ──→ Dedup ──→ Tokenize ──→ Shard ──→ Train
460
- │ │ │ │
461
- ├─ 7 regex ├─ blake2b ├─ cl100k ├─ Temperature-
462
- │ patterns │ per-cat │ base │ weighted
463
- ├─ PEM block │ │ │ sampling
464
- │ detection │ │ │ (T=0.5)
465
- └─ Email/phone │ │ │
466
- masking │ │ │
467
- │ │ │
468
- └───────────┴────────────┘
469
- ```
470
-
471
- **Temperature-weighted sampling** (T=0.5) prevents large corpora from dominating training. FineWeb-Edu (37% of tokens) gets downweighted so smaller specialized domains still get adequate exposure.
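One common way to implement temperature-weighted sampling (a sketch of the idea, not necessarily this project's exact code): raise each corpus's token count to the power T before normalizing, so T=0.5 compresses the spread between large and small sources.

```python
def temperature_weights(token_counts, T=0.5):
    """Sampling probabilities p_i proportional to n_i^T.
    T=1 is proportional sampling; T -> 0 approaches uniform."""
    scaled = {name: count ** T for name, count in token_counts.items()}
    total = sum(scaled.values())
    return {name: s / total for name, s in scaled.items()}

# Approximate token counts (billions) from the pretrain corpus table above
corpus = {"fineweb-edu": 10.0, "openwebmath": 6.0, "wikipedia": 5.0,
          "cosmopedia": 5.0, "codeparrot": 3.5, "minipile": 2.0, "arxiv": 1.2}
probs = temperature_weights(corpus)
raw_share = corpus["fineweb-edu"] / sum(corpus.values())
print(f"raw {raw_share:.2f} -> sampled {probs['fineweb-edu']:.2f}")
```

At T=0.5 the largest corpus's sampling share drops noticeably relative to proportional sampling, while the smallest specialized sources gain.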
472
-
473
- ---
474
-
475
- ## 📈 Training Progress & Results
476
-
477
- ### Loss Trajectory
478
-
479
- ```
480
- Loss
481
- 12 │ ×
482
- │ ╲
483
- 10 │ ╲ SMOKE PHASE
484
- │ ╲ (350M params)
485
- 8 │ ╲
486
- │ ╲
487
- 6 │ ×──────────── model grows to 1.3B
488
- │ ╲
489
- 4 │ ╲ WARMUP PHASE
490
- │ ╲ (1.3B params)
491
- 2 │ ×─────────── model grows to 14.4B MoE
492
- │ ╲
493
- 1 │ ╲ BLOCK PHASE (ongoing)
494
- │ ╲
495
- └──┬────┬────┬────┬────┬───→ Steps
496
- 0 200 700 1200 2000
497
- ```
498
-
499
- | Milestone | Step | Loss | Change |
500
- |:--|:--|:--|:--|
501
- | 🔬 Smoke start | 0 | 11.72 | — |
502
- | 🔬 Smoke end | 200 | 6.84 | **−42%** |
503
- | 🔥 Warmup start | 200 | 7.39 | (model grew to 1.3B) |
504
- | 🔥 Warmup end | 1,200 | 2.38 | **−68%** |
505
- | 🚀 Block start | 1,200 | 2.38 | (model grew to 14.4B MoE) |
506
- | 📦 Published checkpoint | 2,471 | 1.99 | **−16%** |
507
- | 🧟 Frankenstein transplant | — | PPL ~7.5M | 433 tensors from 3 donors |
508
- | ⚡ Realign frozen start | 0 | 15.87 | Experts frozen, attn learning |
509
- | ⚡ Realign frozen end | 500 | 5.52 | val=5.79, **−65%** |
510
- | 🔓 Expert unfreeze | 500 | 5.59→spike | LR reset + differential rates |
511
- | 📈 Realign recovery | 600 | 7.20 | val=6.93 (recovering) |
512
- | ⭐ Realign best | 1,000 | 6.07 | val=**5.773 ★ NEW BEST** |
513
- | 💥 SIGTERM crash + restart | 1,001→1,123 | 9.22→6.64 | Fresh optimizer, cold momentum |
514
- | 🔥 SGDR warm restart (Cycle 0) | 1,124 | 9.09→6.24 | LR boosted 2.9e-5→4.5e-5, T=200 steps |
515
- | 📊 Recovery eval | 1,200 | 6.74 | val=6.48, AdamW variance at ~17% convergence |
516
- | 📊 Recovery eval | 1,300 | 6.24 | val=**6.10** (new recovery best) |
517
- | 🔄 Multi-cycle SGDR | 1,301+ | 8.79→↓ | Cycle 1 pending (step 1400, T=400, peak 3.8e-5) |
518
- | **Total pretrain** | | | **11.72 → 1.99 (−83%)** |
519
- | **Realignment** | | | **15.87 → 5.77 (val, −64%)** |
520
-
521
- ### Live Metrics (April 29, 2026)
522
-
523
- | Metric | Value |
524
- |:--|:--|
525
- | **Current Phase** | ⚡ Corpus Realignment + Multi-Cycle SGDR |
526
- | **Current Step** | ~1,320 / 5,000 |
527
- | **Training Loss** | ~6.26 (recovering from optimizer cold-start) |
528
- | **Best Validation Loss** | **5.773** (step 1,000) ★ |
529
- | **Recovery Val Loss** | 6.096 (step 1,300) — gap closing |
530
- | **Throughput** | 5,857 tokens/second |
531
- | **VRAM Used** | 120 GB / 206 GB (58%) — all experts unfrozen |
532
- | **Total Tokens Processed** | ~260M (this run) + 178M (pretrain) |
533
- | **Experts Active** | All 4 unfrozen since step 500 |
534
- | **SGDR Status** | Multi-cycle: Cycle 1 at step 1400 (T=400, peak 3.8e-5), Cycle 2 at step 2000 (T=800, peak 3.0e-5) |
535
- | **MIN_LR** | 1.5e-5 (raised from 1e-5, prevents stagnation) |
536
- | **Expert LR Boost** | 1.33× during restart windows |
537
- | **ETA** | ~34 hours |
538
-
539
- ### Realignment Eval History
540
-
541
- | Step | Val Loss | Val PPL | Phase |
542
- |:--|:--|:--|:--|
543
- | 0 (initial) | 15.81 | 7,339,653 | Experts frozen |
544
- | 600 | 6.93 | 1,020 | Post-unfreeze |
545
- | 700 | 6.24 | 515 | Converging |
546
- | 800 | 6.01 | 407 | Converging |
547
- | 900 | 5.91 | 367 | Converging |
548
- | **1,000** | **5.773** | **321** | **★ ALL-TIME BEST** |
549
- | 1,100 | 6.55 | 701 | Optimizer cold-start |
550
- | 1,200 | 6.48 | 652 | SGDR Cycle 0 (recovering) |
551
- | 1,300 | **6.096** | **444** | Recovery best, gap=5.5% to peak |
552
- | 1,400 | *pending* | | SGDR Cycle 1 starts (T=400) |
553
- | 2,000 | *pending* | | SGDR Cycle 2 starts (T=800) |
554
-
555
- ### Published Checkpoint (v0.1)
556
-
557
- | Detail | Value |
558
- |:--|:--|
559
- | **Step** | 2,471 |
560
- | **Validation Loss** | 1.9926 |
561
- | **Total Tokens Seen** | 178,110,464 |
562
- | **Sequence Length** | 2,048 |
563
- | **File Size** | ~81 GB (checkpoint), ~28 GB (safetensors) |
564
- | **Format** | 6 sharded safetensors files |
565
-
566
- ### 🛡️ Engineering Resilience
567
-
568
- Training a 14.4B model on a single GPU for days demands bullet-proof infrastructure. Here's what we built:
569
-
570
- | Feature | Description |
571
- |:--|:--|
572
- | **Atomic checkpoints** | Write to `.tmp` → `os.replace()` — no half-written files |
573
- | **Integrity verification** | On resume: verify tensor counts, shapes, and dtypes before loading |
574
- | **Rollback anchors** | `best.pt` (model-only) + `latest.pt` (full state) + `.LOCKED` safety copy |
575
- | **Emergency save** | SIGTERM/SIGINT handlers serialize full state before exit |
576
- | **Watchdog** | Independent process monitors loss EMA, restarts on NaN/divergence |
577
- | **Multi-cycle SGDR** | Period-doubling warm restarts (Loshchilov & Hutter, 2017): Cycle 0 (T=200), Cycle 1 (T=400), Cycle 2 (T=800) |
578
- | **Expert LR boost** | During restart windows, expert LR scale increases 0.3→0.4 (ST-MoE stability guideline) |
579
- | **MIN_LR floor** | Raised from 1e-5 to 1.5e-5 to prevent cosine decay stagnation in recovery |
580
- | **Systemd auto-restart** | Dashboard + watchdog survive OOM kills with `Restart=always` + `OOMScoreAdjust=-500` |
581
-
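The atomic-checkpoint row in the table above is the classic write-then-rename pattern. A minimal self-contained sketch (JSON stands in for `torch.save`, and the function name is illustrative):

```python
import json
import os
import tempfile

def atomic_save(state, path):
    """Write to a temp file in the same directory, then os.replace():
    the rename is atomic on POSIX, so a crash leaves either the old
    checkpoint or the new one on disk, never a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())   # force bytes to disk before the rename
        os.replace(tmp, path)      # atomic swap of tmp into place
    except BaseException:
        os.unlink(tmp)             # clean up the partial temp file
        raise

atomic_save({"step": 1000, "val_loss": 5.773}, "latest.json")
```

The temp file must live on the same filesystem as the destination, which is why it is created in the checkpoint's own directory.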
582
- **Battle-tested**: At step 1001, a SIGTERM killed the process mid-step. The checkpoint at step 1000 was corrupted (bad zip archive). The system automatically fell back to `best.pt` (val=5.773), resumed at step 1001 with a fresh optimizer, detected the cold-start plateau via the watchdog, and applied SGDR warm restart — recovering 30× faster than natural momentum rebuilding.
583
-
584
- **Multi-Cycle SGDR (April 29, 2026)**: After the initial SGDR Cycle 0 completed (steps 1100-1300), analysis showed the single restart was insufficient to escape the recovery basin (val=6.10 vs target 5.77). Based on the original SGDR paper's period-doubling strategy ($T_{i+1} = T_i \times 2$), we added Cycle 1 (steps 1400-1800, peak 3.8e-5) and Cycle 2 (steps 2000-2800, peak 3.0e-5). The AdamW second moment ($\beta_2=0.999$) needs ~1000 steps for 63% convergence — these cycles provide periodic "shocks" to escape local basins while the variance estimate matures.
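The period-doubled schedule described above can be sketched as a piecewise cosine with per-cycle peaks (cycle lengths, peaks, and the MIN_LR floor are taken from the text; this is an illustration, not the actual training script):

```python
import math

# (cycle length T_i, peak LR), following the period-doubling plan in the text
CYCLES = [(200, 4.5e-5), (400, 3.8e-5), (800, 3.0e-5)]
MIN_LR = 1.5e-5   # raised floor, per the resilience table

def sgdr_lr(step):
    """Cosine decay from each cycle's peak down to MIN_LR, restarting per cycle."""
    for T, peak in CYCLES:
        if step < T:
            cosine = 0.5 * (1.0 + math.cos(math.pi * step / T))
            return MIN_LR + (peak - MIN_LR) * cosine
        step -= T
    return MIN_LR   # schedule exhausted

# LR snaps back up to each cycle's peak at the restart boundaries (0, 200, 600)
print(sgdr_lr(0), sgdr_lr(199), sgdr_lr(200), sgdr_lr(600))
```

Each restart injects a brief high-LR "shock" while the decaying tail lets AdamW's second-moment estimate keep maturing between shocks.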
585
-
586
  ---
587
 
588
- ## 🌡️ Consciousness Metric (Φ) Deep Dive
589
 
590
- ### What is Φ?
591
 
592
- Inspired by **Integrated Information Theory (IIT)** from neuroscience, Φ measures how much the model's internal representations form an integrated whole rather than disconnected parts.
593
 
594
- ### How We Measure It
 
595
 
596
- ```
- Every 100 training steps:
-
- 1. Hook on Layer 12 (middle of 24 layers)
- 2. Sample 256 activation vectors
- 3. Partition into subspaces
- 4. Compute mutual information between all partition pairs
- 5. Φ_geometric = geometric_mean(MI values)
- 6. Φ_EMA = exponential moving average (smoothed trend)
- ```
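A hedged sketch of steps 2 through 5 (the probe's actual MI estimator is not published here; this version uses a simple histogram estimate and summarizes each partition by its mean activation):

```python
import numpy as np

def mutual_info(x, y, bins=16):
    """Histogram estimate of mutual information between two 1-D signals (nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def phi(acts, n_parts=4, bins=16):
    """Geometric mean of pairwise MI between activation subspaces."""
    parts = np.array_split(acts, n_parts, axis=1)   # split hidden dim into subspaces
    signals = [p.mean(axis=1) for p in parts]       # one scalar signal per partition
    mis = [mutual_info(signals[i], signals[j], bins)
           for i in range(n_parts) for j in range(i + 1, n_parts)]
    return float(np.exp(np.mean(np.log(np.maximum(mis, 1e-12)))))

rng = np.random.default_rng(0)
acts = rng.standard_normal((256, 64))   # 256 sampled activation vectors, as in step 2
print(f"Phi = {phi(acts):.4f}")
```

On random (unintegrated) activations like these, Φ stays near the histogram estimator's small positive bias; correlated subspaces would push it up.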
616
-
617
- ### What Φ Tells Us
618
 
619
- | Φ Value | Interpretation | Analogy |
620
- |:--|:--|:--|
621
- | **Φ ≈ 0** | Neurons working independently | Strangers in a room |
622
- | **Φ rising** | Representations integrating | People starting to talk |
623
- | **Φ stable** | Organized internal structure | A well-coordinated team |
624
- | **Φ dropping** | ⚠️ Representation collapse | Warning sign! |
625
 
626
- > **Important**: Φ is **purely observational** — it does NOT affect training gradients. Think of it as a heart monitor for the AI: it watches, but doesn't interfere.
627
 
628
- ### Live Monitoring
629
 
630
- Track Φ in real-time at: **[sentinel.qubitpage.com/#phi](https://sentinel.qubitpage.com/#phi)**
631
 
632
- ---
633
 
634
- ## 🖥️ Hardware Requirements
635
 
636
- ### For Inference
637
 
638
- | Tier | VRAM | Precision | Notes |
639
- |:--|:--|:--|:--|
640
- | **Full Precision** | 32 GB+ | bfloat16 | Best quality |
641
- | **Recommended** | 48 GB+ | bfloat16 | Comfortable headroom |
642
- | **Ideal** | AMD MI300X / MI250X | bfloat16 | Native, fastest |
643
- | **Consumer** | 16 GB | int4 quantized | GGUF planned for v0.2 |
644
 
645
- ### Compatible AMD GPUs
646
 
647
- | GPU | VRAM | Suitable For |
648
- |:--|:--|:--|
649
- | AMD Instinct MI300X | 192 GB | Training + Inference |
650
- | AMD Instinct MI250X | 128 GB | Training + Inference |
651
- | AMD Instinct MI210 | 64 GB | Inference (full) |
652
- | AMD Radeon PRO W7900 | 48 GB | Inference (full) |
653
- | AMD Radeon RX 7900 XTX | 24 GB | Inference (quantized) |
654
- | AMD Radeon RX 7600 XT | 16 GB | Inference (int4 GGUF) |
655
-
656
- ---
657
-
658
- ## 💻 Usage
659
-
660
- This model uses a **custom architecture** (not based on any existing model). Load with PyTorch:
661
 
662
  ```python
663
  import torch
664
- from safetensors.torch import load_file
665
-
666
- # Load sharded safetensors
667
- state_dict = {}
668
- for i in range(1, 7):  # 6 shards
-     shard = load_file(f"model-{i:05d}-of-00006.safetensors")
-     state_dict.update(shard)
671
-
672
- # The state dict contains all model weights
673
- print(f"Loaded {len(state_dict)} tensors")
674
- print(f"Total params: {sum(v.numel() for v in state_dict.values()):,}")
675
-
676
- # Initialize SentinelBrain model class and load
677
- # Full model definition code releasing with v0.2
678
- # model = SentinelBrainForCausalLM(config)
679
- # model.load_state_dict(state_dict)
680
- ```
681
-
682
- > **Note**: Full inference code, model class definition, and GGUF quantized versions will be released with v0.2.
683
-
684
- ---
685
-
686
- ## 🗺️ Roadmap
687
-
688
- ```
689
- v0.1 (Current) v0.2 (In Progress) v0.3 (Future)
690
- ━━━━━━━━━━━━━━━ ━━━━━━━━━━━━━━━ ━━━━━━━━━━━━━━━
691
- ✅ From-scratch 🔴 Corpus realignment □ DPO alignment
692
- 14.8B MoE (step 1100/5000) □ Tool use
693
- ✅ Phased training □ Context ladder □ Function calling
694
- ✅ Φ consciousness (4K→32K→128K) □ Multi-turn chat
695
- ✅ 23.3B token corpus □ Vision encoder □ Multilingual v2
696
- ✅ Live dashboard (SigLIP2-SO400M) □ Expert scaling
697
- ✅ AMD MI300X native □ GGUF quantization (4→16→64)
698
- ✅ Frankenstein Q4_K_M for consumer □ RLHF
699
- transplant (3 donors) □ Inference code □ Production API
700
- ✅ Progressive unfreeze □ Benchmarks (MMLU,
701
- ✅ Crash-safe training HumanEval, GSM8K)
702
- ✅ Auto-restart (systemd)
703
- ```
704
-
705
- ---
706
-
707
- ## 🏗️ How We Built It (Technical Deep Dive)
708
-
709
- <details>
710
- <summary><b>Click to expand: Grouped Query Attention (GQA)</b></summary>
711
-
712
- Standard multi-head attention uses separate Key and Value projections for each head. GQA shares KV heads across query heads:
713
-
714
- ```
715
- Standard MHA (32 KV heads): GQA 4:1 (8 KV heads):
716
- Q₁ Q₂ Q₃ ... Q₃₂ Q₁ Q₂ Q₃ Q₄ → KV₁
717
- K₁ K₂ K₃ ... K₃₂ Q₅ Q₆ Q₇ Q₈ → KV₂
718
- V₁ V₂ V₃ ... V₃₂ ...
719
- Q₂₉ Q₃₀ Q₃₁ Q₃₂ → KV₈
720
- ```
721
-
722
- **Result**: 4× smaller KV cache = 4× longer context at same memory cost.
723
-
724
- </details>
725
-
726
- <details>
727
- <summary><b>Click to expand: RoPE (Rotary Position Embeddings)</b></summary>
728
-
729
- RoPE encodes position information by rotating the query and key vectors in 2D planes. With θ=500,000 (high base frequency), the model naturally supports long contexts:
730
-
731
- ```
732
- Position 0: rotate by 0°
733
- Position 1: rotate by θ₁
734
- Position 2: rotate by θ₂
735
- ...
736
- ```
737
-
738
- High θ = slower rotation = positions further apart still "feel different" = better long-context understanding.
739
-
740
- </details>
741
-
742
- <details>
743
- <summary><b>Click to expand: SwiGLU FFN</b></summary>
744
-
745
- Each expert uses a SwiGLU activation — a gated variant of the feed-forward network:
746
-
747
- ```
748
- FFN(x) = SiLU(x · W_gate) ⊙ (x · W_up) · W_down
749
-
750
- Where:
751
- W_gate: 4096 → 11008
752
- W_up: 4096 → 11008
753
- W_down: 11008 → 4096
754
- SiLU(x) = x · sigmoid(x)
755
- ⊙ = element-wise multiply
756
- ```
757
-
758
- SwiGLU consistently outperforms ReLU and GELU in transformer FFNs (Shazeer, 2020).
759
-
760
- </details>
761
 
762
- <details>
763
- <summary><b>Click to expand: MoE Routing Algorithm</b></summary>
764
-
765
- ```python
766
- # Simplified routing logic
767
- def route(x, router_weights):
768
- # Compute affinity scores for each expert
769
- logits = x @ router_weights # [batch, seq, n_experts]
770
- scores = softmax(logits, dim=-1)
771
-
772
- # Select top-2 experts
773
- top_vals, top_idx = topk(scores, k=2)
774
-
775
- # Normalize selected weights
776
- weights = top_vals / top_vals.sum(dim=-1, keepdim=True)
777
-
778
- # Load balancing loss (prevents expert collapse)
779
- balance_loss = n_experts * (
780
- fraction_routed_to_each * average_gate_value_for_each
781
- ).sum()
782
-
783
- return weights, top_idx, balance_loss
784
  ```
785
 
786
- </details>
787
-
788
- <details>
789
- <summary><b>Click to expand: Parameter Breakdown</b></summary>
790
-
791
- | Component | Parameters | % of Total |
792
- |:--|:--|:--|
793
- | Token embeddings | 410M | 2.8% |
794
- | Attention (QKV + output) × 24 | 1,610M | 10.9% |
795
- | MoE experts (4 × SwiGLU × 24) | 12,365M | 83.5% |
796
- | Router weights × 24 | 0.4M | 0.003% |
797
- | RMSNorm × 49 | 0.4M | 0.003% |
798
- | Output head | 410M | 2.8% |
799
- | **Total** | **14,815M** | **100%** |
800
- | **Active per token (top-2)** | **~7,800M** | **~53%** |
801
-
802
- </details>
803
-
804
- ---
805
-
806
- ## 📋 Model Card Details
807
-
808
- | Field | Value |
809
- |:--|:--|
810
- | **Model Name** | SentinelBrain-14B-MoE-v0.1 (Sentinel Prime — Frankenstein Edition) |
811
- | **Type** | Causal Language Model (decoder-only) |
812
- | **Architecture** | Custom MoE Transformer (from scratch) |
813
- | **Based On** | Nothing — trained from random initialization |
814
- | **Training Hardware** | 1× AMD Instinct MI300X VF (192 GB HBM3) |
815
- | **Training Software** | ROCm 7.0, PyTorch 2.10.0+rocm7.0 |
816
- | **Training Duration** | ~300 GPU-hours (estimated total) |
817
- | **Carbon Footprint** | Estimated ~45 kg CO₂ (single GPU, cloud datacenter) |
818
- | **License** | Apache 2.0 |
819
- | **Authors** | Mircea Rusu, QubitDev |
820
- | **Competition** | AMD Developer Hackathon (lablab.ai) |
821
-
822
- ---
823
-
824
- ## 📄 Citation
825
-
826
- ```bibtex
827
- @misc{sentinelbrain2026,
828
- title = {SentinelBrain-14B-MoE (Frankenstein Edition): A Consciousness-Monitored Mixture-of-Experts
829
- Language Model Trained From Scratch on AMD MI300X},
830
- author = {Mircea Rusu and QubitDev},
831
- year = {2026},
832
- url = {https://sentinel.qubitpage.com/whitepaper},
833
- note = {Trained entirely from scratch on AMD Instinct MI300X
834
- for the AMD Developer Hackathon}
835
- }
836
- ```
837
-
838
- ---
839
-
840
- ## 🔗 Links
841
-
842
- | Resource | URL |
843
- |:--|:--|
844
- | 🔴 **Live Dashboard** | [sentinel.qubitpage.com](https://sentinel.qubitpage.com/) |
845
- | 📄 **Whitepaper** | [sentinel.qubitpage.com/whitepaper](https://sentinel.qubitpage.com/whitepaper) |
846
- | 🏆 **AMD Hackathon** | [lablab.ai](https://lablab.ai/ai-hackathons/amd-developer) |
847
- | 🧠 **Φ Monitor** | [sentinel.qubitpage.com/#phi](https://sentinel.qubitpage.com/#phi) |
848
-
849
- ---
850
-
851
- <div align="center">
852
-
853
- *Built with ❤️ on AMD MI300X — Every weight trained from scratch*
854
-
855
- **Sentinel Prime (Frankenstein Edition): Rebuilt From the Inside Out**
856
 
857
- </div>
 
1
  ---
2
  license: apache-2.0
 
 
 
 
3
  tags:
4
  - sentinelbrain
 
 
 
 
 
 
5
  - moe
6
+ - rocm
7
+ - mi300x
8
+ - pytorch
9
+ - checkpoint
10
+ - realignment
 
11
  library_name: pytorch
12
+ pipeline_tag: text-generation
 
13
  ---
14
 
15
+ # SentinelBrain 14B MoE v0.1 - Frankenstein Realignment v2
16
 
17
+ This repository now includes SentinelBrain Frankenstein realignment v2 artifacts from the AMD MI300X run completed on 2026-05-03.
18
 
19
+ ## v2 Training Update
20
 
21
+ - Architecture: custom SentinelBrain sparse MoE decoder, approximately 14.4B stored parameters, 4 experts, top-2 routing, 24 layers, d_model 4096, seq_len 4096.
22
+ - Hardware: AMD Instinct MI300X via ROCm/HIP.
23
+ - Run: Frankenstein realignment v2 from raw merged checkpoint.
24
+ - Completed steps: 5,000.
25
+ - Total training tokens during realignment: approximately 0.98B.
26
+ - Best validation loss observed: 5.1359.
27
+ - Final checkpoint: `checkpoints/frankenstein_v2_final.pt`.
28
+ - Best checkpoint: `checkpoints/frankenstein_v2_best.pt`.
29
+ - EMA best checkpoint: `checkpoints/frankenstein_v2_ema_best.pt`.
30
+ - Previous Hugging Face version preserved on branch: `previous-before-v2-realign-5000-20260503-103121`.
31
 
32
+ ## Included Files
 
33
 
34
+ `checkpoints/frankenstein_v2_final.pt`: full final checkpoint at step 5,000, including optimizer and progress state.
35
+ - `checkpoints/frankenstein_v2_best.pt`: best model-only checkpoint by validation loss.
36
+ - `checkpoints/frankenstein_v2_ema_best.pt`: EMA best checkpoint from the v2 run.
37
+ - `checkpoints/sentinelbrain_pretrain_step2471_hf.pt`: pretrain anchor used for comparison.
38
+ - `logs/realign_v2.log`: full realignment console log.
39
+ - `logs/realign_v2_metrics.jsonl`: step metrics emitted during training.
40
+ - `reports/train_metrics_final.json`: final dashboard training metrics snapshot.
41
+ - `reports/conductor_state_final.json`: final dashboard/conductor state.
42
+ - `reports/sft_combined_ready_report.*`: cleaned SFT dataset preflight report.
43
+ - `reports/sentinelbrain_quality_stub_full_fixed.json`: MI300X executable-code benchmark report.
 
44
 
45
+ A full progress archive containing all v2 milestones and optimizer-bearing checkpoints is backed up off-Hub on the Azure VM at `/home/msrusu/sentinelbrain_backups/v2_realign_5000/sentinelbrain_v2_realign_full_20260503.tar.zst`. A SHA256 sidecar is generated at archive completion.
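Both the repo-level `SHA256SUMS` file and the archive sidecar use the standard `sha256sum` line format, so integrity can be checked with `sha256sum -c SHA256SUMS` or a minimal Python sketch like the one below (the helper name is illustrative, not part of the SentinelBrain tooling):

```python
import hashlib
from pathlib import Path

def verify_sums(sums_path: str) -> dict:
    """Check each '<hex>  <path>' line of a sha256sum-format file."""
    results = {}
    for line in Path(sums_path).read_text().splitlines():
        if not line.strip():
            continue
        # sha256sum separates digest and path with two spaces
        expected, _, name = line.partition("  ")
        digest = hashlib.sha256(Path(name).read_bytes()).hexdigest()
        results[name] = digest == expected.strip()
    return results
```

Any `False` entry in the returned dict means the corresponding file was corrupted or truncated during download.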
 
 
 
 
 
46
 
47
+ ## Current Evaluation Notes
48
 
49
+ MI300X executable-code tests show that the v2 realignment checkpoints are not yet usable as a coding assistant:
50
 
51
+ | Checkpoint | Pass@1 | Syntax Rate | Notes |
52
+ | --- | ---: | ---: | --- |
53
+ | `frankenstein_v2_best.pt` | 0.0% | 62.5% | Failed all 8 HumanEval-style stub tasks. |
54
+ | `frankenstein_v2_final.pt` | 0.0% | 75.0% | Failed all 8 HumanEval-style stub tasks. |
55
+ | `sentinelbrain_pretrain_step2471_hf.pt` | 0.0% | 87.5% | Failed all 8 tasks but produced the most syntactically valid Python. |
56
 
57
+ Interpretation: v2 completed corpus realignment and preserved all progress artifacts, but it needs a focused next phase of executable-code SFT, function-call/chat formatting, and auto-critic rejection sampling before any quality claims can be made.
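The Pass@1 and Syntax Rate columns are plain percentages over the per-task `passed` and `syntax_ok` flags recorded in the benchmark JSON; the aggregation can be sketched as:

```python
def summarize(details):
    """Aggregate per-task benchmark records into report-level percentages."""
    total = len(details)
    return {
        "pass_at_1": round(100.0 * sum(d["passed"] for d in details) / total, 1),
        "syntax_rate": round(100.0 * sum(d["syntax_ok"] for d in details) / total, 1),
    }

# Mirrors the frankenstein_v2_best row: 8 tasks, 0 passed, 5 syntactically valid
tasks = [{"passed": False, "syntax_ok": i < 5} for i in range(8)]
```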
58
 
59
+ ## Dataset Preparation Status
60
 
61
+ The next SFT combined dataset was cleaned non-destructively on the MI300X server:
62
 
63
+ - Input rows: 42,138.
64
+ - Kept rows: 32,996 (78.3%).
65
+ - Removed rows: 9,142.
66
+ - Max estimated tokens: 3,072.
67
+ - Main removals: short assistant/user outputs, garbage responses, repetitive responses, and over-length samples.
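The removal criteria above can be sketched as a per-row filter; the field names (`messages`, `content`) and thresholds here are illustrative assumptions, not the exact preflight implementation:

```python
def keep_row(row, min_chars=20, max_tokens=3072):
    """Reject rows with short turns, over-length content, or heavy repetition."""
    texts = [m["content"] for m in row["messages"]]
    if any(len(t.strip()) < min_chars for t in texts):
        return False  # short assistant/user outputs
    total = " ".join(texts)
    if len(total) // 4 > max_tokens:  # rough 4-chars-per-token estimate
        return False  # over-length sample
    words = total.split()
    if words and len(set(words)) / len(words) < 0.25:
        return False  # repetitive response (low unique-word ratio)
    return True
```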
 
68
 
69
+ ## Loading
70
 
71
+ These are custom SentinelBrain PyTorch checkpoints, not standard Hugging Face `AutoModelForCausalLM` weights. Load with the SentinelBrain code from `/workspace/sentinelprime` or the matching source package.
 
72
 
73
  ```python
74
  import torch
75
+ from config import ModelConfig
76
+ from model.sentinel import SentinelBrain
 
77
 
78
+ # Full checkpoints carry pickled optimizer/progress state, so weights_only=False is required
+ ckpt = torch.load("checkpoints/frankenstein_v2_best.pt", map_location="cpu", weights_only=False)
79
+ model = SentinelBrain(ModelConfig())
80
+ # Weights may sit under different keys depending on which run phase saved the checkpoint
+ state = ckpt.get("model_state_dict") or ckpt.get("model") or ckpt
81
+ model.load_state_dict(state, strict=False)
82
+ model.eval()
 
83
  ```
84
 
85
+ ## Next Phase Direction
 
86
 
87
+ The recommended next phase is a controlled SFT/auto-critic cycle: train from the pretrain anchor plus selected v2 weights only after passing format probes, prioritize executable Python/TypeScript/code-repair datasets, reject non-compiling generations, and benchmark every 250-500 steps before continuing.
SHA256SUMS ADDED
@@ -0,0 +1,16 @@
 
1
+ 934951bc3c8e50c38c2a43812076d6174542066675667125aca6e2c854e9bdf3 ./MANIFEST.tsv
2
+ 1b145ec3a4ade152b78636057556a62db85b2729be482fbd3fb5ff47faa99ee2 ./README.md
3
+ 0069f86ff1273b268cbbf89a4c907d574c67ac0f145191aaa5d51a26f894f699 ./SHA256SUMS.quick
4
+ 075ac8f0bd257ef59af4ed7364f4331af6e46682f1cbefbc0d691073246af215 ./checkpoints/frankenstein_v2_best.pt
5
+ 2aeba733bc950e050a254b1942d592d306f45fecc44a9f407843249511c1e5c3 ./checkpoints/frankenstein_v2_ema_best.pt
6
+ cd9b31fa889479020d394cc16f215f80a4d390fe0b1ad3cdd697544181f4e639 ./checkpoints/frankenstein_v2_final.pt
7
+ b4ffaaf49e3c3b8fea2029ed5d0665bdc724aa8c61aca6f19abcd0331c96285b ./checkpoints/sentinelbrain_pretrain_step2471_hf.pt
8
+ 79af33d8bbe0e8cfa0393ca965feca10f1d2094ebbf93b9ba722114811925f36 ./logs/realign_v2.log
9
+ 8a14d6eb26eeaad4bab6619d86298cb41f29dd488a45d34e143247c2aecfce9c ./logs/realign_v2_metrics.jsonl
10
+ b1284891f8c1fb2ecfa2658a45b5aed8444572b1b43f32a9cb9811cc4f756732 ./reports/conductor_state_final.json
11
+ 10247cdf7a184c76a12dadc7cac857d6d18b2d5fb80db72e65a3fe188621fd0f ./reports/next_phase_training_plan.md
12
+ 0d73309ac23a68e8b9aa7014f7b9f4f1e4362a373d417be442d4726564a07231 ./reports/sentinelbrain_quality_smoke.json
13
+ 8ff4ce6ce7939b752be6e9d55ebc4bf75817945a3a136bfed44ec051bfcd90c4 ./reports/sentinelbrain_quality_stub_full_fixed.json
14
+ c5a81c2b86c6ba5ed54d3b520f0f79fae18d7588ab7b454df24884f3ca189e5d ./reports/sft_combined_ready_report.json
15
+ 0d76e478bbc036268d851d3de87edc4300a1291bec3b800bd4e92c57cff0564b ./reports/sft_combined_ready_report.md
16
+ a6fc72c251df4b39d6963a121d8478f759e3feb692aa5140fad71eba3d1308f6 ./reports/train_metrics_final.json
SHA256SUMS.quick ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ 1b145ec3a4ade152b78636057556a62db85b2729be482fbd3fb5ff47faa99ee2 README.md
2
+ 10247cdf7a184c76a12dadc7cac857d6d18b2d5fb80db72e65a3fe188621fd0f reports/next_phase_training_plan.md
3
+ 8ff4ce6ce7939b752be6e9d55ebc4bf75817945a3a136bfed44ec051bfcd90c4 reports/sentinelbrain_quality_stub_full_fixed.json
4
+ 934951bc3c8e50c38c2a43812076d6174542066675667125aca6e2c854e9bdf3 MANIFEST.tsv
checkpoints/frankenstein_v2_best.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:075ac8f0bd257ef59af4ed7364f4331af6e46682f1cbefbc0d691073246af215
3
+ size 28808034875
checkpoints/frankenstein_v2_ema_best.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2aeba733bc950e050a254b1942d592d306f45fecc44a9f407843249511c1e5c3
3
+ size 28808008759
checkpoints/frankenstein_v2_final.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cd9b31fa889479020d394cc16f215f80a4d390fe0b1ad3cdd697544181f4e639
3
+ size 86422580021
checkpoints/sentinelbrain_pretrain_step2471_hf.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b4ffaaf49e3c3b8fea2029ed5d0665bdc724aa8c61aca6f19abcd0331c96285b
3
+ size 29629503600
logs/realign_v2.log ADDED
The diff for this file is too large to render. See raw diff
 
logs/realign_v2_metrics.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
reports/conductor_state_final.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"current_phase": "realignment_v2_unfrozen", "batch_number": 1, "blocks_done": 0, "current_batch_steps": 5000, "current_started_ts": null, "current_batch_start": null, "current_model_params_b": 14.4, "current_d_model": 4096, "current_n_layers": 24, "current_n_experts": 4, "current_n_active": 2, "current_seq_len": 4096, "current_batch": 8, "current_grad_accum": 6, "best_val": 5.135942053794861, "best_ckpt": "/mnt/scratch/checkpoints/frankenstein_v2_best.pt", "total_tokens_trained": 981270528, "history": []}
reports/next_phase_training_plan.md ADDED
@@ -0,0 +1,91 @@
 
1
+ # SentinelBrain Next Training Phase Plan
2
+
3
+ Created: 2026-05-03
4
+
5
+ ## Current Result
6
+
7
+ The v2 realignment completed 5,000 steps and preserved full optimizer/progress state, but executable-code quality is not ready:
8
+
9
+ - `frankenstein_v2_best.pt`: 0/8 Pass@1, 62.5% syntax rate.
10
+ - `frankenstein_v2_final.pt`: 0/8 Pass@1, 75.0% syntax rate.
11
+ - `sentinelbrain_pretrain_step2471_hf.pt`: 0/8 Pass@1, 87.5% syntax rate.
12
+
13
+ The next phase should not be another broad corpus realignment. It should be a narrow, measurable SFT and auto-critic loop focused on producing valid, executable assistant outputs.
14
+
15
+ ## Phase 3 Objective
16
+
17
+ Recover and improve instruction-following/code-generation behavior while preserving the useful realignment progress.
18
+
19
+ Primary gates:
20
+
21
+ - Python stub benchmark: at least 40% Pass@1 and 95% syntax rate before extending past 1,000 steps.
22
+ - MBPP/HumanEval sample: measurable improvement every eval window, no syntax regression.
23
+ - Chat format probe: responses must use the requested format and stop cleanly.
24
+ - Safety/data probe: no leaked secrets, no private-key blocks, no repetitive boilerplate.
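The two numeric gates can be encoded as a simple check run at each eval window (thresholds taken from the gates above; the function name is illustrative):

```python
def passes_gates(pass_at_1: float, syntax_rate: float) -> bool:
    """Gate for extending a run past 1,000 steps (percent units)."""
    return pass_at_1 >= 40.0 and syntax_rate >= 95.0
```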
25
+
26
+ ## Data Mix
27
+
28
+ Use the cleaned SFT dataset at `/mnt/scratch/datasets/combined/sft_combined_ready.jsonl` as the base, then rebalance before training:
29
+
30
+ - 45% executable Python and TypeScript tasks: HumanEval-style stubs, MBPP-style prompts, unit-test repair, CLI scripts, API handlers.
31
+ - 20% code editing and diff output: unified diffs, bug fixes, refactors, failing-test-to-patch examples.
32
+ - 15% tool-use and agent workflows: file search, terminal commands, deployment diagnostics, function-call JSON.
33
+ - 10% system/admin/devops: Linux, Docker, nginx, pm2, Azure, SSH, logs.
34
+ - 10% general instruction/chat: concise natural language, summarization, planning.
35
+
36
+ Avoid over-weighting audio/prose/OCR rows in this phase. Keep those for a later multimodal run after chat/code behavior is stable.
37
+
38
+ ## Auto-Critic Pipeline
39
+
40
+ For each generated training candidate:
41
+
42
+ 1. Normalize to a strict prompt/response or ChatML schema.
43
+ 2. Run syntax checks for code outputs.
44
+ 3. Run unit tests when a test harness is available.
45
+ 4. Score format compliance: required function names, JSON validity, diff parseability, stop tokens.
46
+ 5. Reject outputs with repetition, generic filler, missing entry points, invalid tokens, or secret-like strings.
47
+ 6. Keep only examples that pass the critic or have a repair trajectory showing the failed output and corrected output.
48
+
49
+ Recommended critic labels:
50
+
51
+ - `syntax_pass`
52
+ - `tests_pass`
53
+ - `entrypoint_match`
54
+ - `format_pass`
55
+ - `no_secret`
56
+ - `no_repetition`
57
+ - `accepted`
58
+
59
+ Train on accepted final answers plus curated repair traces, not raw failed generations.
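Two of the cheapest critic labels, `syntax_pass` and `no_repetition`, can be computed with the standard library alone. A hedged sketch follows; the repetition threshold is an assumption, and real `tests_pass`/`no_secret` checks additionally need a sandboxed test runner and a secret scanner:

```python
import ast
from collections import Counter

def critic_labels(code: str) -> dict:
    """Minimal critic: Python syntax validity plus a 4-gram repetition check."""
    try:
        ast.parse(code)
        syntax_pass = True
    except SyntaxError:
        syntax_pass = False
    tokens = code.split()
    grams = [tuple(tokens[i:i + 4]) for i in range(max(0, len(tokens) - 3))]
    top = Counter(grams).most_common(1)
    # Short outputs are exempt; otherwise flag when one 4-gram exceeds 20% of all 4-grams
    no_repetition = len(grams) < 10 or top[0][1] / len(grams) <= 0.2
    labels = {"syntax_pass": syntax_pass, "no_repetition": no_repetition}
    labels["accepted"] = all(labels.values())
    return labels
```

Degenerate loops like the `def get(): return 1` boilerplate seen in the v2 benchmark previews are exactly what the repetition label is meant to reject.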
60
+
61
+ ## Training Schedule
62
+
63
+ Start from `sentinelbrain_pretrain_step2471_hf.pt` or `frankenstein_v2_best.pt` only after a short format probe. If `frankenstein_v2_best.pt` continues to emit invalid boilerplate, use the pretrain anchor and reintroduce v2 weights later via low-LR distillation.
64
+
65
+ Suggested run:
66
+
67
+ - Steps 0-250: frozen experts, train attention/router/norms and output behavior at low LR.
68
+ - Steps 250-1,000: unfreeze selected layers, SFT on executable/code-repair mix.
69
+ - Eval every 250 steps; save model-only checkpoints and full optimizer checkpoints at each gate.
70
+ - Stop automatically if syntax rate drops for two evals or Pass@1 does not improve after 750 steps.
71
+ - Continue to 3,000-5,000 only after hitting the 1,000-step gates.
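The frozen-experts phase in steps 0-250 amounts to toggling `requires_grad` by parameter-name prefix. A generic PyTorch sketch; the substring markers are assumptions about SentinelBrain's parameter naming, not its actual module paths:

```python
import torch.nn as nn

def freeze_experts(model: nn.Module,
                   trainable_markers=("attn", "router", "norm", "lm_head")):
    """Freeze everything except attention/router/norm/output parameters."""
    for name, param in model.named_parameters():
        param.requires_grad = any(m in name for m in trainable_markers)
    # Return the names still receiving gradients, for logging
    return [n for n, p in model.named_parameters() if p.requires_grad]
```

Pass only the trainable parameters to the optimizer afterwards so frozen experts contribute no optimizer state.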
72
+
73
+ ## Benchmark Set
74
+
75
+ Always run on the MI300X server:
76
+
77
+ - Local 8-task smoke benchmark for quick regression checks.
78
+ - HumanEval subset and MBPP subset with executable tests.
79
+ - JSON/function-call validity suite.
80
+ - Unified-diff parse/apply suite.
81
+ - DevOps shell-command reasoning suite.
82
+
83
+ Persist every benchmark JSON under `/mnt/scratch/benchmark_results/` and copy selected reports into the HF release folder.
84
+
85
+ ## Operational Rules
86
+
87
+ - Keep full optimizer checkpoints on server/Azure, not local PC.
88
+ - Store model-only best checkpoints separately for HF and chat loading.
89
+ - Never expose chat using a checkpoint that fails the format probe unless clearly labeled experimental.
90
+ - Use `SENTINEL_WEB_CHAT_DISABLED=0` only after confirming no training job is active and VRAM has enough headroom.
91
+ - Prefer `frankenstein_v2_best.pt` for validation-loss experiments, but prefer the pretrain anchor for code SFT if code probes remain better there.
reports/sentinelbrain_quality_smoke.json ADDED
@@ -0,0 +1,138 @@
 
1
+ {
2
+ "created_at": "2026-05-03 10:09:39",
3
+ "device": "cuda",
4
+ "problem_count": 2,
5
+ "reference": "best",
6
+ "comparison": [
7
+ {
8
+ "label": "best",
9
+ "pass_at_1": 0.0,
10
+ "syntax_rate": 0.0,
11
+ "delta_vs_reference": 0.0
12
+ },
13
+ {
14
+ "label": "hf_pretrain_base",
15
+ "pass_at_1": 0.0,
16
+ "syntax_rate": 100.0,
17
+ "delta_vs_reference": 0.0
18
+ }
19
+ ],
20
+ "results": [
21
+ {
22
+ "path": "/mnt/scratch/checkpoints/frankenstein_v2_best.pt",
23
+ "label": "best",
24
+ "step": null,
25
+ "best_val_loss": null,
26
+ "val_loss": null,
27
+ "load_seconds": 131.73,
28
+ "missing_keys": 0,
29
+ "unexpected_keys": 0,
30
+ "config": {
31
+ "d_model": 4096,
32
+ "n_layers": 24,
33
+ "n_experts": 4,
34
+ "seq_len": 4096
35
+ },
36
+ "total": 2,
37
+ "passed": 0,
38
+ "failed": 2,
39
+ "timeouts": 0,
40
+ "syntax_ok": 0,
41
+ "pass_at_1": 0.0,
42
+ "syntax_rate": 0.0,
43
+ "seconds": 12.92,
44
+ "categories": {
45
+ "basics": {
46
+ "total": 1,
47
+ "passed": 0
48
+ },
49
+ "algorithm": {
50
+ "total": 1,
51
+ "passed": 0
52
+ }
53
+ },
54
+ "details": [
55
+ {
56
+ "id": "fizzbuzz",
57
+ "category": "basics",
58
+ "passed": false,
59
+ "syntax_ok": false,
60
+ "timeout": false,
61
+ "gen_seconds": 8.35,
62
+ "response_preview": "Input: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1",
63
+ "code_preview": "Input: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1",
64
+ "stderr": " File \"<string>\", line 1\n Input: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1\n ^^^\nSyntaxError: invalid syntax\n"
65
+ },
66
+ {
67
+ "id": "two_sum",
68
+ "category": "algorithm",
69
+ "passed": false,
70
+ "syntax_ok": false,
71
+ "timeout": false,
72
+ "gen_seconds": 4.55,
73
+ "response_preview": "- The code snippet is a string. The function should be a string of the string. The function should return the string.\n\n- The function should handle the string as input and returns the string.\n\n- The function should handle the string as input and returns the string.\n\n- The function should handle the string as input and returns the string.\n\n- The function should handle the string as input and returns the string.\n\n- The function should handle the string as input and returns the string.\n\n- The funct",
74
+ "code_preview": "- The code snippet is a string. The function should be a string of the string. The function should return the string.\n\n- The function should handle the string as input and returns the string.\n\n- The function should handle the string as input and returns the string.\n\n- The function should handle the string as input and returns the string.\n\n- The function should handle the string as input and returns the string.\n\n- The function should handle the string as input and returns the string.\n\n- The funct",
75
+ "stderr": " File \"<string>\", line 1\n - The code snippet is a string. The function should be a string of the string. The function should return the string.\n ^^^^\nSyntaxError: invalid syntax\n"
76
+ }
77
+ ]
78
+ },
79
+ {
80
+ "path": "/mnt/scratch/checkpoints/sentinelbrain_pretrain_step2471_hf.pt",
81
+ "label": "hf_pretrain_base",
82
+ "step": 2471,
83
+ "best_val_loss": null,
84
+ "val_loss": 1.9925728058815002,
85
+ "load_seconds": 127.86,
86
+ "missing_keys": 0,
87
+ "unexpected_keys": 0,
88
+ "config": {
89
+ "d_model": 4096,
90
+ "n_layers": 24,
91
+ "n_experts": 4,
92
+ "seq_len": 4096
93
+ },
94
+ "total": 2,
95
+ "passed": 0,
96
+ "failed": 2,
97
+ "timeouts": 0,
98
+ "syntax_ok": 2,
99
+ "pass_at_1": 0.0,
100
+ "syntax_rate": 100.0,
101
+ "seconds": 9.23,
102
+ "categories": {
103
+ "basics": {
104
+ "total": 1,
105
+ "passed": 0
106
+ },
107
+ "algorithm": {
108
+ "total": 1,
109
+ "passed": 0
110
+ }
111
+ },
112
+ "details": [
113
+ {
114
+ "id": "fizzbuzz",
115
+ "category": "basics",
116
+ "passed": false,
117
+ "syntax_ok": true,
118
+ "timeout": false,
119
+ "gen_seconds": 4.6,
120
+ "response_preview": "def prime_advanced_even(n):\n return n == 1\n\n# Test cases\nn = 10\nresult = prime_advanced_even(n)\nprint(result) # Output: ['Buzz', '3', '5', '7', '11', '13', '17', '19', '23', '25', '27', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31",
121
+ "code_preview": "def prime_advanced_even(n):\n return n == 1\n\n# Test cases\nn = 10\nresult = prime_advanced_even(n)\nprint(result) # Output: ['Buzz', '3', '5', '7', '11', '13', '17', '19', '23', '25', '27', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31', '31",
122
+ "stderr": "Traceback (most recent call last):\n File \"<string>\", line 10, in <module>\nNameError: name 'fizzbuzz' is not defined\n"
123
+ },
124
+ {
125
+ "id": "two_sum",
126
+ "category": "algorithm",
127
+ "passed": false,
128
+ "syntax_ok": true,
129
+ "timeout": false,
130
+ "gen_seconds": 4.61,
131
+ "response_preview": "```python\ndef add_two_numbers(nums, target):\n return [num for num in nums if num != target]\n```\n\nThe function `add_two_numbers(nums, target)` takes a list of integers `nums` and a target sum `target`, and returns two lists: one for the sum of the elements in `nums` and the target sum `target`, and the indices of the two numbers add up to `target`. The function is then called with these lists as arguments, and the result is printed.\n\nFor example, if the input list is `[1, 2, 3, 4, 5]`",
132
+ "code_preview": "def add_two_numbers(nums, target):\n return [num for num in nums if num != target]",
133
+ "stderr": "Traceback (most recent call last):\n File \"<string>\", line 5, in <module>\nNameError: name 'two_sum' is not defined\n"
134
+ }
135
+ ]
136
+ }
137
+ ]
138
+ }
reports/sentinelbrain_quality_stub_full_fixed.json ADDED
@@ -0,0 +1,448 @@
 
1
+ {
2
+ "created_at": "2026-05-03 10:35:36",
3
+ "device": "cuda",
4
+ "problem_count": 8,
5
+ "reference": "best",
6
+ "comparison": [
7
+ {
8
+ "label": "best",
9
+ "pass_at_1": 0.0,
10
+ "syntax_rate": 62.5,
11
+ "delta_vs_reference": 0.0
12
+ },
13
+ {
14
+ "label": "finished_final",
15
+ "pass_at_1": 0.0,
16
+ "syntax_rate": 75.0,
17
+ "delta_vs_reference": 0.0
18
+ },
19
+ {
20
+ "label": "hf_pretrain_base",
21
+ "pass_at_1": 0.0,
22
+ "syntax_rate": 87.5,
23
+ "delta_vs_reference": 0.0
24
+ }
25
+ ],
26
+ "results": [
27
+ {
28
+ "path": "/mnt/scratch/checkpoints/frankenstein_v2_best.pt",
29
+ "label": "best",
30
+ "step": null,
31
+ "best_val_loss": null,
32
+ "val_loss": null,
33
+ "load_seconds": 140.56,
34
+ "missing_keys": 0,
35
+ "unexpected_keys": 0,
36
+ "config": {
37
+ "d_model": 4096,
38
+ "n_layers": 24,
39
+ "n_experts": 4,
40
+ "seq_len": 4096
41
+ },
42
+ "total": 8,
43
+ "passed": 0,
44
+ "failed": 8,
45
+ "timeouts": 0,
46
+ "syntax_ok": 5,
47
+ "pass_at_1": 0.0,
48
+ "syntax_rate": 62.5,
49
+ "seconds": 49.93,
50
+ "categories": {
51
+ "basics": {
52
+ "total": 2,
53
+ "passed": 0
54
+ },
55
+ "algorithm": {
56
+ "total": 4,
57
+ "passed": 0
58
+ },
59
+ "practical": {
60
+ "total": 1,
61
+ "passed": 0
62
+ },
63
+ "data_structure": {
64
+ "total": 1,
65
+ "passed": 0
66
+ }
67
+ },
68
+ "details": [
69
+ {
70
+ "id": "fizzbuzz",
71
+ "category": "basics",
72
+ "passed": false,
73
+ "syntax_ok": true,
74
+ "timeout": false,
75
+ "gen_seconds": 9.43,
76
+ "response_preview": " return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(n",
77
+ "extracted_preview": "def get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n retur",
78
+ "code_preview": "def fizzbuzz(n):\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef get():\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prim",
79
+ "stderr": " File \"<string>\", line 2\n def get():\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
80
+ },
81
+ {
82
+ "id": "two_sum",
83
+ "category": "algorithm",
84
+ "passed": false,
85
+ "syntax_ok": true,
86
+ "timeout": false,
87
+ "gen_seconds": 5.76,
88
+ "response_preview": " input = list(map(int, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input",
89
+ "extracted_preview": "input = list(map(int, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, in",
90
+ "code_preview": "def two_sum(nums, target):\ninput = list(map(int, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, inp",
91
+ "stderr": " File \"<string>\", line 2\n input = list(map(int, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "valid_parens",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.77,
+ "response_preview": " if len(s) == 0:\n return False\n else:\n return False\n\n if len(v) == 0:\n return False\n else:\n return False\n\n if len(v) == 0:\n return False\n else:\n return False\n\n if len(v) == 0:\n return False\n else:\n return False\n\n if len(v) == 0:\n return False\n else:\n return False\n\n if len(v) == 0:\n return False\n else:\n return False\n\n if len(s) == 0:\n return False\n return False\n\ndef",
+ "extracted_preview": "def get_cipped():\n return False\n\n\ndef get_subget():\n return False\n\n\ndef get_subget():",
+ "code_preview": "def is_valid_parens(s):\ndef get_cipped():\n return False\n\n\ndef get_subget():\n return False\n\n\ndef get_subget():\n",
+ "stderr": " File \"<string>\", line 2\n def get_cipped():\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
105
+ {
106
+ "id": "binary_search",
107
+ "category": "algorithm",
108
+ "passed": false,
109
+ "syntax_ok": false,
110
+ "timeout": false,
111
+ "gen_seconds": 5.74,
112
+ "response_preview": " if not isinstance(value, list):\n return False\n else:\n return False\n\n if not isinstance(value, list):\n return False\n else:\n return None\n\n if not isinstance(value, list):\n return False\n else:\n return None\n\n if not isinstance(value, list):\n return False\n return None\n\ndef get_subget_with():\n return False\n\ndef get_subget():\n return False\n\n\ndef _get_all():\n return False\n\n\ndef _get_all():\n return _get_string(value)\n\nd",
+ "extracted_preview": "def get_subget_with():\n return False\n\ndef get_subget():\n return False\n\n\ndef _get_all():\n return False\n\n\ndef _get_all():\n return _get_string(value)\n\ndef _get_all():\n return _get('name', None)\n return None\n\ndef _get_all():\n return _get('name', None)\n return None\n\ndef _get_all():\n return _get('name",
+ "code_preview": "def binary_search(arr, target):\ndef get_subget_with():\n return False\n\ndef get_subget():\n return False\n\n\ndef _get_all():\n return False\n\n\ndef _get_all():\n return _get_string(value)\n\ndef _get_all():\n return _get('name', None)\n return None\n\ndef _get_all():\n return _get('name', None)\n return None\n\ndef _get_all():\n return _get('name\n",
+ "stderr": " File \"<string>\", line 25\n return _get('name\n ^\nSyntaxError: unterminated string literal (detected at line 25)\n"
+ },
+ {
+ "id": "reverse_words",
+ "category": "basics",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.74,
+ "response_preview": " if len(s) == 0:\n return False\n else:\n return False\n\n if len(s) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if not isinstance(value, bool):\n return False\n else:\n return False\n\n if not isinstance(value, bool):\n ",
+ "extracted_preview": "if len(s) == 0:\n return False\n else:\n return False\n\n if len(s) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if not isinstance(value, bool):\n return False\n else:\n return False\n\n if not isinstance(value, bool):\n ",
+ "code_preview": "def reverse_words(s):\nif len(s) == 0:\n return False\n else:\n return False\n\n if len(s) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if len(args) == 0:\n return False\n else:\n return False\n\n if not isinstance(value, bool):\n return False\n else:\n return False\n\n if not isinstan",
+ "stderr": " File \"<string>\", line 2\n if len(s) == 0:\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "max_subarray",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": false,
+ "timeout": false,
+ "gen_seconds": 5.76,
+ "response_preview": " if not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(0) or str) or not isinstance(0) or None) or None.\n\nfor i in range(0) or not in any case, or any other or other.\n\n(0) or not, in any case, or any other or other.\n\n(0) or not, or any ",
+ "extracted_preview": "if not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(0) or str) or not isinstance(0) or None) or None.\n\nfor i in range(0) or not in any case, or any other or other.\n\n(0) or not, in any case, or any other or other.\n\n(0) or not, or any othe",
+ "code_preview": "def max_subarray_sum(nums):\nif not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(0) or str) or not isinstance(0) or None) or None.\n\nfor i in range(0) or not in any case, or any other or other.\n\n(0) or not, in any case, or any other or oth",
+ "stderr": " File \"<string>\", line 2\n if not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or int) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(value) or str) or not isinstance(0) or str) or not isinstance(0) or None) or None.\n ^\nSyntaxError: unmatched ')'\n"
+ },
+ {
+ "id": "safe_json_parse",
+ "category": "practical",
+ "passed": false,
+ "syntax_ok": false,
+ "timeout": false,
+ "gen_seconds": 5.8,
+ "response_preview": " \"\"\"Get the function to check if the input is None or None.\n\n Args:\n return None\n\n if not isinstance(obj, list):\n return False\n\n if not isinstance(value, list):\n return False\n\n if not isinstance(value, list):\n return False\n\n if not isinstance(value, list):\n return False\n\n if isinstance(value, list):\n return None\n\n return None\n\ndef get_value(value):\n return None\n\ndef _get_all():\n return None\n\ndef _get_all():\n return _get_s",
+ "extracted_preview": "def get_value(value):\n return None\n\ndef _get_all():\n return None\n\ndef _get_all():\n return _get_string(value)\n\ndef _get_all():\n return _get('name', None) for _ in _get('name', None) for _ in _get('name', None) if not in ('name', 'str', 'str",
+ "code_preview": "import json\n\ndef safe_json_parse(text):\ndef get_value(value):\n return None\n\ndef _get_all():\n return None\n\ndef _get_all():\n return _get_string(value)\n\ndef _get_all():\n return _get('name', None) for _ in _get('name', None) for _ in _get('name', None) if not in ('name', 'str', 'str\n",
+ "stderr": " File \"<string>\", line 14\n return _get('name', None) for _ in _get('name', None) for _ in _get('name', None) if not in ('name', 'str', 'str\n ^\nSyntaxError: unterminated string literal (detected at line 14)\n"
+ },
+ {
+ "id": "lru_cache",
+ "category": "data_structure",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.86,
+ "response_preview": " self._get = self._get_data(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ",
+ "extracted_preview": "self._get = self._get_data(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,",
+ "code_preview": "from collections import OrderedDict\n\nclass LRUCache:\n def __init__(self, capacity):\nself._get = self._get_data(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ "stderr": " File \"<string>\", line 5\n self._get = self._get_data(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\nIndentationError: expected an indented block after function definition on line 4\n"
+ }
+ ]
+ },
+ {
+ "path": "/mnt/scratch/checkpoints/frankenstein_v2_final.pt",
+ "label": "finished_final",
+ "step": 5000,
+ "best_val_loss": 5.135942053794861,
+ "val_loss": null,
+ "load_seconds": 169.75,
+ "missing_keys": 0,
+ "unexpected_keys": 0,
+ "config": {
+ "d_model": 4096,
+ "n_layers": 24,
+ "n_experts": 4,
+ "seq_len": 4096
+ },
+ "total": 8,
+ "passed": 0,
+ "failed": 8,
+ "timeouts": 0,
+ "syntax_ok": 6,
+ "pass_at_1": 0.0,
+ "syntax_rate": 75.0,
+ "seconds": 47.01,
+ "categories": {
+ "basics": {
+ "total": 2,
+ "passed": 0
+ },
+ "algorithm": {
+ "total": 4,
+ "passed": 0
+ },
+ "practical": {
+ "total": 1,
+ "passed": 0
+ },
+ "data_structure": {
+ "total": 1,
+ "passed": 0
+ }
+ },
+ "details": [
+ {
+ "id": "fizzbuzz",
+ "category": "basics",
+ "passed": false,
+ "syntax_ok": false,
+ "timeout": false,
+ "gen_seconds": 5.88,
+ "response_preview": " return 1\n\ndef get_s1, n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is",
+ "extracted_preview": "def get_s1, n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime",
+ "code_preview": "def fizzbuzz(n):\ndef get_s1, n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(n):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef is_prime(num):\n return 1\n\ndef",
+ "stderr": " File \"<string>\", line 2\n def get_s1, n):\n ^\nSyntaxError: unmatched ')'\n"
+ },
+ {
+ "id": "two_sum",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.87,
+ "response_preview": " input = list(map(int, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input",
+ "extracted_preview": "input = list(map(int, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, in",
+ "code_preview": "def two_sum(nums, target):\ninput = list(map(int, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, inp",
+ "stderr": " File \"<string>\", line 2\n input = list(map(int, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input, input\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "valid_parens",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.88,
+ "response_preview": " if len(v) == 0:\n return False\n else:\n return False\n\n if not isinstance(obj, bool):\n return False\n else:\n return False\n\n if not isinstance(obj, bool):\n return False\n else:\n return False\n\n if not isinstance(obj, bool):\n return False\n else:\n return False\n\n if not isinstance(value, bool):\n return False\n else:\n return None\n\n if not isinstance(value, bool):\n return False\n else:\n ret",
+ "extracted_preview": "def _get_all():\n return False\n\n\ndef _get",
+ "code_preview": "def is_valid_parens(s):\ndef _get_all():\n return False\n\n\ndef _get\n",
+ "stderr": " File \"<string>\", line 2\n def _get_all():\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "binary_search",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.83,
+ "response_preview": " if not isinstance(value, list):\n return False\n else:\n return False\n\n if not isinstance(value, list):\n return False\n else:\n return None\n\n if not isinstance(value, list):\n return False\n return False\n\ndef is_valid():\n return False\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef ",
+ "extracted_preview": "def is_valid():\n return False\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return None\n\ndef is_valid():\n return None\n\ndef is_valid():\n return None\n\ndef _get_all():\n return None\n\ndef _get_all():\n return _get_string(value)\n\ndef _get_all():",
+ "code_preview": "def binary_search(arr, target):\ndef is_valid():\n return False\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return False\n\n\ndef is_valid():\n return None\n\ndef is_valid():\n return None\n\ndef is_valid():\n return None\n\ndef _get_all():\n return None\n\ndef _get_all():\n return _get_string(value)\n\ndef _get_all():\n",
+ "stderr": " File \"<string>\", line 2\n def is_valid():\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "reverse_words",
+ "category": "basics",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.85,
+ "response_preview": " if len(s) == 0:\n return False\n else:\n return False\n\n if not isinstance(value, int):\n return False\n else:\n return False\n\n if not isinstance(value, int):\n return False\n else:\n return False\n\n if not isinstance(value, bool):\n return False\n else:\n return False\n\n if not isinstance(value, bool):\n return False\n else:\n return None\n\n if not isinstance(value, bool):\n return False\n else:\n ",
+ "extracted_preview": "def _get_all():\n return False\n\n\ndef _get_all():\n return _get_string(value)\n\ndef _get_list():\n return _",
+ "code_preview": "def reverse_words(s):\ndef _get_all():\n return False\n\n\ndef _get_all():\n return _get_string(value)\n\ndef _get_list():\n return _\n",
+ "stderr": " File \"<string>\", line 2\n def _get_all():\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "max_subarray",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": false,
+ "timeout": false,
+ "gen_seconds": 5.84,
+ "response_preview": " if not isinstance(value, int):\n return False\n return False\n\ndef get_subget_substring, start, value):\n return False\n return False\n\ndef get_subget_all_by_name, key):\n return False\n\ndef get_subget_all_by_name, name, name, name, name=None):\n return None\n\ndef get_name(name, name, name, name, name, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name",
+ "extracted_preview": "def get_subget_substring, start, value):\n return False\n return False\n\ndef get_subget_all_by_name, key):\n return False\n\ndef get_subget_all_by_name, name, name, name, name=None):\n return None\n\ndef get_name(name, name, name, name, name, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, n",
+ "code_preview": "def max_subarray_sum(nums):\ndef get_subget_substring, start, value):\n return False\n return False\n\ndef get_subget_all_by_name, key):\n return False\n\ndef get_subget_all_by_name, name, name, name, name=None):\n return None\n\ndef get_name(name, name, name, name, name, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=None, name=N",
+ "stderr": " File \"<string>\", line 2\n def get_subget_substring, start, value):\n ^\nSyntaxError: unmatched ')'\n"
+ },
+ {
+ "id": "safe_json_parse",
+ "category": "practical",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.89,
+ "response_preview": " \"\"\"Get the function to check if the input is not None or None.\n\n Returns:\n return None\n\n if not isinstance(value, list):\n return None\n\n if not isinstance(value, list):\n return None\n\n if isinstance(value, list):\n return None\n\n if isinstance(value, list):\n return None\n\n if isinstance(value, list):\n return None\n\n if isinstance(value, list):\n return None\n\n if isinstance(value, list):\n return None\n\n if isinstance(v",
+ "extracted_preview": "def _get_list(value, value):\n return None\n\ndef _get_list(value, value):\n return None\n\ndef _get_list(value, value):\n return None\n\ndef _get_list(value, value",
+ "code_preview": "import json\n\ndef safe_json_parse(text):\ndef _get_list(value, value):\n return None\n\ndef _get_list(value, value):\n return None\n\ndef _get_list(value, value):\n return None\n\ndef _get_list(value, value\n",
+ "stderr": " File \"<string>\", line 4\n def _get_list(value, value):\n ^\nIndentationError: expected an indented block after function definition on line 3\n"
+ },
+ {
+ "id": "lru_cache",
+ "category": "data_structure",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.9,
+ "response_preview": " self._test = self._get_data(0, self._get_data(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,",
+ "extracted_preview": "self._test = self._get_data(0, self._get_data(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,",
+ "code_preview": "from collections import OrderedDict\n\nclass LRUCache:\n def __init__(self, capacity):\nself._test = self._get_data(0, self._get_data(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
+ "stderr": " File \"<string>\", line 5\n self._test = self._get_data(0, self._get_data(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\nIndentationError: expected an indented block after function definition on line 4\n"
+ }
+ ]
+ },
+ {
+ "path": "/mnt/scratch/checkpoints/sentinelbrain_pretrain_step2471_hf.pt",
+ "label": "hf_pretrain_base",
+ "step": 2471,
+ "best_val_loss": null,
+ "val_loss": 1.9925728058815002,
+ "load_seconds": 138.71,
+ "missing_keys": 0,
+ "unexpected_keys": 0,
+ "config": {
+ "d_model": 4096,
+ "n_layers": 24,
+ "n_experts": 4,
+ "seq_len": 4096
+ },
+ "total": 8,
+ "passed": 0,
+ "failed": 8,
+ "timeouts": 0,
+ "syntax_ok": 7,
+ "pass_at_1": 0.0,
+ "syntax_rate": 87.5,
+ "seconds": 46.62,
+ "categories": {
+ "basics": {
+ "total": 2,
+ "passed": 0
+ },
+ "algorithm": {
+ "total": 4,
+ "passed": 0
+ },
+ "practical": {
+ "total": 1,
+ "passed": 0
+ },
+ "data_structure": {
+ "total": 1,
+ "passed": 0
+ }
+ },
+ "details": [
+ {
+ "id": "fizzbuzz",
+ "category": "basics",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.88,
+ "response_preview": " result = []\n for i in range(n):\n if i % 3 == 0:\n result.append(i)\n return result\n\nn = 3\nresult = fizzbuzz(n)\nprint(result)\n```\n\nThis code defines a function called `fizzbuzz(n)` that takes a number `n` as an argument. This function is used to check if a number is prime. The function is then used to check if the number is divisible by any number from 2 to n. The function is then used to filter out only numbers from 1 to n. The main function is then called with these in",
+ "extracted_preview": "result = []\n for i in range(n):\n if i % 3 == 0:\n result.append(i)\n return result\n\nn = 3\nresult = fizzbuzz(n)\nprint(result)\n```\n\nThis code defines a function called `fizzbuzz(n)` that takes a number `n` as an argument. This function is used to check if a number is prime. The function is then used to check if the number is divisible by any number from 2 to n. The function is then used to filter out only numbers from 1 to n. The main function is then called with these inputs",
+ "code_preview": "def fizzbuzz(n):\nresult = []\n for i in range(n):\n if i % 3 == 0:\n result.append(i)\n return result\n\nn = 3\nresult = fizzbuzz(n)\nprint(result)\n```\n\nThis code defines a function called `fizzbuzz(n)` that takes a number `n` as an argument. This function is used to check if a number is prime. The function is then used to check if the number is divisible by any number from 2 to n. The function is then used to filter out only numbers from 1 to n. The main function is then called ",
+ "stderr": " File \"<string>\", line 2\n result = []\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "two_sum",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.85,
+ "response_preview": " # Implement your solution here\n pass\n\ndef find_two_sum(nums, target):\n # Implement your solution here\n pass\n\n# Example usage\nnums = [1, 2, 3, 4, 5]\ntarget = 2\nresult = find_two_sum(nums, target)\nprint(result) # Output: [1, 2, 3, 4, 5]\n```<|im_end|>\n<|im_start|>assistant\nHere is the implementation of the requested function:\n\n```python\ndef two_sum(nums, target):\n def find_two_sum_helper(nums, target):\n if len(nums) == 0:\n return [nums[0], nums[1], nums[0]]\n ",
+ "extracted_preview": "<|im_end|>\n<|im_start|>assistant\nHere is the implementation of the requested function:",
+ "code_preview": "def two_sum(nums, target):\n<|im_end|>\n<|im_start|>assistant\nHere is the implementation of the requested function:\n",
+ "stderr": " File \"<string>\", line 2\n <|im_end|>\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "valid_parens",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.86,
+ "response_preview": " stack = []\n for char in s:\n if char == '(':\n stack.append(char)\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return False\n sta",
+ "extracted_preview": "stack = []\n for char in s:\n if char == '(':\n stack.append(char)\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return False\n stack.p",
+ "code_preview": "def is_valid_parens(s):\nstack = []\n for char in s:\n if char == '(':\n stack.append(char)\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return False\n stack.pop()\n elif char == ')':\n if not stack:\n return F",
+ "stderr": " File \"<string>\", line 2\n stack = []\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "binary_search",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.8,
+ "response_preview": " left = 0\n right = len(arr) - 1\n while left <= right:\n mid = left + (right - left) // 2\n if arr[mid] == target:\n return mid\n elif arr[mid] < target:\n left = mid + 1\n else:\n right = mid - 1\n return -1\n\ndef binary_search_recursive(arr, target):\n return binary_search_recursive(arr, target, 0)\n```\n\nThe binary_search function is a recursive function that performs a binary search on a target value in a sorted list. It works by",
+ "extracted_preview": "def binary_search_recursive(arr, target):\n return binary_search_recursive(arr, target, 0)\n```",
+ "code_preview": "def binary_search(arr, target):\ndef binary_search_recursive(arr, target):\n return binary_search_recursive(arr, target, 0)\n```\n",
+ "stderr": " File \"<string>\", line 2\n def binary_search_recursive(arr, target):\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "reverse_words",
+ "category": "basics",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.78,
+ "response_preview": " reversed_words = s.split(' ')\n reversed_words = [word[::-1] for word in reversed_words]\n return reversed_words\n```\n\nExample usage:\n\n```python\nprint(reverse_words(\"Hello, World! 123! 123! 123! 123! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123! 456! 123",
+ "extracted_preview": "Example usage:",
+ "code_preview": "def reverse_words(s):\nExample usage:\n",
+ "stderr": " File \"<string>\", line 2\n Example usage:\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "max_subarray",
+ "category": "algorithm",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.83,
+ "response_preview": " n = len(nums)\n max_sum = 0\n for i in range(n):\n current_sum = nums[i]\n max_sum = max(max_sum, current_sum)\n return max_sum\n\nnums = [1, 2, 3, 4, 5]\nmax_sum = max_subarray_sum(nums)\nprint(max_sum)\n```\n\nThis code defines a function `max_subarray_sum` that takes a list of integers as input. It initializes `max_sum` to the first element of the list and `max_sum` to 0. Then, it iterates through the list starting from the second element. For each element, it calculates the ma",
+ "extracted_preview": "n = len(nums)\n max_sum = 0\n for i in range(n):\n current_sum = nums[i]\n max_sum = max(max_sum, current_sum)\n return max_sum\n\nnums = [1, 2, 3, 4, 5]\nmax_sum = max_subarray_sum(nums)\nprint(max_sum)\n```\n\nThis code defines a function `max_subarray_sum` that takes a list of integers as input. It initializes `max_sum` to the first element of the list and `max_sum` to 0. Then, it iterates through the list starting from the second element. For each element, it calculates the maximu",
+ "code_preview": "def max_subarray_sum(nums):\nn = len(nums)\n max_sum = 0\n for i in range(n):\n current_sum = nums[i]\n max_sum = max(max_sum, current_sum)\n return max_sum\n\nnums = [1, 2, 3, 4, 5]\nmax_sum = max_subarray_sum(nums)\nprint(max_sum)\n```\n\nThis code defines a function `max_subarray_sum` that takes a list of integers as input. It initializes `max_sum` to the first element of the list and `max_sum` to 0. Then, it iterates through the list starting from the second element. For each eleme",
+ "stderr": " File \"<string>\", line 2\n n = len(nums)\n ^\nIndentationError: expected an indented block after function definition on line 1\n"
+ },
+ {
+ "id": "safe_json_parse",
+ "category": "practical",
+ "passed": false,
+ "syntax_ok": false,
+ "timeout": false,
+ "gen_seconds": 5.79,
+ "response_preview": " try:\n return json.loads(text, indent=2)\n except json.JSONDecodeError as e:\n return f\"Invalid JSON: {e}\"\n\n# Test the function\njson_str = '{\"name\": \"John\", \"age\":30}'\ntry:\n json_str = json.loads(json_str)\n print(json_str)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n",
+ "extracted_preview": "# Test the function\njson_str = '{\"name\": \"John\", \"age\":30}'\ntry:\n json_str = json.loads(json_str)\n print(json_str)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error",
+ "code_preview": "import json\n\ndef safe_json_parse(text):\n# Test the function\njson_str = '{\"name\": \"John\", \"age\":30}'\ntry:\n json_str = json.loads(json_str)\n print(json_str)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception as e:\n print(\"Error:\", e)\nexcept Exception ",
+ "stderr": " File \"<string>\", line 24\n print(\"Error\n ^\nSyntaxError: unterminated string literal (detected at line 24)\n"
+ },
+ {
+ "id": "lru_cache",
+ "category": "data_structure",
+ "passed": false,
+ "syntax_ok": true,
+ "timeout": false,
+ "gen_seconds": 5.78,
+ "response_preview": " self.capacity = capacity\n self.max_key = 0\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n",
+ "extracted_preview": "self.capacity = capacity\n self.max_key = 0\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n ",
+ "code_preview": "from collections import OrderedDict\n\nclass LRUCache:\n def __init__(self, capacity):\nself.capacity = capacity\n self.max_key = 0\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1\n self.max_key = -1",
+ "stderr": " File \"<string>\", line 5\n self.capacity = capacity\nIndentationError: expected an indented block after function definition on line 4\n"
+ }
+ ]
+ }
+ ]
+ }
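The `syntax_ok`, `timeout`, and `stderr` fields in the benchmark entries above suggest a harness that first compiles each extracted snippet and then executes it under a time limit. The following is a minimal sketch of that flow, not the repository's actual harness: the helper name `check_generated_code`, the exact field layout, and the 5-second limit are assumptions.

```python
import subprocess
import sys

def check_generated_code(code: str, timeout_s: float = 5.0) -> dict:
    """Grade one extracted snippet; field names mirror the report entries (assumed)."""
    result = {"syntax_ok": False, "passed": False, "timeout": False, "stderr": ""}
    try:
        # Syntax check without running anything; IndentationError is a SyntaxError subclass.
        compile(code, "<string>", "exec")
        result["syntax_ok"] = True
    except SyntaxError as exc:
        result["stderr"] = str(exc)
        return result
    try:
        # Execute in a subprocess so runaway generations can be killed by the timeout.
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        result["stderr"] = proc.stderr
        result["passed"] = proc.returncode == 0
    except subprocess.TimeoutExpired:
        result["timeout"] = True
    return result
```

Fed the `max_subarray_sum` snippet above, the compile step alone would surface the same `expected an indented block` error recorded in its `stderr` field.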
reports/sft_combined_ready_report.json ADDED
@@ -0,0 +1,101 @@
+ {
+ "created_at": "2026-05-03 10:06:19",
+ "inputs": [
+ "/mnt/scratch/datasets/combined/sft_combined.jsonl"
+ ],
+ "output": "/mnt/scratch/datasets/combined/sft_combined_ready.jsonl",
+ "settings": {
+ "max_chars": 12000,
+ "max_est_tokens": 3072,
+ "limit": 0
+ },
+ "seconds": 7.25,
+ "stats": {
+ "seen": 42138,
+ "assistant_too_short": 8451,
+ "kept": 32996,
+ "user_too_short": 502,
+ "repetitive_response": 85,
+ "too_many_chars": 2,
+ "binary_or_control_text": 1,
+ "garbage_response": 100,
+ "too_many_tokens": 1
+ },
+ "schemas": {
+ "instruction_output": 42138
+ },
+ "categories": {
+ "librispeech": 7785,
+ "wizardlm": 7506,
+ "dolly": 6539,
+ "alpaca": 6325,
+ "textcaps": 4492,
+ "squad_doc": 349
+ },
+ "token_buckets": {
+ "0001-0512": 28102,
+ "0513-1024": 4034,
+ "1025-1536": 712,
+ "1537-2048": 112,
+ "2049-3072": 36
+ },
+ "files": [
+ {
+ "path": "/mnt/scratch/datasets/combined/sft_combined.jsonl",
+ "seconds": 7.25,
+ "stats": {
+ "seen": 42138,
+ "assistant_too_short": 8451,
+ "kept": 32996,
+ "user_too_short": 502,
+ "repetitive_response": 85,
+ "too_many_chars": 2,
+ "binary_or_control_text": 1,
+ "garbage_response": 100,
+ "too_many_tokens": 1
+ }
+ }
+ ],
+ "preview": [
+ {
+ "line": 57,
+ "schema": "instruction_output",
+ "category": "squad_doc",
+ "est_tokens": 494,
+ "user_preview": "Read the following document excerpt and answer the question.\n\nDocument:\nChopin seldom performed publicly in Paris. In later years he generally gave a single ann",
+ "assistant_preview": "his own Paris apartment for small groups of friends"
+ },
+ {
+ "line": 64,
+ "schema": "instruction_output",
+ "category": "squad_doc",
+ "est_tokens": 217,
+ "user_preview": "Read the following document excerpt and answer the question.\n\nDocument:\nAt the funeral of the tenor Adolphe Nourrit in Paris in 1839, Chopin made a rare appeara",
+ "assistant_preview": "Berlioz's Grande symphonie funèbre et triomphale"
+ },
+ {
+ "line": 141,
+ "schema": "instruction_output",
+ "category": "squad_doc",
+ "est_tokens": 402,
+ "user_preview": "Read the following document excerpt and answer the question.\n\nDocument:\nLaird writes that Altan Khan abolished the native Mongol practices of shamanism and bloo",
+ "assistant_preview": "the native Mongol practices of shamanism and blood sacrifice"
+ },
+ {
+ "line": 158,
+ "schema": "instruction_output",
+ "category": "squad_doc",
+ "est_tokens": 232,
+ "user_preview": "Read the following document excerpt and answer the question.\n\nDocument:\nSome independent stereo manufacturers including JVC, Pioneer, Kenwood, Alpine, Sony, and",
+ "assistant_preview": "JVC, Pioneer, Kenwood, Alpine, Sony, and Harman Kardon"
+ },
+ {
+ "line": 165,
+ "schema": "instruction_output",
+ "category": "squad_doc",
+ "est_tokens": 197,
+ "user_preview": "Read the following document excerpt and answer the question.\n\nDocument:\nWith third parties like Namco, Square Enix, Electronic Arts, Sega, and Hudson Soft all m",
+ "assistant_preview": "Namco, Square Enix, Electronic Arts, Sega, and Hudson Soft"
+ }
+ ]
+ }
reports/sft_combined_ready_report.md ADDED
@@ -0,0 +1,44 @@
+ # Sentinel Coder Dataset Preflight
+
+ Created: 2026-05-03 10:06:19
+ Output: `/mnt/scratch/datasets/combined/sft_combined_ready.jsonl`
+ Runtime: 7.25s
+
+ ## Summary
+
+ - Seen rows: 42,138
+ - Kept rows: 32,996 (78.3%)
+ - Removed rows: 9,142
+ - Max estimated tokens: 3,072
+ - Max chars: 12,000
+
+ ## Removal Reasons
+
+ - assistant_too_short: 8,451
+ - user_too_short: 502
+ - garbage_response: 100
+ - repetitive_response: 85
+ - too_many_chars: 2
+ - binary_or_control_text: 1
+ - too_many_tokens: 1
+
+ ## Token Buckets
+
+ - 0001-0512: 28,102
+ - 0513-1024: 4,034
+ - 1025-1536: 712
+ - 1537-2048: 112
+ - 2049-3072: 36
+
+ ## Top Categories
+
+ - librispeech: 7,785
+ - wizardlm: 7,506
+ - dolly: 6,539
+ - alpaca: 6,325
+ - textcaps: 4,492
+ - squad_doc: 349
+
+ ## Source Schemas
+
+ - instruction_output: 42,138
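The removal-reason counts in this preflight report map naturally onto a per-row classifier that returns either "kept" or a removal reason. A minimal sketch under stated assumptions: only the two caps (12,000 chars, 3,072 estimated tokens) come from the report; the minimum lengths, the ~0.75 words-per-token estimate, and the repetition heuristic are invented for illustration and are not the pipeline's actual logic.

```python
def preflight_filter(row: dict,
                     max_chars: int = 12000,
                     max_est_tokens: int = 3072,
                     min_user_chars: int = 8,        # assumed threshold
                     min_assistant_chars: int = 16,  # assumed threshold
                     ) -> str:
    """Return "kept" or one of the removal reasons listed in the report above."""
    user = row.get("instruction", "")
    assistant = row.get("output", "")
    if len(assistant) < min_assistant_chars:
        return "assistant_too_short"
    if len(user) < min_user_chars:
        return "user_too_short"
    text = user + assistant
    if len(text) > max_chars:
        return "too_many_chars"
    # Rough estimate: ~0.75 words per token (an assumption, not the real tokenizer).
    if len(text.split()) / 0.75 > max_est_tokens:
        return "too_many_tokens"
    # Crude repetition heuristic: one non-empty line repeated many times.
    lines = [ln for ln in assistant.splitlines() if ln.strip()]
    if lines and max(lines.count(ln) for ln in set(lines)) > 10:
        return "repetitive_response"
    return "kept"
```

Running every input row through such a classifier and tallying the returned labels would reproduce the `stats` block of the JSON report.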
reports/train_metrics_final.json ADDED
@@ -0,0 +1 @@
+ {"step": 5000, "train_loss": 0.0, "val_loss": 5.1359, "lr": 0.0, "tok_per_sec": 0, "eta_hrs": 0.0, "ts": 1777801509.3361254, "gpu": {"used_gb": 118.63, "peak_gb": 159.55, "total_gb": 205.8, "pct": 57.6, "sysfs_used_gb": 205.3, "sysfs_total_gb": 205.8, "sysfs_pct": 99.7}, "phase": "realignment_v2", "status": "done", "total_time_h": 50.28}