# OmniCoder-9B-Zero-Phase2

CARL Phase 1' — VLM grounding checkpoint. Eval: PASS (94.6% click accuracy).
A LoRA adapter trained with vision GRPO for GUI grounding. Given a screenshot, the model produces structured coordinate output for click targets.
## Results
| Metric | Value |
|---|---|
| Click accuracy | 94.61% |
| Format compliance | 100% |
| Eval samples | 167 held-out |
| Status | PASS |
## Training
- Method: Vision GRPO with CARL cascade rewards
- Base model: Tesslate/OmniCoder-9B (Qwen3.5-9B VLM)
- SFT substrate: wheattoast11/OmniCoder-9B-Zero-Phase2-Vision-SFT
- Steps: 500 GRPO steps
- Hardware: 1x L40S 48GB, bf16, LoRA r=64
- Dataset: wheattoast11/grounding-with-images (20K samples)
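The adapter hyperparameters are not published beyond the rank, but a minimal PEFT configuration sketch consistent with the card's stated `r=64` might look like the following. Only the rank is from the card; `lora_alpha`, `target_modules`, and the dropout are assumptions, not the actual training config.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config.
lora_config = LoraConfig(
    r=64,                     # rank stated on this card
    lora_alpha=128,           # assumption: alpha = 2r, a common default
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    lora_dropout=0.05,        # assumption
    task_type="CAUSAL_LM",
)
```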
## Phase Transition Observed
During SFT, the model exhibited a first-order phase transition:
- Steps 0-10: Baseline (3% accuracy, entropy 1.0)
- Steps 10-20: Melting (entropy spikes to 9.3)
- Steps 20-25: Transition (accuracy jumps 57 points in 5 steps)
- Steps 25-35: Crystallization (99% accuracy, entropy 0.4)
- Steps 35-46: Converged (99.3%, entropy 0.12)
This trajectory is consistent with Kuramoto synchronization in coupled oscillator systems.
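The Kuramoto analogy can be illustrated with a toy simulation (unrelated to the training code itself): above a critical coupling strength, a population of oscillators with heterogeneous natural frequencies abruptly locks phase, and the order parameter r jumps from near 0 to near 1, mirroring the sharp accuracy jump above.

```python
import cmath
import math
import random

def kuramoto_order(n=50, coupling=4.0, steps=2000, dt=0.01, seed=0):
    """Simulate mean-field Kuramoto oscillators; return the final order parameter r.

    r = |mean_j exp(i * theta_j)| is 0 for incoherent phases, 1 for full sync.
    """
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    omega = [rng.gauss(0.0, 0.5) for _ in range(n)]  # natural frequencies
    for _ in range(steps):
        # Complex order parameter: r * exp(i * psi) = mean of exp(i * theta_j)
        z = sum(cmath.exp(1j * t) for t in theta) / n
        r, psi = abs(z), cmath.phase(z)
        # Mean-field update: dtheta_i/dt = omega_i + K * r * sin(psi - theta_i)
        theta = [t + dt * (w + coupling * r * math.sin(psi - t))
                 for t, w in zip(theta, omega)]
    return abs(sum(cmath.exp(1j * t) for t in theta) / n)
```

With `coupling=4.0` (well above the critical value for this frequency spread) the oscillators synchronize and r ends close to 1; with `coupling=0.0` the phases drift independently and r stays small.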
## Theoretical Foundation
- Bounded Informational Time Crystals — DOI: 10.5281/zenodo.18906944
- Material Reality — DOI: 10.5281/zenodo.18992029
- Semantic Realizability — DOI: 10.5281/zenodo.18992031
## Usage
```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

# Load the base VLM, attach the LoRA adapter, and merge it for inference.
base = AutoModelForImageTextToText.from_pretrained(
    "Tesslate/OmniCoder-9B",
    torch_dtype="bfloat16",
    device_map="cuda:0",
)
model = PeftModel.from_pretrained(base, "wheattoast11/OmniCoder-9B-Zero-Phase2")
model = model.merge_and_unload()

# Processor with the pixel budget used during training.
processor = AutoProcessor.from_pretrained(
    "Tesslate/OmniCoder-9B",
    min_pixels=256 * 28 * 28,
    max_pixels=1280 * 28 * 28,
)
```
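The card does not specify the exact shape of the model's structured coordinate output. Assuming a JSON-style `{"x": ..., "y": ...}` payload (a hypothetical format; check the adapter's actual responses), a minimal extraction helper could look like:

```python
import json
import re

def extract_click(text):
    """Pull the first {"x": ..., "y": ...} object out of model output.

    The JSON coordinate format is an assumption, not documented behavior;
    verify against real adapter output before relying on this parser.
    """
    match = re.search(r'\{[^{}]*"x"[^{}]*"y"[^{}]*\}', text)
    if match is None:
        return None
    obj = json.loads(match.group(0))
    return int(obj["x"]), int(obj["y"])
```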
## Citation
```bibtex
@article{desai2026carl,
  title  = {Coherence-Aware Reinforcement Learning},
  author = {Desai, Tej},
  year   = {2026},
  url    = {https://github.com/wheattoast11/carl},
  note   = {Intuition Labs LLC}
}
```
## License
Apache 2.0 — Intuition Labs LLC