
# OriOn-Qwen Synthetic Reasoning 1

SOTA on MMLongBenchDoc (58.3), surpassing a 7x larger model. This checkpoint extends OriOn-Qwen with synthetic reasoning traces that are internalized via low-strength model merging, achieving frontier long-document QA performance with no increase in inference cost.

## TL;DR

We introduce a synthetic reasoning pipeline for long-document VQA: score every page for question relevance, extract evidence, keep the top-K pages sorted by relevance, and use this as a structured <think> trace during SFT. Low-strength model merging (α=0.25) then internalizes the reasoning: the model does not generate explicit thinking tokens, yet retains the full performance benefit. A <cot> control token gates the capability at inference time. The result is a 32B model that beats Qwen3-VL-235B-A22B-Instruct on MMLongBenchDoc while producing only ~250 mean output tokens.


## Highlights

- SOTA on MMLongBenchDoc with 58.3 accuracy, surpassing Qwen3-VL-235B-A22B-Instruct (57.0) and Qwen3-VL-235B-A22B-Thinking (56.2) with 7x fewer parameters
- Internalized reasoning via low-strength model merging: no `<think>` tokens emitted, yet full performance retained
- Controllable: place `<cot>` in the system prompt to activate reasoning (+3.8 MMLBD when on vs. off)
- Drop-in replacement for Qwen/Qwen3-VL-32B-Instruct: same `Qwen3VLForConditionalGeneration` + `AutoProcessor` API

## How It Works

This checkpoint builds on OriOn and extends it with synthetic reasoning traces (paper).

### Synthetic reasoning pipeline

Given a document of N pages and a question Q:

  1. Evidence extraction & scoring: an extractor VLM (Qwen3-VL-32B-Instruct) processes each page independently, producing a relevance score ([0, 10]) and a natural-language evidence snippet.
  2. Top-K selection: pages below threshold are dropped, the top-K (default 24) are kept and sorted by relevance.
  3. Answer generation through two parallel branches: a visual branch (teacher VLM receives top-ranked page images) and a text branch (teacher LLM receives only the extracted evidence). Training examples are drawn equally from both.

The relevance-sorted evidence is placed inside <think> tags, gated by a <cot> control token (present in 95% of training examples).
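The scoring and trace-building steps above can be sketched as follows. This is a minimal illustration, not the released pipeline code: the extractor interface, the `threshold` value, and the exact line format of the trace are assumptions.

```python
# Sketch of the v2 synthetic reasoning pipeline (illustrative, not the
# released implementation).

def score_pages(extractor_vlm, pages, question):
    """Run the extractor VLM on each page independently, collecting a
    relevance score in [0, 10] and a natural-language evidence snippet."""
    results = []
    for idx, page in enumerate(pages):
        # assumed interface: extractor returns (score, evidence) per page
        score, evidence = extractor_vlm(page, question)
        results.append({"page": idx, "score": score, "evidence": evidence})
    return results

def build_think_trace(scored, k=24, threshold=3.0):
    """Drop pages below the threshold, keep the top-K sorted by descending
    relevance, and format them as a structured <think> trace."""
    kept = sorted(
        (r for r in scored if r["score"] >= threshold),
        key=lambda r: r["score"],
        reverse=True,
    )[:k]
    lines = [
        f"Page {r['page']}: {r['evidence']} (relevance {r['score']})"
        for r in kept
    ]
    return "<think>\n" + "\n".join(lines) + "\n</think>"
```

Note the two v2 design choices encoded here: the trace is bounded (top-K, default 24) and relevance-ordered, and irrelevant pages are simply absent rather than marked.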

### Internalization via model merging

The final checkpoint is produced by task arithmetic: θ_merged = θ_base + α · (θ_SFT − θ_base). At α=0.25, the model does not emit thinking tokens and its mean output length is comparable to a non-reasoning baseline, yet it retains the full performance gains. Increasing α to 0.5 shifts the model to explicit reasoning with 12.4x more output tokens.
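The merge itself is a single interpolation over the weight tensors. A minimal sketch over PyTorch state dicts (checkpoint loading/saving omitted):

```python
import torch

def task_arithmetic_merge(base_state, sft_state, alpha=0.25):
    """Task arithmetic: theta_merged = theta_base + alpha * (theta_sft - theta_base).
    At low alpha (0.25) the reasoning is internalized; at 0.5 the model
    shifts back to emitting explicit reasoning tokens."""
    merged = {}
    for name, base_w in base_state.items():
        merged[name] = base_w + alpha * (sft_state[name] - base_w)
    return merged
```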

### Why trace design matters

An earlier v1 pipeline visited every page sequentially and explicitly marked irrelevant ones, which taught the model a pathological page-looping behavior. The v2 redesign (bounded top-K, relevance-ordered, no irrelevant-page markers) eliminates this failure mode and yields substantial gains across all primary metrics.


## Related

| Resource | Description |
|---|---|
| OriOn-Qwen | Base OriOn checkpoint (LongPO, no reasoning) |
| OriOn-Mistral | Mistral variant with +16.8% MMLBD improvement |
| MMLBD-C | Manually corrected MMLongBenchDoc benchmark |
| Pipeline Code | Synthetic reasoning pipeline (Apache 2.0 fork of distilabel) |

## Benchmarks

### Official MMLongBenchDoc leaderboard

| Model | Acc | Params |
|---|---|---|
| OriOn-Qwen-SR1 (this model) | 58.3 | 32B |
| Qwen3-VL-235B-A22B-Instruct | 57.0 | 235B (22B active) |
| Qwen3-VL-235B-A22B-Thinking | 56.2 | 235B (22B active) |
| TeleMM-2.0 | 56.1 | — |
| Qwen3-VL-32B-Instruct | 55.4 | 32B |
| GLM-4.6V | 54.9 | 106B (12B active) |
| GPT-4o | 46.3 | — |

### Full benchmark suite (Qwen3-VL family)

Deltas are relative to the Qwen3-VL-32B-Instruct base model.

| Model | VA | LCA | MMLBD | MMLBD-C | MMLB 128K | SlideVQA | HELMET | DUDE |
|---|---|---|---|---|---|---|---|---|
| 235B-A22B-Instruct | 98.4 | 98.5 | 54.8 | 56.2 | 78.6 | 84.5 | 67.6 | 59.1 |
| OriOn-Qwen-SR1 (this model) | 95.0 (+1.3) | 94.4 (+2.3) | 55.8 (+4.0) | 58.2 (+4.4) | 75.7 (+5.3) | 75.4 (-1.8) | 68.5 (+5.5) | 55.1 (-6.7) |
| LongPO (OriOn-Qwen) | 94.0 (+0.3) | 92.4 (+0.3) | 53.6 (+1.8) | 56.4 (+2.6) | 75.6 (+5.2) | 75.5 (-1.7) | 62.9 (-0.1) | 56.0 (-5.8) |
| 32B-Instruct (base) | 93.7 | 92.1 | 51.8 | 53.8 | 70.4 | 77.2 | 63.0 | 61.8 |

VA = Visual-LC Average (MMLBD, MMLBD-C, MMLongBench, DUDE, SlideVQA). LCA = VA + HELMET + LongBench v2. See the paper for full results including Mistral, control-token ablations and trace-design comparisons.


## Reasoning Behavior

Place <cot> at the beginning of the system prompt to activate internalized reasoning. This improves performance with only a slight increase in output tokens.

```
System: <cot>
User: What is the average revenue growth across all subsidiaries mentioned in pages 12-45?
```

Without <cot>, the model still works but performance degrades (e.g. -3.8 MMLBD for Qwen). The model does not emit <think> tokens at α=0.25; the reasoning is internalized.
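The on/off switch is just the presence of the token in the system prompt. A hypothetical helper (`build_messages` is not part of the model's API, just an illustration):

```python
def build_messages(question, image_urls, use_cot=True):
    """Build a chat message list for multi-page document QA.
    Prepending <cot> as the system prompt activates the internalized
    reasoning; omitting it trades accuracy for nothing in particular."""
    messages = []
    if use_cot:
        messages.append({"role": "system", "content": "<cot>"})
    messages.append({
        "role": "user",
        "content": [
            *({"type": "image", "url": url} for url in image_urls),
            {"type": "text", "text": question},
        ],
    })
    return messages
```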


## Intended Use

This checkpoint is designed for:

- Long PDF and slide-deck question answering (up to 250+ pages in a single pass)
- Multi-page document reasoning requiring cross-page synthesis
- Long-context visual document understanding in enterprise, legal, scientific and financial domains

This is a research checkpoint that retains most of Qwen/Qwen3-VL-32B-Instruct's general capabilities while significantly improving long-document performance.


## Usage with Transformers

This model uses the same API as Qwen/Qwen3-VL-32B-Instruct:

```python
import torch
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

model_id = "lightonai/OriOn-Qwen-SR1"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Multi-page document QA with <cot> reasoning
messages = [
    {"role": "system", "content": "<cot>"},
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "page1.png"},
            {"type": "image", "url": "page2.png"},
            # ... add all document pages
            {"type": "text", "text": "What are the key findings discussed across this document?"},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
generated_ids = output_ids[0, inputs["input_ids"].shape[1]:]
print(processor.decode(generated_ids, skip_special_tokens=True))
```

## Usage with vLLM

```shell
vllm serve lightonai/OriOn-Qwen-SR1
```
```python
import base64
import io
import requests
import pypdfium2 as pdfium

ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "lightonai/OriOn-Qwen-SR1"

# Load and render a multi-page PDF
pdf_data = requests.get("https://arxiv.org/pdf/2412.13663").content
pdf = pdfium.PdfDocument(pdf_data)

# Convert pages to base64 images
page_images = []
for i in range(min(len(pdf), 50)):  # cap at 50 pages for this example
    pil_image = pdf[i].render(scale=2.77).to_pil()
    buffer = io.BytesIO()
    pil_image.save(buffer, format="PNG")
    b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
    page_images.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64}"},
    })

payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "<cot>"},
        {
            "role": "user",
            "content": [
                *page_images,
                {"type": "text", "text": "Summarize the main contributions of this paper."},
            ],
        },
    ],
    "max_tokens": 4096,
    "temperature": 0.2,
}

response = requests.post(ENDPOINT, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```

## Model Details

| Detail | Value |
|---|---|
| Base model | Qwen/Qwen3-VL-32B-Instruct |
| Architecture | Qwen3VLForConditionalGeneration |
| Context length | 262,144 tokens |
| Tensor type | bfloat16 |
| Processor | Qwen3VLProcessor / AutoProcessor |
| Image processor | Qwen2VLImageProcessorFast |
| Training | SFT on 50K synthetic reasoning examples + external SFT data (Luth, Smoltalk2) |
| Merge strength | α = 0.25 (task arithmetic with CPT + SFT vectors) |
| Compute | ~40K H100 hours (main training), ~100K H100 hours (project total incl. eval and data gen) |

## License

Apache License 2.0


## Citation

If you use this checkpoint, please cite both papers:

```bibtex
@misc{long_document_internalized_reasoning,
  title={Internalized Reasoning for Long-Context Visual Document Understanding},
  author={Austin Veselka},
  year={2026},
  eprint={2604.02371},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.02371},
}

@misc{long_document_training,
  title={How to Train Your Long-Context Visual Document Model},
  author={Austin Veselka},
  year={2026},
  eprint={2602.15257},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.15257},
}
```
