Qwen3-VL-2B-LaTeX-OCR

A 2B parameter vision-language model fine-tuned from Qwen/Qwen3-VL-2B-Instruct for LaTeX formula recognition from images. This model converts mathematical formula images into accurate LaTeX code.

Model Highlights

  • Specialized for LaTeX OCR: Trained to accurately transcribe mathematical formulas from images to LaTeX
  • Multi-Format Support: Handles inline formulas, display equations, matrices, and complex multi-line expressions
  • High Accuracy: Significantly improved formula recognition over the base model
  • Vision-Language Architecture: Leverages Qwen3-VL's visual understanding capabilities

Model Description

  • Base Model: Qwen/Qwen3-VL-2B-Instruct
  • Model Type: Vision-Language Model (Image-to-Text)
  • Parameters: 2B
  • Language: English
  • License: Apache 2.0
  • Developer: Kassadin88

Training Data

Trained on the LaTeX-OCR dataset for converting images of mathematical formulas into LaTeX. The dataset pairs rendered LaTeX formulas with their source LaTeX code.

Data Composition

  • Inline formulas: simple expressions such as $E = mc^2$
  • Display equations: centered equations with equation numbering
  • Matrices: matrix and array environments
  • Multi-line expressions: aligned, gathered, and cases environments
  • Complex formulas: nested fractions, integrals, summations, and tensor notation
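As an illustration of the multi-line category above, a cases expression of the kind the model is trained to emit (the specific formula is a made-up example, not drawn from the dataset):

```latex
f(x) =
\begin{cases}
  x^{2}, & x \ge 0 \\
  -x,    & x < 0
\end{cases}
```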

Quick Start

Using Transformers

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_name = "Kassadin88/Qwen3-VL-2B-LaTeX-OCR"

# Vision-language checkpoints load through the image-text-to-text auto
# class, not AutoModelForCausalLM.
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True
)

image = Image.open("formula.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe the formula in the image to LaTeX."}
    ]}
]

# Render the chat template to a prompt string, then tokenize the prompt
# together with the image.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
generated = outputs[:, inputs["input_ids"].shape[1]:]
result = processor.batch_decode(generated, skip_special_tokens=True)[0]
print(result)

Using vLLM (Recommended for Production)

vllm serve Kassadin88/Qwen3-VL-2B-LaTeX-OCR \
    --port 8000 \
    --max-model-len 4096 \
    --trust-remote-code
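Once the server is up, it exposes an OpenAI-compatible API at http://localhost:8000/v1. A minimal client sketch, assuming the default port above; the helper names (`image_data_url`, `latex_ocr_request`) are illustrative, not part of the model:

```python
import base64

def image_data_url(path):
    """Read a local image file and encode it as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/png;base64,{encoded}"

def latex_ocr_request(data_url):
    """Build an OpenAI-compatible chat-completions payload for the server."""
    return {
        "model": "Kassadin88/Qwen3-VL-2B-LaTeX-OCR",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text",
                 "text": "Transcribe the formula in the image to LaTeX."},
            ],
        }],
        "max_tokens": 512,
    }

# POST this payload to http://localhost:8000/v1/chat/completions, e.g. with
# `requests` or the `openai` client pointed at the local server.
payload = latex_ocr_request("data:image/png;base64,...")
```

With the official `openai` client, set `base_url="http://localhost:8000/v1"` and pass the same `messages` list to `client.chat.completions.create`.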

Usage Tips

For Best Results

  • Use high-resolution, clean images for best recognition accuracy
  • Crop images tightly around the formula to reduce background noise
  • For multi-page documents, process one formula at a time
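The cropping tip above can be sketched without any imaging library: scan a grayscale pixel grid for ink and return a tight bounding box. `tight_bbox` is an illustrative helper, not part of the model; with Pillow, the returned box can be passed straight to `Image.crop`:

```python
def tight_bbox(pixels, threshold=200):
    """Find the bounding box of dark (formula) pixels in a grayscale image
    given as a list of rows; values below `threshold` count as ink.
    Returns (left, top, right, bottom), exclusive on right/bottom,
    or None if the image is blank."""
    rows = [y for y, row in enumerate(pixels) if any(v < threshold for v in row)]
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] < threshold for row in pixels)]
    if not rows or not cols:
        return None
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)

# A white 4x4 image with ink at (1, 1) and (2, 2):
img = [[255, 255, 255, 255],
       [255,   0, 255, 255],
       [255, 255,   0, 255],
       [255, 255, 255, 255]]
print(tight_bbox(img))  # (1, 1, 3, 3)
```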

Example Prompt

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this mathematical formula to LaTeX."}
    ]}
]

Limitations

  • May struggle with handwritten formulas or low-quality images
  • Complex multi-line derivations with mixed text and math may require manual review
  • Not designed for general OCR tasks (text recognition from documents)
  • Limited to mathematical notation; does not handle chemical equations or circuit diagrams

Citation

@misc{qwen3-vl-2b-latex-ocr,
    author = {Kassadin88},
    title = {Qwen3-VL-2B-LaTeX-OCR: A Fine-Tuned Vision-Language Model for LaTeX OCR},
    year = {2026},
    publisher = {HuggingFace},
    url = {https://huggingface.co/Kassadin88/Qwen3-VL-2B-LaTeX-OCR}
}

Acknowledgments

  • Base Model: Qwen Team for Qwen3-VL
  • Training Data: linxy for the LaTeX-OCR dataset

Note: This model is intended for research and educational purposes. Please use responsibly.
