Qwen3-VL-2B-LaTeX-OCR

A 2B parameter vision-language model fine-tuned from Qwen/Qwen3-VL-2B-Instruct for LaTeX formula recognition from images. This model converts mathematical formula images into accurate LaTeX code.

Model Highlights

  • Specialized for LaTeX OCR: Trained to accurately transcribe mathematical formulas from images to LaTeX
  • Multi-Format Support: Handles inline formulas, display equations, matrices, and complex multi-line expressions
  • High Accuracy: Significantly improved formula recognition over the base model
  • Vision-Language Architecture: Leverages Qwen3-VL's visual understanding capabilities

Model Description

  • Base Model: Qwen/Qwen3-VL-2B-Instruct
  • Model Type: Vision-Language Model (Image-to-Text)
  • Parameters: 2B
  • Language: English
  • License: Apache 2.0
  • Developer: Kassadin88

Training Data

Trained on the LaTeX-OCR dataset for converting images of mathematical formulas into LaTeX. The dataset pairs rendered LaTeX formulas with their source LaTeX code.

Data Composition

  • Inline formulas: simple expressions such as $E = mc^2$
  • Display equations: centered equations with equation numbering
  • Matrices: matrix and array environments
  • Multi-line expressions: aligned, gathered, and cases environments
  • Complex formulas: nested fractions, integrals, summations, and tensor notation
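As an illustration of the multi-line category above, a cases expression of the kind the model is trained to emit (the specific formula is a made-up example, not drawn from the dataset):

```latex
f(x) =
\begin{cases}
  x^{2}, & x \ge 0 \\
  -x,    & x < 0
\end{cases}
```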

Quick Start

Using Transformers

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_name = "Kassadin88/Qwen3-VL-2B-LaTeX-OCR"

# Vision-language checkpoints load through the image-text-to-text auto
# class, not AutoModelForCausalLM.
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True
)

image = Image.open("formula.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe the formula in the image to LaTeX."}
    ]}
]

# Render the chat template to a prompt string, then tokenize the prompt
# together with the image.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
generated = outputs[:, inputs["input_ids"].shape[1]:]
result = processor.batch_decode(generated, skip_special_tokens=True)[0]
print(result)

Using vLLM (Recommended for Production)

vllm serve Kassadin88/Qwen3-VL-2B-LaTeX-OCR \
    --port 8000 \
    --max-model-len 4096 \
    --trust-remote-code
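Once the server is up, it exposes an OpenAI-compatible API at http://localhost:8000/v1. A minimal client sketch, assuming the default port above; the helper names (`image_data_url`, `latex_ocr_request`) are illustrative, not part of the model:

```python
import base64

def image_data_url(path):
    """Read a local image file and encode it as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/png;base64,{encoded}"

def latex_ocr_request(data_url):
    """Build an OpenAI-compatible chat-completions payload for the server."""
    return {
        "model": "Kassadin88/Qwen3-VL-2B-LaTeX-OCR",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text",
                 "text": "Transcribe the formula in the image to LaTeX."},
            ],
        }],
        "max_tokens": 512,
    }

# POST this payload to http://localhost:8000/v1/chat/completions, e.g. with
# `requests` or the `openai` client pointed at the local server.
payload = latex_ocr_request("data:image/png;base64,...")
```

With the official `openai` client, set `base_url="http://localhost:8000/v1"` and pass the same `messages` list to `client.chat.completions.create`.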

Usage Tips

For Best Results

  • Use high-resolution, clean images for best recognition accuracy
  • Crop images tightly around the formula to reduce background noise
  • For multi-page documents, process one formula at a time
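The cropping tip above can be sketched without any imaging library: scan a grayscale pixel grid for ink and return a tight bounding box. `tight_bbox` is an illustrative helper, not part of the model; with Pillow, the returned box can be passed straight to `Image.crop`:

```python
def tight_bbox(pixels, threshold=200):
    """Find the bounding box of dark (formula) pixels in a grayscale image
    given as a list of rows; values below `threshold` count as ink.
    Returns (left, top, right, bottom), exclusive on right/bottom,
    or None if the image is blank."""
    rows = [y for y, row in enumerate(pixels) if any(v < threshold for v in row)]
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] < threshold for row in pixels)]
    if not rows or not cols:
        return None
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)

# A white 4x4 image with ink at (1, 1) and (2, 2):
img = [[255, 255, 255, 255],
       [255,   0, 255, 255],
       [255, 255,   0, 255],
       [255, 255, 255, 255]]
print(tight_bbox(img))  # (1, 1, 3, 3)
```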

Example Prompt

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this mathematical formula to LaTeX."}
    ]}
]

Limitations

  • May struggle with handwritten formulas or low-quality images
  • Complex multi-line derivations with mixed text and math may require manual review
  • Not designed for general OCR tasks (text recognition from documents)
  • Limited to mathematical notation; does not handle chemical equations or circuit diagrams

Citation

@misc{qwen3-vl-2b-latex-ocr,
    author = {Kassadin88},
    title = {Qwen3-VL-2B-LaTeX-OCR: A Fine-Tuned Vision-Language Model for LaTeX OCR},
    year = {2026},
    publisher = {HuggingFace},
    url = {https://huggingface.co/Kassadin88/Qwen3-VL-2B-LaTeX-OCR}
}

Acknowledgments

  • Base Model: Qwen Team for Qwen3-VL
  • Training Data: linxy for the LaTeX-OCR dataset

Note: This model is intended for research and educational purposes. Please use responsibly.
