Qwen3-VL-2B-LaTeX-OCR
A 2B-parameter vision-language model fine-tuned from Qwen/Qwen3-VL-2B-Instruct for LaTeX formula recognition: it converts images of mathematical formulas into LaTeX source code.
Model Highlights
- Specialized for LaTeX OCR: Trained to accurately transcribe mathematical formulas from images to LaTeX
- Multi-Format Support: Handles inline formulas, display equations, matrices, and complex multi-line expressions
- High Accuracy: Significantly improved formula recognition over the base model
- Vision-Language Architecture: Leverages Qwen3-VL's visual understanding capabilities
Model Description
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-VL-2B-Instruct |
| Model Type | Vision-Language Model (Image-to-Text) |
| Parameters | 2B |
| Language | English |
| License | Apache 2.0 |
| Developer | Kassadin88 |
Training Data
Trained on the LaTeX-OCR dataset for mathematical formula image to LaTeX conversion. The dataset contains rendered LaTeX formulas paired with their source LaTeX code.
Data Composition
| Type | Description |
|---|---|
| Inline formulas | Simple expressions like $E = mc^2$ |
| Display equations | Centered equations with equation numbering |
| Matrices | Matrix and array environments |
| Multi-line expressions | Aligned, gathered, and cases environments |
| Complex formulas | Nested fractions, integrals, summations, and tensor notation |
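As an illustration of the formula types above (these examples are written for this card, not drawn from the dataset), the target LaTeX looks like:

```latex
% Inline formula
$E = mc^2$

% Display equation with numbering
\begin{equation}
  \int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}
\end{equation}

% Matrix environment
\begin{pmatrix} a & b \\ c & d \end{pmatrix}

% Multi-line cases environment
f(x) = \begin{cases} x^2 & x \ge 0 \\ -x & x < 0 \end{cases}
```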
Quick Start
Using Transformers
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_name = "Kassadin88/Qwen3-VL-2B-LaTeX-OCR"

model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    trust_remote_code=True,
    dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True
)

image = Image.open("formula.png")
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Transcribe the formula in the image to LaTeX."}
    ]}
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
result = processor.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)
print(result)
Using vLLM (Recommended for Production)
vllm serve Kassadin88/Qwen3-VL-2B-LaTeX-OCR \
--port 8000 \
--max-model-len 4096 \
--trust-remote-code
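The vLLM server exposes an OpenAI-compatible chat-completions API, so any OpenAI-style client works. Below is a minimal sketch of building a request payload with the image inlined as base64; the endpoint path (`/v1/chat/completions`) and the `image_url` content part follow the OpenAI convention, and the localhost port matches the `vllm serve` command above.

```python
import base64
import json

def build_latex_ocr_request(image_path: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload with an inline base64 image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "Kassadin88/Qwen3-VL-2B-LaTeX-OCR",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": 512,
    }

# POST the payload to the running server, e.g. with urllib:
# import urllib.request
# payload = build_latex_ocr_request("formula.png", "Transcribe the formula to LaTeX.")
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"])
```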
Usage Tips
For Best Results
- Use high-resolution, clean images for best recognition accuracy
- Crop images tightly around the formula to reduce background noise
- For multi-page documents, process one formula at a time
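The tight-cropping tip above can be automated. Here is a small sketch using Pillow (already a dependency of the Transformers example) that trims white margins around a dark-on-light formula and keeps a small border; the function name and border size are illustrative choices, not part of the model's API.

```python
from PIL import Image, ImageOps

def tight_crop(image: Image.Image, border: int = 10) -> Image.Image:
    """Crop white margins around a dark-on-light formula, keeping a small border."""
    gray = image.convert("L")
    # Invert so the formula becomes the bright region, then find its bounding box
    bbox = ImageOps.invert(gray).getbbox()
    if bbox is None:  # blank image: nothing to crop
        return image
    cropped = image.crop(bbox)
    return ImageOps.expand(cropped, border=border, fill="white")
```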
Example Prompt
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Convert this mathematical formula to LaTeX."}
    ]}
]
Limitations
- May struggle with handwritten formulas or low-quality images
- Complex multi-line derivations with mixed text and math may require manual review
- Not designed for general OCR tasks (text recognition from documents)
- Limited to mathematical notation; does not handle chemical equations or circuit diagrams
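Because complex outputs may need manual review, a quick sanity check can flag transcriptions for inspection. The sketch below is a simple heuristic (not a full LaTeX parser): it checks that braces are balanced and that `\begin`/`\end` environment names match in order.

```python
import re

def looks_balanced(latex: str) -> bool:
    """Heuristic check: balanced braces and matched \\begin/\\end environments."""
    depth = 0
    for ch in latex:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace before any opener
                return False
    if depth != 0:
        return False
    # Environment names must appear in the same order in \begin and \end
    begins = re.findall(r"\\begin\{(\w+\*?)\}", latex)
    ends = re.findall(r"\\end\{(\w+\*?)\}", latex)
    return begins == ends
```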
Citation
@misc{qwen3-vl-2b-latex-ocr,
author = {Kassadin88},
title = {Qwen3-VL-2B-LaTeX-OCR: A Fine-Tuned Vision-Language Model for LaTeX OCR},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/Kassadin88/Qwen3-VL-2B-LaTeX-OCR}
}
Note: This model is intended for research and educational purposes. Please use responsibly.