LumiChats v1.2 7B - Vision-Language Model for LaTeX OCR
Vision-language model fine-tuned for converting handwritten mathematical formulas to LaTeX
🌟 Model Overview
LumiChats v1.2 7B is a specialized vision-language model built on Qwen2.5-VL-7B-Instruct, fine-tuned using LoRA for Image-to-LaTeX OCR. This model excels at converting handwritten mathematical formulas from images into properly formatted LaTeX code.
Primary Use Case: Mathematical OCR
- 📐 Handwritten Formula Recognition - Converts images of mathematical equations to LaTeX
- 🧮 Symbol Detection - Recognizes complex mathematical symbols (∫, ∂, β, ζ, etc.)
- ✍️ Handwriting Robustness - Handles variations in handwriting styles
- 🎯 High Accuracy - Domain-adapted for mathematical notation
- ⚡ Fast Inference - 4-bit quantized for efficient processing
Key Specifications
| Feature | Value |
|---|---|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Parameters | ~7B (vision + language) |
| Training Method | LoRA (r=16, alpha=16) |
| Trainable Params | 51.5M (0.62% of total) |
| Quantization | 4-bit (bnb-4bit) |
| Dataset | unsloth/LaTeX_OCR (68,686 samples) |
| Training Time | 3.27 minutes (30 steps on Tesla T4) |
| Peak Memory | 0.674 GB for training |
| Task | Image-to-LaTeX conversion |
🏢 About LumiChats
LumiChats is a student-first AI platform that provides access to 39+ premium and open-source AI models at ₹69/day (pay-only-when-you-use pricing). Our mission is to democratize AI education and make powerful language models accessible to students, developers, and creators without expensive subscriptions.
Why LumiChats?
- ✅ Pay-Per-Day Pricing - Only ₹69 on days you use AI (vs ₹5,900/month for ChatGPT + Claude + Gemini subscriptions)
- ✅ 39+ AI Models - Switch between GPT-4, Claude, Gemini, Qwen, DeepSeek, Mistral instantly
- ✅ Study Mode - Page-by-page PDF learning, custom quizzes, note generation
- ✅ Memory Control - Selective context activation for focused learning
- ✅ 5M Tokens Daily - Generous usage limits for intensive study sessions
Average student cost: ₹690/month (10 active days) vs ₹5,900 for competitor subscriptions → 88% savings
🚀 Model Architecture
Base Model: Qwen2.5-VL-7B-Instruct
Built on Qwen2.5 Vision-Language architecture, combining:
- Vision Encoder - Processes images and extracts visual features
- Language Model - 7B parameter transformer for text generation
- Multimodal Fusion - Integrates visual and textual information
Core Capabilities:
- Multimodal understanding (image + text)
- Visual reasoning and pattern recognition
- Structured text generation (LaTeX, code, markdown)
- Instruction following for complex tasks
4-bit Quantization Impact:
- ✅ 70% memory reduction - Runs on GPUs with limited VRAM (T4, RTX 3060)
- ✅ Faster inference - Optimized kernels for 4-bit operations
- ✅ Minimal accuracy loss - Modern quantization preserves model quality
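The core idea behind 4-bit quantization can be illustrated with a toy absmax scheme: each block of weights is scaled by its absolute maximum and rounded to one of 16 signed levels. This is a simplified sketch of the concept only, not the actual bitsandbytes NF4 kernel (which uses a non-uniform, normal-distribution-aware grid):

```python
# Illustrative 4-bit absmax quantization (NOT the real bitsandbytes
# NF4 kernel): each block is scaled by its absolute maximum, then
# rounded onto a 16-level signed grid (-8..7).

def quantize_4bit(weights, block_size=64):
    """Quantize floats to 4-bit codes plus one fp scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0
        scales.append(scale)
        # Map each weight into the signed 4-bit grid.
        codes.extend(max(-8, min(7, round(w / scale * 7))) for w in block)
    return codes, scales

def dequantize_4bit(codes, scales, block_size=64):
    """Recover approximate weights from codes and per-block scales."""
    return [code / 7 * scales[i // block_size] for i, code in enumerate(codes)]

weights = [0.12, -0.05, 0.33, -0.41, 0.07, 0.0, -0.2, 0.29]
codes, scales = quantize_4bit(weights, block_size=8)
restored = dequantize_4bit(codes, scales, block_size=8)
# Each weight now costs 4 bits instead of 32, at a small rounding error.
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The per-block scale is why memory savings approach but do not reach a full 8x versus fp32: a few scale values must be stored alongside the packed codes.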
Fine-Tuning with LoRA
Method: LoRA (Low-Rank Adaptation) - Parameter-efficient fine-tuning
LoRA Configuration:
- r (rank): 16
- lora_alpha: 16
- lora_dropout: 0.0
- bias: "none"
- finetune_vision_layers: True
- finetune_language_layers: True
- finetune_attention_modules: True
- finetune_mlp_modules: True
- trainable_parameters: 51,521,536 / 8,343,688,192 (0.62%)
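The small trainable fraction follows directly from how LoRA works: a frozen weight matrix of shape (d, k) gains two low-rank factors A (r x k) and B (d x r), so only r * (d + k) new parameters are trained per adapted layer. A quick sketch, using 3584 as an assumed hidden size for a 7B-class attention projection (not read from the model config):

```python
# LoRA adds two low-rank matrices per adapted weight: for a frozen
# (d, k) matrix and rank r, that is r * (d + k) trainable parameters.

def lora_params(d, k, r):
    return r * (d + k)

# Hypothetical square attention projection in a 7B-class model
# (3584 is an assumed hidden size, used here only for illustration).
d = k = 3584
frozen = d * k
added = lora_params(d, k, r=16)
print(f"frozen: {frozen:,}  LoRA-added: {added:,} "
      f"({added / frozen:.2%} of the layer)")
```

Summed over all adapted vision, language, attention, and MLP layers, these small per-layer additions land at the 51.5M trainable parameters reported above.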
Selective Component Fine-tuning:
- ✅ Vision Layers - Adapts image feature extraction for mathematical notation
- ✅ Language Layers - Optimizes LaTeX generation and formatting
- ✅ Attention Modules - Improves symbol-to-text mapping
- ✅ MLP Layers - Enhances complex pattern recognition
Dataset: unsloth/LaTeX_OCR
- 68,686 samples of handwritten formulas with LaTeX ground truth
- Conversational format: User (image + instruction) → Assistant (LaTeX output)
- Covers diverse mathematical notation: integrals, derivatives, fractions, Greek symbols
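A sample can be mapped into the conversational format described above with a small helper. The field names here (`image`, `text`) follow the pattern implied by this card and should be verified against the actual dataset schema:

```python
# Sketch: map one LaTeX_OCR-style sample (image + LaTeX string) into
# the user/assistant message format used for fine-tuning. Field names
# are assumptions; check them against the real dataset schema.

def to_conversation(sample, instruction="Write the LaTeX representation for this image."):
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image"},                         # image passed alongside
                {"type": "text", "text": instruction},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": sample["text"]},  # ground-truth LaTeX
            ]},
        ]
    }

sample = {"image": "<PIL.Image>", "text": r"\frac{1}{2} \beta^{2}"}
conv = to_conversation(sample)
print(conv["messages"][1]["content"][0]["text"])
```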
Training Configuration:
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 4 (effective batch size = 8)
- max_steps: 30
- learning_rate: 2e-4
- optimizer: adamw_8bit
- lr_scheduler: linear decay
- warmup_steps: 5
📊 Performance: Base vs Fine-tuned
Example: Handwritten Formula OCR
Input Image: Complex mathematical formula with integrals, derivatives, Greek symbols
| Model | Output LaTeX | Accuracy |
|---|---|---|
| Base Model (before fine-tuning) | `H^\prime = \beta N \int d\lambda \left\{ \frac{1}{2B^2N^{2}} \partial_\lambda\zeta^\dagger\partial_\lambda\zeta + V(\lambda)\zeta^\dagger\zeta \right\}` | ❌ Incorrect symbols |
| Fine-tuned Model | `H ^ { \prime } = \beta N \int d \lambda \left\{ { \frac { 1 } { 2 \beta ^ { 2 } P N ^ { 2 } } } \partial _ { s } \zeta ^ { \dagger } \partial _ { s } \zeta + V ( \lambda ) \zeta ^ { \dagger } \zeta \right\}` | ✅ Correct formatting |
Key Improvements:
- ✅ Corrected denominator: `2B^2N^{2}` → `2 \beta ^ { 2 } P N ^ { 2 }`
- ✅ Fixed partial derivatives: `\partial_\lambda` → `\partial _ { s }`
- ✅ Better spacing and LaTeX style adherence
- ✅ Proper delimiter usage (`\left\{`, `\right\}`)
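Note that the two outputs above differ mainly in spacing and brace grouping, which is why raw string comparison is a poor accuracy metric for LaTeX OCR. A hedged sketch of token-level normalization before exact-match scoring (illustrative only, not a full LaTeX parser):

```python
import re

# LaTeX predictions often differ only in whitespace or optional
# grouping braces. Tokenize commands and single characters, optionally
# dropping braces for a looser comparison. Caution: ignoring braces
# conflates x^{ab} with x^a b, so use it only as a rough metric.

def latex_tokens(s, ignore_braces=True):
    toks = re.findall(r"\\[A-Za-z]+|\S", s)
    if ignore_braces:
        toks = [t for t in toks if t not in "{}"]
    return toks

def exact_match(pred, ref):
    return latex_tokens(pred) == latex_tokens(ref)

a = r"H^\prime = \beta N"
b = r"H ^ { \prime } = \beta N"
print(exact_match(a, b))  # True: same tokens, different spacing
```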
Why Fine-tuning Matters
Domain Adaptation: The base Qwen2.5-VL model is general-purpose, but lacks specialized knowledge of:
- Mathematical handwriting variations
- LaTeX syntax conventions
- Symbol-to-code mapping for complex formulas
After fine-tuning on 68K LaTeX OCR examples, the model learns:
- Precise character recognition in mathematical context
- Correct LaTeX formatting rules
- Robust handling of handwriting ambiguities
Efficiency Gains (Unsloth + LoRA)
| Metric | Full Fine-tuning | LoRA Fine-tuning | Savings |
|---|---|---|---|
| Trainable Parameters | 8.3B (100%) | 51.5M (0.62%) | 99.4% reduction |
| Training Memory | ~12-14 GB | 0.674 GB | 95% reduction |
| Training Time | Hours-Days | 3.27 minutes | 100x+ faster |
| Storage | Full model (~28 GB) | LoRA adapters (~200 MB) | 99% smaller |
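The trainable-fraction and adapter-size rows can be sanity-checked from the parameter counts alone, assuming fp32 adapter weights (4 bytes per parameter):

```python
# Back-of-the-envelope check of the table above, assuming LoRA
# adapters are stored in fp32 (4 bytes per parameter).
full_params = 8_343_688_192
lora_trainable = 51_521_536

fraction = lora_trainable / full_params
adapter_mb = lora_trainable * 4 / (1024 ** 2)

print(f"trainable fraction: {fraction:.2%}")  # 0.62%
print(f"adapter size: ~{adapter_mb:.0f} MB")  # ~197 MB
```

The ~197 MB figure matches the "~200 MB" row above; adapters saved in fp16 would be roughly half that.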
💻 Usage
Quick Start (Transformers + Unsloth)
```python
from unsloth import FastVisionModel
from PIL import Image
import torch

model_name = "lumichats/lumichats-v1.2-7b-bnb-4bit"

# Load model
model, tokenizer = FastVisionModel.from_pretrained(
    model_name,
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Prepare for inference
FastVisionModel.for_inference(model)

# Load image
image = Image.open("handwritten_formula.png")

# Create prompt
instruction = "Write the LaTeX representation for this image."
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction},
    ]}
]

# Tokenize
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

# Generate LaTeX
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=1.5,
    min_p=0.1,
    do_sample=True,  # required for temperature/min_p to take effect
    use_cache=True,
)
latex_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(latex_output)
```
Using Standard Transformers
```python
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig
from PIL import Image

model_name = "lumichats/lumichats-v1.2-7b-bnb-4bit"

# Load model and processor (4-bit via bitsandbytes; the bare
# load_in_4bit kwarg is deprecated in recent transformers)
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    trust_remote_code=True,
)

# Prepare inputs
image = Image.open("math_formula.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Write the LaTeX representation for this image."},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(images=image, text=text, return_tensors="pt").to("cuda")

# Generate
output = model.generate(**inputs, max_new_tokens=256)
latex = processor.decode(output[0], skip_special_tokens=True)
print(latex)
```
🎯 Generation Parameters
Recommended Settings for LaTeX OCR
```python
model.generate(
    **inputs,
    max_new_tokens=128,   # Limit output length
    temperature=1.5,      # Balanced creativity/accuracy
    min_p=0.1,            # Filter low-probability tokens
    use_cache=True,       # Faster inference
    do_sample=True,       # Enable sampling
)
```
Parameter Explanations:
- temperature=1.5: High temperature keeps the output flexible across handwriting variations; paired with min_p, it does not degrade accuracy
- min_p=0.1: Discards tokens below 10% of the top token's probability, which curbs hallucinated symbols
- max_new_tokens=128: Sufficient for most single formulas
- use_cache=True: Speeds up autoregressive generation
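The interaction between the high temperature and min_p is easiest to see with a toy distribution. This sketch mirrors the idea behind min-p filtering (keep tokens whose probability is at least min_p times the top token's, then renormalize); it is illustrative, not the exact transformers implementation:

```python
# Illustrative min-p filtering: discard tokens below min_p * p_max,
# then renormalize the survivors. With a flattening temperature like
# 1.5, this floor is what keeps implausible symbols out.

def min_p_filter(probs, min_p=0.1):
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Toy next-token distribution over candidate LaTeX symbols.
probs = {r"\beta": 0.55, r"\zeta": 0.30, r"\partial": 0.10, r"B": 0.05}
filtered = min_p_filter(probs, min_p=0.1)
print(sorted(filtered))  # "B" is dropped: 0.05 < 0.1 * 0.55
```

This is exactly the failure mode seen in the base-model example above, where `B` was emitted in place of `\beta`.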
⚙️ Technical Specifications
Model Configuration
Base Model: Qwen2.5-VL-7B-Instruct
Architecture: Vision-Language Transformer
Vision Encoder:
- Processes images of handwritten math
- Extracts visual features for symbols
Language Model:
- Parameters: ~7B
- Generates LaTeX code
- Context: Up to 2048 tokens for formulas
Quantization:
- Method: bitsandbytes 4-bit NF4
- Compute dtype: bfloat16 (if supported)
LoRA Adapters:
- Rank: 16
- Alpha: 16
- Trainable: 51.5M parameters (0.62%)
System Requirements
| Configuration | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 6GB (4-bit) | 8GB+ |
| RAM | 8GB | 16GB+ |
| Storage | 10GB | 20GB |
| CUDA | 11.8+ | 12.1+ |
| Python | 3.8+ | 3.10+ |
Supported Formats
- ✅ Safetensors (recommended for HuggingFace)
- ✅ GGUF (Q4_K_M for llama.cpp - CPU inference)
- ✅ LoRA Adapters (merge with base model)
- ✅ FP16 merged (for vLLM deployment)
📦 Installation
```shell
# Core dependencies
pip install torch transformers accelerate bitsandbytes

# For Unsloth (2x faster training/inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# For image processing
pip install pillow
```
🔧 Advanced: Fine-tuning on Your Data
Want to adapt this model for other OCR tasks (e.g., printed text, diagrams)?
```python
from unsloth import FastVisionModel
from trl import SFTTrainer, SFTConfig
from unsloth.trainer import UnslothVisionDataCollator

# Load base model
model, tokenizer = FastVisionModel.from_pretrained(
    "lumichats/lumichats-v1.2-7b-bnb-4bit",
    load_in_4bit=True,
)

# Apply LoRA for further fine-tuning
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)

# Prepare your dataset in messages format:
# [{"messages": [{"role": "user", "content": [...]},
#                {"role": "assistant", "content": [...]}]}]

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=your_dataset,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,
        learning_rate=2e-4,
        optim="adamw_8bit",
        output_dir="outputs",
        dataset_kwargs={"skip_prepare_dataset": True},
    ),
)
trainer.train()
```
📚 Cite This Model
```bibtex
@misc{lumichats_v1_2_2026,
  title        = {LumiChats v1.2: Fine-tuned Qwen2.5-VL-7B for LaTeX OCR},
  author       = {LumiChats Team},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lumichats/lumichats-v1.2-7b-bnb-4bit}},
}
```
⚖️ License & Usage
Model License
This model is released under the Apache 2.0 License, allowing:
- ✅ Commercial use
- ✅ Modification and distribution
- ✅ Private use
- ✅ Patent use
Base Model License
Inherits from Qwen2.5 (Apache 2.0) - see Qwen License
Ethical Use Guidelines
Please use this model responsibly:
- ❌ Do not generate harmful, illegal, or discriminatory content
- ❌ Do not impersonate real individuals
- ✅ Verify factual outputs (models can hallucinate)
- ✅ Respect user privacy and data protection laws
Built with ❤️ for students, developers, and creators worldwide
Only ₹69/day • No subscriptions • All AI models included