LumiChats v1.2 7B - Vision-Language Model for LaTeX OCR


Vision-language model fine-tuned for converting handwritten mathematical formulas to LaTeX

🚀 Try LumiChats Cloud | 📚 Documentation


🌟 Model Overview

LumiChats v1.2 7B is a specialized vision-language model built on Qwen2.5-VL-7B-Instruct, fine-tuned using LoRA for Image-to-LaTeX OCR. This model excels at converting handwritten mathematical formulas from images into properly formatted LaTeX code.

Primary Use Case: Mathematical OCR

  • 📐 Handwritten Formula Recognition - Converts images of mathematical equations to LaTeX
  • 🧮 Symbol Detection - Recognizes complex mathematical symbols (∫, ∂, β, ζ, etc.)
  • ✍️ Handwriting Robustness - Handles variations in handwriting styles
  • 🎯 High Accuracy - Domain-adapted for mathematical notation
  • ⚡ Fast Inference - 4-bit quantized for efficient processing

Key Specifications

| Feature | Value |
|---|---|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Parameters | ~7B (vision + language) |
| Training Method | LoRA (r=16, alpha=16) |
| Trainable Params | 51.5M (0.62% of total) |
| Quantization | 4-bit (bnb-4bit) |
| Dataset | unsloth/LaTeX_OCR (68,686 samples) |
| Training Time | 3.27 minutes (30 steps on Tesla T4) |
| Peak Memory | 0.674 GB for training |
| Task | Image-to-LaTeX conversion |

🏢 About LumiChats

LumiChats is a student-first AI platform that provides access to 39+ premium and open-source AI models at ₹69/day (pay-only-when-you-use pricing). Our mission is to democratize AI education and make powerful language models accessible to students, developers, and creators without expensive subscriptions.

Why LumiChats?

  • Pay-Per-Day Pricing - Only ₹69 on days you use AI (vs ₹5,900/month for ChatGPT + Claude + Gemini subscriptions)
  • 39+ AI Models - Switch between GPT-4, Claude, Gemini, Qwen, DeepSeek, Mistral instantly
  • Study Mode - Page-by-page PDF learning, custom quizzes, note generation
  • Memory Control - Selective context activation for focused learning
  • 5M Tokens Daily - Generous usage limits for intensive study sessions

Average student cost: ₹690/month (10 active days) vs ₹5,900 for competitor subscriptions → 88% savings


🚀 Model Architecture

Base Model: Qwen2.5-VL-7B-Instruct

Built on Qwen2.5 Vision-Language architecture, combining:

  • Vision Encoder - Processes images and extracts visual features
  • Language Model - 7B parameter transformer for text generation
  • Multimodal Fusion - Integrates visual and textual information

Core Capabilities:

  • Multimodal understanding (image + text)
  • Visual reasoning and pattern recognition
  • Structured text generation (LaTeX, code, markdown)
  • Instruction following for complex tasks

4-bit Quantization Impact:

  • 70% memory reduction - Runs on GPUs with limited VRAM (T4, RTX 3060)
  • Faster inference - Optimized kernels for 4-bit operations
  • Minimal accuracy loss - Modern quantization preserves model quality
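
The ~70% figure can be sanity-checked with back-of-the-envelope arithmetic. This is a rough weight-only estimate; the 4.5 bits/parameter value for NF4 (which accounts for stored quantization constants) is an approximation, and real memory use also includes activations and the KV cache:

```python
def approx_weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Rough weight-only memory estimate; ignores activations and KV cache."""
    return n_params * bits_per_param / 8 / 1e9

fp16_gb = approx_weight_memory_gb(7e9, 16)   # ~14 GB in half precision
nf4_gb = approx_weight_memory_gb(7e9, 4.5)   # ~4 GB with NF4 (approx., incl. quant constants)
reduction = 1 - nf4_gb / fp16_gb             # ~0.72, i.e. roughly 70%
```

This is why a 7B model that needs ~14 GB in fp16 fits comfortably on a 6-8 GB card when 4-bit quantized.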

Fine-Tuning with LoRA

Method: LoRA (Low-Rank Adaptation) - Parameter-efficient fine-tuning

LoRA Configuration:
```yaml
- r (rank): 16
- lora_alpha: 16
- lora_dropout: 0.0
- bias: "none"
- finetune_vision_layers: True
- finetune_language_layers: True
- finetune_attention_modules: True
- finetune_mlp_modules: True
- trainable_parameters: 51,521,536 / 8,343,688,192 (0.62%)
```
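
The 0.62% figure follows directly from the two parameter counts in the configuration above:

```python
trainable = 51_521_536
total = 8_343_688_192
pct = 100 * trainable / total  # ≈ 0.62% of all weights receive gradient updates
```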

Selective Component Fine-tuning:

  • Vision Layers - Adapts image feature extraction for mathematical notation
  • Language Layers - Optimizes LaTeX generation and formatting
  • Attention Modules - Improves symbol-to-text mapping
  • MLP Layers - Enhances complex pattern recognition

Dataset: unsloth/LaTeX_OCR

  • 68,686 samples of handwritten formulas with LaTeX ground truth
  • Conversational format: User (image + instruction) → Assistant (LaTeX output)
  • Covers diverse mathematical notation: integrals, derivatives, fractions, Greek symbols
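
Each row of the dataset pairs an image with its ground-truth LaTeX string. A minimal helper, assuming the column names are `image` and `text`, to convert a raw row into the conversational format described above:

```python
def to_conversation(sample, instruction="Write the LaTeX representation for this image."):
    """Turn a raw {image, text} row into the user/assistant chat format used for training."""
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": sample["image"]},
                {"type": "text", "text": instruction},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": sample["text"]},
            ]},
        ]
    }
```

Mapping this function over the dataset yields the `messages`-format examples the trainer expects.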

Training Configuration:

```yaml
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 4 (effective batch size = 8)
- max_steps: 30
- learning_rate: 2e-4
- optimizer: adamw_8bit
- lr_scheduler: linear decay
- warmup_steps: 5
```
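
The effective batch size and the amount of data this short run actually sees follow directly from the settings above:

```python
per_device_batch = 2
grad_accum = 4
max_steps = 30

effective_batch = per_device_batch * grad_accum  # 8 samples per optimizer step
samples_seen = effective_batch * max_steps       # 240 of the 68,686 available samples
```

A 30-step run touches well under 1% of the dataset; it demonstrates the pipeline rather than exhausting the training signal, so longer runs should improve accuracy further.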

📊 Performance: Base vs Fine-tuned

Example: Handwritten Formula OCR

Input Image: Complex mathematical formula with integrals, derivatives, Greek symbols

| Model | Output LaTeX | Accuracy |
|---|---|---|
| Base model (before fine-tuning) | `H^\prime = \beta N \int d\lambda \left\{ \frac{1}{2B^2N^{2}} \partial_\lambda\zeta^\dagger\partial_\lambda\zeta + V(\lambda)\zeta^\dagger\zeta \right\}` | ❌ Incorrect symbols |
| Fine-tuned model | `H ^ { \prime } = \beta N \int d \lambda \left\{ { \frac { 1 } { 2 \beta ^ { 2 } P N ^ { 2 } } } \partial _ { s } \zeta ^ { \dagger } \partial _ { s } \zeta + V ( \lambda ) \zeta ^ { \dagger } \zeta \right\}` | ✅ Correct formatting |

Key Improvements:

  • ✅ Corrected denominator: `2B^2N^{2}` → `2 \beta ^ { 2 } P N ^ { 2 }`
  • ✅ Fixed partial derivatives: `\partial_\lambda` → `\partial _ { s }`
  • ✅ Better spacing and LaTeX style adherence
  • ✅ Proper delimiter usage (`\left\{`, `\right\}`)

Why Fine-tuning Matters

Domain Adaptation: The base Qwen2.5-VL model is general-purpose, but lacks specialized knowledge of:

  • Mathematical handwriting variations
  • LaTeX syntax conventions
  • Symbol-to-code mapping for complex formulas

After fine-tuning on 68K LaTeX OCR examples, the model learns:

  • Precise character recognition in mathematical context
  • Correct LaTeX formatting rules
  • Robust handling of handwriting ambiguities

Efficiency Gains (Unsloth + LoRA)

| Metric | Full Fine-tuning | LoRA Fine-tuning | Savings |
|---|---|---|---|
| Trainable Parameters | 8.3B (100%) | 51.5M (0.62%) | 99.4% reduction |
| Training Memory | ~12-14 GB | 0.674 GB | ~95% reduction |
| Training Time | Hours-Days | 3.27 minutes | 100x+ faster |
| Storage | Full model (~28 GB) | LoRA adapters (~200 MB) | ~99% smaller |
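
The ~200 MB adapter figure is consistent with storing the 51.5M LoRA parameters as 32-bit floats (a rough estimate; the actual file size depends on save dtype and metadata):

```python
lora_params = 51_521_536
adapter_mb = lora_params * 4 / 1e6  # 4 bytes per fp32 parameter, ≈ 206 MB
```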

💻 Usage

Quick Start (Transformers + Unsloth)

```python
from unsloth import FastVisionModel
import torch

model_name = "lumichats/lumichats-v1.2-7b-bnb-4bit"

# Load model
model, tokenizer = FastVisionModel.from_pretrained(
    model_name,
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Prepare for inference
FastVisionModel.for_inference(model)

# Load image
from PIL import Image
image = Image.open("handwritten_formula.png")

# Create prompt
instruction = "Write the LaTeX representation for this image."
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]

# Tokenize
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

# Generate LaTeX
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=1.5,
    min_p=0.1,
    do_sample=True,  # sampling must be enabled for temperature/min_p to take effect
    use_cache=True
)

latex_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(latex_output)
```
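
Note that `model.generate` returns the prompt tokens followed by the completion, so decoding `outputs[0]` directly includes the instruction text. A small helper (a sketch, not part of the original example) to keep only the newly generated tokens before decoding:

```python
def strip_prompt_tokens(output_ids, prompt_len):
    """generate() output = prompt ids + new ids; drop the prompt prefix."""
    return output_ids[prompt_len:]

# Usage with the example above:
#   new_ids = strip_prompt_tokens(outputs[0], inputs["input_ids"].shape[1])
#   latex_output = tokenizer.decode(new_ids, skip_special_tokens=True)
```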

Using Standard Transformers

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image

model_name = "lumichats/lumichats-v1.2-7b-bnb-4bit"

# Load model and processor
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    device_map="auto",
    load_in_4bit=True,
    trust_remote_code=True
)

# Prepare inputs
image = Image.open("math_formula.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Write the LaTeX representation for this image."}
    ]}
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(images=image, text=text, return_tensors="pt").to("cuda")

# Generate
output = model.generate(**inputs, max_new_tokens=256)
latex = processor.decode(output[0], skip_special_tokens=True)
print(latex)
```

🎯 Generation Parameters

Recommended Settings for LaTeX OCR

```python
model.generate(
    **inputs,
    max_new_tokens=128,  # Limit output length
    temperature=1.5,     # Balanced creativity/accuracy
    min_p=0.1,           # Filter low-probability tokens
    use_cache=True,      # Faster inference
    do_sample=True       # Enable sampling
)
```

Parameter Explanations:

  • temperature=1.5: Allows flexibility for handwriting variations; paired with min_p filtering it remains accurate
  • min_p=0.1: Discards tokens far below the top token's probability, reducing spurious symbols in the output
  • max_new_tokens=128: Sufficient for most mathematical formulas
  • use_cache=True: Speeds up autoregressive generation
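
The `min_p` rule keeps only tokens whose probability is at least `min_p` times the probability of the most likely token. A toy illustration over a plain probability list (not the actual sampler implementation, which also renormalizes and samples from the survivors):

```python
def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * max(probs); survivors are renormalized before sampling."""
    threshold = min_p * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]

# With min_p=0.1 and a top probability of 0.5, anything below 0.05 is dropped.
```

Because the threshold scales with the top token's probability, the filter is strict when the model is confident and permissive when several continuations are plausible.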

⚙️ Technical Specifications

Model Configuration

```yaml
Base Model: Qwen2.5-VL-7B-Instruct
Architecture: Vision-Language Transformer

Vision Encoder:
  - Processes images of handwritten math
  - Extracts visual features for symbols

Language Model:
  - Parameters: ~7B
  - Generates LaTeX code
  - Context: Up to 2048 tokens for formulas

Quantization:
  - Method: bitsandbytes 4-bit NF4
  - Compute dtype: bfloat16 (if supported)

LoRA Adapters:
  - Rank: 16
  - Alpha: 16
  - Trainable: 51.5M parameters (0.62%)
```

System Requirements

| Configuration | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 6GB (4-bit) | 8GB+ |
| RAM | 8GB | 16GB+ |
| Storage | 10GB | 20GB |
| CUDA | 11.8+ | 12.1+ |
| Python | 3.8+ | 3.10+ |

Supported Formats

  • Safetensors (recommended for HuggingFace)
  • GGUF (Q4_K_M for llama.cpp - CPU inference)
  • LoRA Adapters (merge with base model)
  • FP16 merged (for vLLM deployment)

📦 Installation

```bash
# Core dependencies
pip install torch transformers accelerate bitsandbytes

# For Unsloth (2x faster training/inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# For image processing
pip install pillow
```

🔧 Advanced: Fine-tuning on Your Data

Want to adapt this model for other OCR tasks (e.g., printed text, diagrams)?

```python
from unsloth import FastVisionModel
from trl import SFTTrainer, SFTConfig
from unsloth.trainer import UnslothVisionDataCollator

# Load base model
model, tokenizer = FastVisionModel.from_pretrained(
    "lumichats/lumichats-v1.2-7b-bnb-4bit",
    load_in_4bit=True,
)

# Apply LoRA for further fine-tuning
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)

# Prepare your dataset in messages format:
# [{"messages": [{"role": "user", "content": [...]}, {"role": "assistant", "content": [...]}]}]

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=your_dataset,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,
        learning_rate=2e-4,
        optim="adamw_8bit",
        output_dir="outputs",
        dataset_kwargs={"skip_prepare_dataset": True},
    ),
)

trainer.train()
```

📚 Cite This Model

```bibtex
@misc{lumichats_v1.2_2026,
  title={LumiChats v1.2: Fine-tuned Qwen2.5-VL-7B for LaTeX OCR},
  author={LumiChats Team},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/lumichats/lumichats-v1.2-7b-bnb-4bit}},
}
```

⚖️ License & Usage

Model License

This model is released under the Apache 2.0 License, allowing:

  • ✅ Commercial use
  • ✅ Modification and distribution
  • ✅ Private use
  • ✅ Patent use

Base Model License

Inherits from Qwen2.5 (Apache 2.0) - see Qwen License

Ethical Use Guidelines

Please use this model responsibly:

  • ❌ Do not generate harmful, illegal, or discriminatory content
  • ❌ Do not impersonate real individuals
  • ✅ Verify factual outputs (models can hallucinate)
  • ✅ Respect user privacy and data protection laws

Built with ❤️ for students, developers, and creators worldwide

Start Using LumiChats Cloud →

Only ₹69/day • No subscriptions • All AI models included
