LumiChats v1.2 7B - Vision-Language Model for LaTeX OCR
Vision-language model fine-tuned for converting handwritten mathematical formulas to LaTeX
🌟 Model Overview
LumiChats v1.2 7B is a specialized vision-language model built on Qwen2.5-VL-7B-Instruct, fine-tuned using LoRA for Image-to-LaTeX OCR. This model excels at converting handwritten mathematical formulas from images into properly formatted LaTeX code.
Primary Use Case: Mathematical OCR
- 📐 Handwritten Formula Recognition - Converts images of mathematical equations to LaTeX
- 🧮 Symbol Detection - Recognizes complex mathematical symbols (∫, ∂, β, ζ, etc.)
- ✍️ Handwriting Robustness - Handles variations in handwriting styles
- 🎯 High Accuracy - Domain-adapted for mathematical notation
- ⚡ Fast Inference - 4-bit quantized for efficient processing
Key Specifications
| Feature | Value |
|---|---|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Parameters | ~7B (vision + language) |
| Training Method | LoRA (r=16, alpha=16) |
| Trainable Params | 51.5M (0.62% of total) |
| Quantization | 4-bit (bnb-4bit) |
| Dataset | unsloth/LaTeX_OCR (68,686 samples) |
| Training Time | 3.27 minutes (30 steps on Tesla T4) |
| Peak Memory | 0.674 GB for training |
| Task | Image-to-LaTeX conversion |
🏢 About LumiChats
LumiChats is a student-first AI platform that provides access to 39+ premium and open-source AI models at ₹69/day (pay-only-when-you-use pricing). Our mission is to democratize AI education and make powerful language models accessible to students, developers, and creators without expensive subscriptions.
Why LumiChats?
- ✅ Pay-Per-Day Pricing - Only ₹69 on days you use AI (vs ₹5,900/month for ChatGPT + Claude + Gemini subscriptions)
- ✅ 39+ AI Models - Switch between GPT-4, Claude, Gemini, Qwen, DeepSeek, Mistral instantly
- ✅ Study Mode - Page-by-page PDF learning, custom quizzes, note generation
- ✅ Memory Control - Selective context activation for focused learning
- ✅ 5M Tokens Daily - Generous usage limits for intensive study sessions
Average student cost: ₹690/month (10 active days) vs ₹5,900 for competitor subscriptions → 88% savings
🚀 Model Architecture
Base Model: Qwen2.5-VL-7B-Instruct
Built on Qwen2.5 Vision-Language architecture, combining:
- Vision Encoder - Processes images and extracts visual features
- Language Model - 7B parameter transformer for text generation
- Multimodal Fusion - Integrates visual and textual information
Core Capabilities:
- Multimodal understanding (image + text)
- Visual reasoning and pattern recognition
- Structured text generation (LaTeX, code, markdown)
- Instruction following for complex tasks
4-bit Quantization Impact:
- ✅ 70% memory reduction - Runs on GPUs with limited VRAM (T4, RTX 3060)
- ✅ Faster inference - Optimized kernels for 4-bit operations
- ✅ Minimal accuracy loss - Modern quantization preserves model quality
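The core idea behind 4-bit quantization can be illustrated with a toy absmax scheme: each block of weights is scaled by its absolute maximum and rounded to one of 16 signed levels. This is a simplified sketch of the concept only, not the actual bitsandbytes NF4 kernel (which uses a non-uniform, normal-distribution-aware grid):

```python
# Illustrative 4-bit absmax quantization (NOT the real bitsandbytes
# NF4 kernel): each block is scaled by its absolute maximum, then
# rounded onto a 16-level signed grid (-8..7).

def quantize_4bit(weights, block_size=64):
    """Quantize floats to 4-bit codes plus one fp scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0
        scales.append(scale)
        # Map each weight into the signed 4-bit grid.
        codes.extend(max(-8, min(7, round(w / scale * 7))) for w in block)
    return codes, scales

def dequantize_4bit(codes, scales, block_size=64):
    """Recover approximate weights from codes and per-block scales."""
    return [code / 7 * scales[i // block_size] for i, code in enumerate(codes)]

weights = [0.12, -0.05, 0.33, -0.41, 0.07, 0.0, -0.2, 0.29]
codes, scales = quantize_4bit(weights, block_size=8)
restored = dequantize_4bit(codes, scales, block_size=8)
# Each weight now costs 4 bits instead of 32, at a small rounding error.
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The per-block scale is why memory savings approach but do not reach a full 8x versus fp32: a few scale values must be stored alongside the packed codes.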
Fine-Tuning with LoRA
Method: LoRA (Low-Rank Adaptation) - Parameter-efficient fine-tuning
LoRA Configuration:
- r (rank): 16
- lora_alpha: 16
- lora_dropout: 0.0
- bias: "none"
- finetune_vision_layers: True
- finetune_language_layers: True
- finetune_attention_modules: True
- finetune_mlp_modules: True
- trainable_parameters: 51,521,536 / 8,343,688,192 (0.62%)
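The small trainable fraction follows directly from how LoRA works: a frozen weight matrix of shape (d, k) gains two low-rank factors A (r x k) and B (d x r), so only r * (d + k) new parameters are trained per adapted layer. A quick sketch, using 3584 as an assumed hidden size for a 7B-class attention projection (not read from the model config):

```python
# LoRA adds two low-rank matrices per adapted weight: for a frozen
# (d, k) matrix and rank r, that is r * (d + k) trainable parameters.

def lora_params(d, k, r):
    return r * (d + k)

# Hypothetical square attention projection in a 7B-class model
# (3584 is an assumed hidden size, used here only for illustration).
d = k = 3584
frozen = d * k
added = lora_params(d, k, r=16)
print(f"frozen: {frozen:,}  LoRA-added: {added:,} "
      f"({added / frozen:.2%} of the layer)")
```

Summed over all adapted vision, language, attention, and MLP layers, these small per-layer additions land at the 51.5M trainable parameters reported above.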
Selective Component Fine-tuning:
- ✅ Vision Layers - Adapts image feature extraction for mathematical notation
- ✅ Language Layers - Optimizes LaTeX generation and formatting
- ✅ Attention Modules - Improves symbol-to-text mapping
- ✅ MLP Layers - Enhances complex pattern recognition
Dataset: unsloth/LaTeX_OCR
- 68,686 samples of handwritten formulas with LaTeX ground truth
- Conversational format: User (image + instruction) → Assistant (LaTeX output)
- Covers diverse mathematical notation: integrals, derivatives, fractions, Greek symbols
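A sample can be mapped into the conversational format described above with a small helper. The field names here (`image`, `text`) follow the pattern implied by this card and should be verified against the actual dataset schema:

```python
# Sketch: map one LaTeX_OCR-style sample (image + LaTeX string) into
# the user/assistant message format used for fine-tuning. Field names
# are assumptions; check them against the real dataset schema.

def to_conversation(sample, instruction="Write the LaTeX representation for this image."):
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image"},                         # image passed alongside
                {"type": "text", "text": instruction},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": sample["text"]},  # ground-truth LaTeX
            ]},
        ]
    }

sample = {"image": "<PIL.Image>", "text": r"\frac{1}{2} \beta^{2}"}
conv = to_conversation(sample)
print(conv["messages"][1]["content"][0]["text"])
```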
Training Configuration:
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 4 (effective batch size = 8)
- max_steps: 30
- learning_rate: 2e-4
- optimizer: adamw_8bit
- lr_scheduler: linear decay
- warmup_steps: 5
📊 Performance: Base vs Fine-tuned
Example: Handwritten Formula OCR
Input Image: Complex mathematical formula with integrals, derivatives, Greek symbols
| Model | Output LaTeX | Accuracy |
|---|---|---|
| Base Model (before fine-tuning) | `H^\prime = \beta N \int d\lambda \left\{ \frac{1}{2B^2N^{2}} \partial_\lambda\zeta^\dagger\partial_\lambda\zeta + V(\lambda)\zeta^\dagger\zeta \right\}` | ❌ Incorrect symbols |
| Fine-tuned Model | `H ^ { \prime } = \beta N \int d \lambda \left\{ { \frac { 1 } { 2 \beta ^ { 2 } P N ^ { 2 } } } \partial _ { s } \zeta ^ { \dagger } \partial _ { s } \zeta + V ( \lambda ) \zeta ^ { \dagger } \zeta \right\}` | ✅ Correct formatting |
Key Improvements:
- ✅ Corrected denominator: `2B^2N^{2}` → `2 \beta ^ { 2 } P N ^ { 2 }`
- ✅ Fixed partial derivatives: `\partial_\lambda` → `\partial _ { s }`
- ✅ Better spacing and LaTeX style adherence
- ✅ Proper delimiter usage (`\left\{`, `\right\}`)
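Note that the two outputs above differ mainly in spacing and brace grouping, which is why raw string comparison is a poor accuracy metric for LaTeX OCR. A hedged sketch of token-level normalization before exact-match scoring (illustrative only, not a full LaTeX parser):

```python
import re

# LaTeX predictions often differ only in whitespace or optional
# grouping braces. Tokenize commands and single characters, optionally
# dropping braces for a looser comparison. Caution: ignoring braces
# conflates x^{ab} with x^a b, so use it only as a rough metric.

def latex_tokens(s, ignore_braces=True):
    toks = re.findall(r"\\[A-Za-z]+|\S", s)
    if ignore_braces:
        toks = [t for t in toks if t not in "{}"]
    return toks

def exact_match(pred, ref):
    return latex_tokens(pred) == latex_tokens(ref)

a = r"H^\prime = \beta N"
b = r"H ^ { \prime } = \beta N"
print(exact_match(a, b))  # True: same tokens, different spacing
```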
Why Fine-tuning Matters
Domain Adaptation: The base Qwen2.5-VL model is general-purpose, but lacks specialized knowledge of:
- Mathematical handwriting variations
- LaTeX syntax conventions
- Symbol-to-code mapping for complex formulas
After fine-tuning on 68K LaTeX OCR examples, the model learns:
- Precise character recognition in mathematical context
- Correct LaTeX formatting rules
- Robust handling of handwriting ambiguities
Efficiency Gains (Unsloth + LoRA)
| Metric | Full Fine-tuning | LoRA Fine-tuning | Savings |
|---|---|---|---|
| Trainable Parameters | 8.3B (100%) | 51.5M (0.62%) | 99.4% reduction |
| Training Memory | ~12-14 GB | 0.674 GB | 95% reduction |
| Training Time | Hours-Days | 3.27 minutes | 100x+ faster |
| Storage | Full model (~28 GB) | LoRA adapters (~200 MB) | 99% smaller |
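The trainable-fraction and adapter-size rows can be sanity-checked from the parameter counts alone, assuming fp32 adapter weights (4 bytes per parameter):

```python
# Back-of-the-envelope check of the table above, assuming LoRA
# adapters are stored in fp32 (4 bytes per parameter).
full_params = 8_343_688_192
lora_trainable = 51_521_536

fraction = lora_trainable / full_params
adapter_mb = lora_trainable * 4 / (1024 ** 2)

print(f"trainable fraction: {fraction:.2%}")  # 0.62%
print(f"adapter size: ~{adapter_mb:.0f} MB")  # ~197 MB
```

The ~197 MB figure matches the "~200 MB" row above; adapters saved in fp16 would be roughly half that.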
💻 Usage
Quick Start (Transformers + Unsloth)
```python
from unsloth import FastVisionModel
from PIL import Image
import torch

model_name = "lumichats/lumichats-v1.2-7b-bnb-4bit"

# Load model
model, tokenizer = FastVisionModel.from_pretrained(
    model_name,
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Prepare for inference
FastVisionModel.for_inference(model)

# Load image
image = Image.open("handwritten_formula.png")

# Create prompt
instruction = "Write the LaTeX representation for this image."
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction},
    ]}
]

# Tokenize
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

# Generate LaTeX
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=1.5,
    min_p=0.1,
    do_sample=True,  # required for temperature/min_p to take effect
    use_cache=True,
)
latex_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(latex_output)
```
Using Standard Transformers
```python
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig
from PIL import Image

model_name = "lumichats/lumichats-v1.2-7b-bnb-4bit"

# Load model and processor (4-bit via bitsandbytes; the bare
# load_in_4bit kwarg is deprecated in recent transformers)
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    trust_remote_code=True,
)

# Prepare inputs
image = Image.open("math_formula.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Write the LaTeX representation for this image."},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(images=image, text=text, return_tensors="pt").to("cuda")

# Generate
output = model.generate(**inputs, max_new_tokens=256)
latex = processor.decode(output[0], skip_special_tokens=True)
print(latex)
```
🎯 Generation Parameters
Recommended Settings for LaTeX OCR
```python
model.generate(
    **inputs,
    max_new_tokens=128,   # Limit output length
    temperature=1.5,      # Balanced creativity/accuracy
    min_p=0.1,            # Filter low-probability tokens
    use_cache=True,       # Faster inference
    do_sample=True,       # Enable sampling
)
```
Parameter Explanations:
- temperature=1.5: High temperature keeps the output flexible across handwriting variations; paired with min_p, it does not degrade accuracy
- min_p=0.1: Discards tokens below 10% of the top token's probability, which curbs hallucinated symbols
- max_new_tokens=128: Sufficient for most single formulas
- use_cache=True: Speeds up autoregressive generation
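The interaction between the high temperature and min_p is easiest to see with a toy distribution. This sketch mirrors the idea behind min-p filtering (keep tokens whose probability is at least min_p times the top token's, then renormalize); it is illustrative, not the exact transformers implementation:

```python
# Illustrative min-p filtering: discard tokens below min_p * p_max,
# then renormalize the survivors. With a flattening temperature like
# 1.5, this floor is what keeps implausible symbols out.

def min_p_filter(probs, min_p=0.1):
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Toy next-token distribution over candidate LaTeX symbols.
probs = {r"\beta": 0.55, r"\zeta": 0.30, r"\partial": 0.10, r"B": 0.05}
filtered = min_p_filter(probs, min_p=0.1)
print(sorted(filtered))  # "B" is dropped: 0.05 < 0.1 * 0.55
```

This is exactly the failure mode seen in the base-model example above, where `B` was emitted in place of `\beta`.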
⚙️ Technical Specifications
Model Configuration
Base Model: Qwen2.5-VL-7B-Instruct
Architecture: Vision-Language Transformer
Vision Encoder:
- Processes images of handwritten math
- Extracts visual features for symbols
Language Model:
- Parameters: ~7B
- Generates LaTeX code
- Context: Up to 2048 tokens for formulas
Quantization:
- Method: bitsandbytes 4-bit NF4
- Compute dtype: bfloat16 (if supported)
LoRA Adapters:
- Rank: 16
- Alpha: 16
- Trainable: 51.5M parameters (0.62%)
System Requirements
| Configuration | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 6GB (4-bit) | 8GB+ |
| RAM | 8GB | 16GB+ |
| Storage | 10GB | 20GB |
| CUDA | 11.8+ | 12.1+ |
| Python | 3.8+ | 3.10+ |
Supported Formats
- ✅ Safetensors (recommended for HuggingFace)
- ✅ GGUF (Q4_K_M for llama.cpp - CPU inference)
- ✅ LoRA Adapters (merge with base model)
- ✅ FP16 merged (for vLLM deployment)
📦 Installation
```shell
# Core dependencies
pip install torch transformers accelerate bitsandbytes

# For Unsloth (2x faster training/inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# For image processing
pip install pillow
```
🔧 Advanced: Fine-tuning on Your Data
Want to adapt this model for other OCR tasks (e.g., printed text, diagrams)?
```python
from unsloth import FastVisionModel
from trl import SFTTrainer, SFTConfig
from unsloth.trainer import UnslothVisionDataCollator

# Load base model
model, tokenizer = FastVisionModel.from_pretrained(
    "lumichats/lumichats-v1.2-7b-bnb-4bit",
    load_in_4bit=True,
)

# Apply LoRA for further fine-tuning
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)

# Prepare your dataset in messages format:
# [{"messages": [{"role": "user", "content": [...]},
#                {"role": "assistant", "content": [...]}]}]

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=your_dataset,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,
        learning_rate=2e-4,
        optim="adamw_8bit",
        output_dir="outputs",
        dataset_kwargs={"skip_prepare_dataset": True},
    ),
)
trainer.train()
```
📚 Cite This Model
```bibtex
@misc{lumichats_v1_2_2026,
  title        = {LumiChats v1.2: Fine-tuned Qwen2.5-VL-7B for LaTeX OCR},
  author       = {LumiChats Team},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lumichats/lumichats-v1.2-7b-bnb-4bit}},
}
```
⚖️ License & Usage
Model License
This model is released under the Apache 2.0 License, allowing:
- ✅ Commercial use
- ✅ Modification and distribution
- ✅ Private use
- ✅ Patent use
Base Model License
Inherits from Qwen2.5 (Apache 2.0) - see Qwen License
Ethical Use Guidelines
Please use this model responsibly:
- ❌ Do not generate harmful, illegal, or discriminatory content
- ❌ Do not impersonate real individuals
- ✅ Verify factual outputs (models can hallucinate)
- ✅ Respect user privacy and data protection laws
Built with ❤️ for students, developers, and creators worldwide
Only ₹69/day • No subscriptions • All AI models included