GeoTikzBridge-Base-8B
Model Overview
GeoTikzBridge-Base-8B is a multimodal large language model introduced in the CVPR 2026 paper GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning. It specializes in geometric figure perception and end-to-end TikZ code generation. Built on the InternVL multimodal architecture and fully fine-tuned on the 2.5M-sample GeoTikz-Base dataset, the model converts geometric images into directly compilable, standard-compliant TikZ code, with end-to-end optimization for fine-grained geometric perception, spatial relationship reasoning, and structured code generation.
Model Details
Core Architecture
- Backbone Foundation: Built on the InternVL2 8B (InternLM2 7B backbone) multimodal large language model, inheriting the state-of-the-art vision-language alignment capability and code generation proficiency of the InternVL series.
- Parameter Scale: 8 billion parameters, balancing inference efficiency and generation accuracy, and supporting deployment on consumer-grade GPUs.
- Targeted Optimization: Specialized alignment for fine-grained features of geometric figures (including lines, angles, annotations, coordinate relationships, etc.), significantly improving the restoration accuracy of geometric structures and the compilability of generated TikZ code.
Core Capabilities
- High-precision geometric image-to-TikZ code conversion, with output code directly renderable in LaTeX environments to generate vector geometric figures highly consistent with the input image.
- Fine-grained geometric element perception, supporting complete restoration of complex planar geometric figures including points, lines, planes, angle annotations, dimension labels, and nested mathematical formulas.
- Standardized code output, with generated TikZ code complying with LaTeX typesetting specifications, featuring clear structure and easy secondary modification.
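To make the expected output format concrete, below is a hand-written TikZ sample in the style the model targets (an annotated right triangle with vertex labels, a right-angle mark, and a dimension label). This example is illustrative only, not actual model output:

```latex
\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}
  % Vertices of a right triangle with labels
  \coordinate [label=below left:$A$]  (A) at (0,0);
  \coordinate [label=below right:$B$] (B) at (4,0);
  \coordinate [label=above left:$C$]  (C) at (0,3);
  \draw (A) -- (B) -- (C) -- cycle;
  % Right-angle mark at A
  \draw (0.3,0) -- (0.3,0.3) -- (0,0.3);
  % Dimension label on the hypotenuse
  \node at (2.2,1.7) {$5$};
\end{tikzpicture}
\end{document}
```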
Intended Use & Limitations
Intended Use Cases
- Core Scenarios: Vectorization of geometric illustrations for academic papers, generation of geometric materials for textbooks/courseware, auxiliary tools for geometry teaching, and codified conversion of planar engineering drawings.
- Research Purposes: Serves as a baseline model for research in geometric perception and multimodal code generation, supporting secondary development and academic research in related fields.
- Downstream Expansion: Can be further fine-tuned on this model to expand advanced capabilities such as instruction following, auxiliary line generation, and geometry problem solving (corresponding to the Instruct series models of this project).
Out-of-Scope Use Cases
- Code generation for non-geometric images (e.g., natural landscapes, portraits, complex 3D models, or unstructured hand-drawn doodles).
- Generation of extremely complex 3D geometric figures and high-precision engineering assembly drawings beyond the capability of TikZ planar drawing.
- Drawing generation for high-risk industrial/engineering scenarios without compilation and validation.
Model Limitations
- The TikZ code generated by the model must be compiled and validated with LaTeX; local structural deviations may occur for extremely complex composite geometric figures.
- The model currently generates annotations only in English and mathematical notation; text annotations in other languages are not yet supported.
- Generation performance is significantly affected by the clarity and standardization of the input image; blurry, distorted, or severely occluded input images may lead to degraded generation quality.
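Since the generated code must be validated by compilation, an automated check can be scripted. Below is a minimal sketch, assuming `pdflatex` is available on PATH; the `wrap_tikz` and `compiles` helper names are illustrative, not part of the project:

```python
import pathlib
import shutil
import subprocess
import tempfile


def wrap_tikz(tikz_code: str) -> str:
    """Wrap bare TikZ code in a minimal standalone LaTeX document."""
    return ("\\documentclass[tikz]{standalone}\n"
            "\\begin{document}\n"
            f"{tikz_code}\n"
            "\\end{document}\n")


def compiles(tikz_code: str, timeout: int = 60) -> bool:
    """Return True if pdflatex compiles the wrapped code without errors."""
    if shutil.which("pdflatex") is None:
        raise RuntimeError("pdflatex not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        tex = pathlib.Path(tmp) / "figure.tex"
        tex.write_text(wrap_tikz(tikz_code))
        result = subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex.name],
            cwd=tmp, capture_output=True, timeout=timeout)
        return result.returncode == 0
```

A check like this can gate downstream use of generated figures, e.g. retrying generation when compilation fails.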
Quick Start
Environment Setup
Install basic dependencies:

```shell
pip install transformers torch pillow accelerate
```
For full training/inference dependencies, please refer to the official project repository: GeoTikzBridge GitHub
Inference Example
Load the model with the Transformers library for end-to-end image-to-TikZ code generation:

```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor
model_name = "SJY-1995/GeoTikzBridge-Base-8B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Load input geometric image
image = Image.open("your_geometric_figure.png").convert("RGB")

# Build inference prompt
prompt = ""
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate TikZ code (greedy decoding; pass do_sample=True together with
# temperature/top_p if you want sampled outputs instead)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
    )

# Decode only the newly generated tokens (the raw output also echoes the prompt)
generated = output[0][inputs["input_ids"].shape[-1]:]
tikz_code = processor.decode(generated, skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
```
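Decoded output sometimes contains surrounding text in addition to the drawing environment. A small hypothetical helper (not part of the project's API) can isolate the `tikzpicture` block, falling back to the raw text if none is found:

```python
import re


def extract_tikz(text: str) -> str:
    """Return the first tikzpicture environment in text, or text unchanged."""
    match = re.search(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
                      text, flags=re.DOTALL)
    return match.group(0) if match else text
```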
Training Details
Training Dataset
The model is fully fine-tuned on the GeoTikz-Base dataset, which contains approximately 2.5 million high-quality geometric image-TikZ code pairs. The dataset covers the full range of scenarios, including basic planar geometric figures, composite figures, nested annotations, and formula integration, forming a precisely aligned vision-code corpus.
Dataset Link: SJY-1995/GeoTikz-Base
Key Training Hyperparameters
| Hyperparameter | Configuration |
|---|---|
| Global Batch Size | 128 |
| Peak Learning Rate | 4e-7 |
| Training Epochs | 3 |
| Max Sequence Length | 12800 |
| Training Precision | BF16 |
Training Framework & Scripts
Model training is implemented based on the official InternVL training framework. The core fine-tuning scripts and complete training pipeline can be found in the project repository:
- Core Script Path: `./internvl_chat/shell/internvl2.0/2nd_finetune/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_full.sh`
- Project Repository: GeoTikzBridge GitHub
Model Family
The GeoTikzBridge series includes multiple models with different specifications and capabilities to adapt to various scenario requirements:
| Model Name | Parameter Size | Core Capability | Model Link |
|---|---|---|---|
| GeoTikzBridge-Base-8B | 8B | Basic geometric image-to-TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Base-38B | 38B | High-precision TikZ code generation for complex geometric figures | 🤗 Hugging Face |
| GeoTikzBridge-Instruct-8B | 8B | Instruction following, auxiliary line generation, interactive geometric reasoning | 🤗 Hugging Face |
Citation
If you use this model, the related datasets, or the code in your research or projects, please cite the following paper:
@inproceedings{geotikzbridge,
  title={GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author={Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
Evaluation Results
| Metric | Dataset | Score (self-reported) |
|---|---|---|
| CLIP-S | GeoTikz-Base | 89.5 |
| LaTeX Compilation Success Rate | GeoTikz-Base | 97.1 |