GeoTikzBridge-Base-8B
Model Overview
GeoTikzBridge-Base-8B is a multimodal large language model introduced in the CVPR 2026 paper GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning. It specializes in geometric figure perception and end-to-end TikZ code generation. Built on the InternVL multimodal architecture and fully fine-tuned on the 2.5M-sample GeoTikz-Base dataset, the model converts geometric images into directly compilable, standard-compliant TikZ code, with end-to-end optimization for fine-grained geometric perception, spatial relationship reasoning, and structured code generation.
Model Details
Core Architecture
- Backbone Foundation: Built on the InternVL2 8B (InternLM2 7B backbone) multimodal large language model, inheriting the state-of-the-art vision-language alignment capability and code generation proficiency of the InternVL series.
- Parameter Scale: 8 billion parameters, balancing inference efficiency and generation accuracy, and supporting deployment on consumer-grade GPUs.
- Targeted Optimization: Specialized alignment for fine-grained features of geometric figures (including lines, angles, annotations, coordinate relationships, etc.), significantly improving the restoration accuracy of geometric structures and the compilability of generated TikZ code.
Core Capabilities
- High-precision geometric image-to-TikZ code conversion, with output code directly renderable in LaTeX environments to generate vector geometric figures highly consistent with the input image.
- Fine-grained geometric element perception, supporting complete restoration of complex planar geometric figures including points, lines, planes, angle annotations, dimension labels, and nested mathematical formulas.
- Standardized code output, with generated TikZ code complying with LaTeX typesetting specifications, featuring clear structure and easy secondary modification.
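To make the expected output format concrete, below is a hand-written TikZ sample in the style the model targets (an annotated right triangle with vertex labels, a right-angle mark, and a dimension label). This example is illustrative only, not actual model output:

```latex
\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}
  % Vertices of a right triangle with labels
  \coordinate [label=below left:$A$]  (A) at (0,0);
  \coordinate [label=below right:$B$] (B) at (4,0);
  \coordinate [label=above left:$C$]  (C) at (0,3);
  \draw (A) -- (B) -- (C) -- cycle;
  % Right-angle mark at A
  \draw (0.3,0) -- (0.3,0.3) -- (0,0.3);
  % Dimension label on the hypotenuse
  \node at (2.2,1.7) {$5$};
\end{tikzpicture}
\end{document}
```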
Intended Use & Limitations
Intended Use Cases
- Core Scenarios: Vectorization of geometric illustrations for academic papers, generation of geometric materials for textbooks/courseware, auxiliary tools for geometry teaching, and codified conversion of planar engineering drawings.
- Research Purposes: Serves as a baseline model for research in geometric perception and multimodal code generation, supporting secondary development and academic research in related fields.
- Downstream Expansion: Can be further fine-tuned on this model to expand advanced capabilities such as instruction following, auxiliary line generation, and geometry problem solving (corresponding to the Instruct series models of this project).
Out-of-Scope Use Cases
- Code generation for non-geometric images (e.g., natural landscapes, portraits, complex 3D models, or unstructured hand-drawn doodles).
- Generation of extremely complex 3D geometric figures and high-precision engineering assembly drawings beyond the capability of TikZ planar drawing.
- Drawing generation for high-risk industrial/engineering scenarios without compilation and validation.
Model Limitations
- The TikZ code generated by the model must be compiled and validated with LaTeX; local structural deviations may occur for extremely complex composite geometric figures.
- The model currently generates annotations only in English and mathematical notation; text annotations in other languages are not yet supported.
- Generation performance is significantly affected by the clarity and standardization of the input image; blurry, distorted, or severely occluded input images may lead to degraded generation quality.
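Since the generated code must be validated by compilation, an automated check can be scripted. Below is a minimal sketch, assuming `pdflatex` is available on PATH; the `wrap_tikz` and `compiles` helper names are illustrative, not part of the project:

```python
import pathlib
import shutil
import subprocess
import tempfile


def wrap_tikz(tikz_code: str) -> str:
    """Wrap bare TikZ code in a minimal standalone LaTeX document."""
    return ("\\documentclass[tikz]{standalone}\n"
            "\\begin{document}\n"
            f"{tikz_code}\n"
            "\\end{document}\n")


def compiles(tikz_code: str, timeout: int = 60) -> bool:
    """Return True if pdflatex compiles the wrapped code without errors."""
    if shutil.which("pdflatex") is None:
        raise RuntimeError("pdflatex not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        tex = pathlib.Path(tmp) / "figure.tex"
        tex.write_text(wrap_tikz(tikz_code))
        result = subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex.name],
            cwd=tmp, capture_output=True, timeout=timeout)
        return result.returncode == 0
```

A check like this can gate downstream use of generated figures, e.g. retrying generation when compilation fails.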
Quick Start
Environment Setup
Install basic dependencies:

```shell
pip install transformers torch pillow accelerate
```
For full training/inference dependencies, please refer to the official project repository: GeoTikzBridge GitHub
Inference Example
Load the model with the Transformers library for end-to-end image-to-TikZ code generation:

```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor
model_name = "SJY-1995/GeoTikzBridge-Base-8B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Load input geometric image
image = Image.open("your_geometric_figure.png").convert("RGB")

# Build inference prompt
prompt = ""
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate TikZ code (greedy decoding; pass do_sample=True together with
# temperature/top_p if you want sampled outputs instead)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
    )

# Decode only the newly generated tokens (the raw output also echoes the prompt)
generated = output[0][inputs["input_ids"].shape[-1]:]
tikz_code = processor.decode(generated, skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
```
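Decoded output sometimes contains surrounding text in addition to the drawing environment. A small hypothetical helper (not part of the project's API) can isolate the `tikzpicture` block, falling back to the raw text if none is found:

```python
import re


def extract_tikz(text: str) -> str:
    """Return the first tikzpicture environment in text, or text unchanged."""
    match = re.search(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
                      text, flags=re.DOTALL)
    return match.group(0) if match else text
```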
Training Details
Training Dataset
The model is fully fine-tuned on the GeoTikz-Base dataset, which contains approximately 2.5 million high-quality geometric image-TikZ code pairs. The dataset covers the full range of scenarios, including basic planar geometric figures, composite figures, nested annotations, and formula integration, forming a precisely aligned vision-code corpus.
Dataset Link: SJY-1995/GeoTikz-Base
Key Training Hyperparameters
| Hyperparameter | Configuration |
|---|---|
| Global Batch Size | 128 |
| Peak Learning Rate | 4e-7 |
| Training Epochs | 3 |
| Max Sequence Length | 12800 |
| Training Precision | BF16 |
Training Framework & Scripts
Model training is implemented based on the official InternVL training framework. The core fine-tuning scripts and complete training pipeline can be found in the project repository:
- Core Script Path: `./internvl_chat/shell/internvl2.0/2nd_finetune/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_full.sh`
- Project Repository: GeoTikzBridge GitHub
Model Family
The GeoTikzBridge series includes multiple models with different specifications and capabilities to adapt to various scenario requirements:
| Model Name | Parameter Size | Core Capability | Model Link |
|---|---|---|---|
| GeoTikzBridge-Base-8B | 8B | Basic geometric image-to-TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Base-38B | 38B | High-precision TikZ code generation for complex geometric figures | 🤗 Hugging Face |
| GeoTikzBridge-Instruct-8B | 8B | Instruction following, auxiliary line generation, interactive geometric reasoning | 🤗 Hugging Face |
Citation
If you use this model, the related datasets, or the code in your research or projects, please cite the following paper:
@inproceedings{geotikzbridge,
  title={GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author={Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
Evaluation Results
| Metric | Dataset | Score (self-reported) |
|---|---|---|
| CLIP-S | GeoTikz-Base | 89.5 |
| LaTeX Compilation Success Rate | GeoTikz-Base | 97.1 |