GeoTikzBridge-Base-38B

Model Overview

GeoTikzBridge-Base-38B is the high-capacity flagship model of the GeoTikzBridge series, presented in the CVPR 2026 paper GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning. Built on the InternVL3.5-38B-Instruct multimodal architecture and fully fine-tuned on the 2.5M-sample GeoTikz-Base dataset, it targets complex geometric figure perception and high-precision TikZ code generation. With 38 billion parameters, it is designed for fine-grained geometric structure restoration, parsing of complex multi-component figures, and stable generation of long, standardized code sequences, and is intended as a strong baseline for geometric multimodal code generation tasks.

Model Details

Core Architecture

  • Backbone Foundation: Built on the InternVL3.5-38B-Instruct large multimodal model, leveraging its advanced vision-language alignment capability, long-context understanding, and robust code generation foundation.
  • Parameter Scale: 38 billion parameters, optimized for high-precision geometric perception and complex code generation, with significantly enhanced representation learning ability for geometric spatial relationships and structural logic compared to smaller variants.
  • Targeted Optimization: Specialized pre-training and fine-tuning for ultra-complex planar geometric figures, including multi-layer nested structures, dense annotation systems, multi-formula integration, and composite engineering schematics, with optimized long-sequence code generation stability.

Core Capabilities

  1. Ultra-High Precision Geometric Image-to-TikZ Conversion: Delivers pixel-level accurate restoration of complex geometric figures, with generated TikZ code achieving near-perfect consistency with the input image after LaTeX rendering.
  2. Complex Composite Figure Parsing: Supports end-to-end parsing and code generation for multi-component nested geometric figures, dense engineering schematics, and academic paper illustrations with complex layouts and annotations.
  3. Long-Sequence Stable Code Generation: Maintains excellent syntactic correctness and structural standardization for long TikZ code sequences, with a near-zero syntax error rate for complex figure generation.
  4. Fine-Grained Geometric Detail Restoration: Accurately captures and reproduces tiny geometric details, including precise angle labels, dimensional tolerances, dashed/dotted line styles, and nested mathematical formula annotations.
  5. Strong Cross-Scene Generalization: Maintains outstanding generation performance across diverse scenarios, from basic geometric teaching materials to high-standard academic journal illustrations and professional planar engineering schematics.

Intended Use & Limitations

Intended Use Cases

  • Core Scenarios: High-quality geometric illustration generation for top-tier academic journals and conference papers, professional textbook and monograph geometric material production, high-precision planar engineering schematic vectorization, and large-scale high-quality geometric dataset construction.
  • Research Purposes: Serves as the state-of-the-art baseline model for geometric perception, multimodal code generation, and spatial reasoning research, supporting cutting-edge academic exploration in related fields.
  • Industrial Applications: Can be integrated into professional publishing systems, CAD auxiliary tools, and intelligent education platforms to provide enterprise-level geometric figure processing capabilities.

Out-of-Scope Use Cases

  • Code generation for non-geometric images (e.g., natural images, portraits, complex 3D modeling, unstructured hand-drawn doodles).
  • Generation of 3D mechanical drawings, BIM models, and other complex engineering content beyond the scope of TikZ planar drawing.
  • High-risk industrial production drawings without professional compilation and manual verification.

Model Limitations

  • The model requires a GPU with substantial memory for efficient inference; quantization is supported for deployment on consumer-grade GPUs.
  • Although compilation accuracy is high, generated TikZ code for extremely complex engineering schematics may still require professional manual verification and adjustment.
  • The model currently generates only English text and mathematical-symbol annotations; annotations in other languages are not yet supported.
  • Generation quality may degrade for severely blurred, distorted, or heavily occluded input images.

Quick Start

Environment Setup

Install basic dependencies:

```shell
pip install transformers torch pillow accelerate bitsandbytes
```

For full training/inference dependencies, please refer to the official project repository: GeoTikzBridge GitHub
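Because bitsandbytes is among the listed dependencies, the 38B model can also be loaded with 4-bit quantization on GPUs with less memory. A minimal sketch — the quantization settings below are illustrative assumptions, not an official configuration from the project:

```python
import torch
from transformers import AutoProcessor, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization; these settings are an illustrative assumption,
# not a recipe published by the GeoTikzBridge authors
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "SJY-1995/GeoTikzBridge-Base-38B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
```

Quantization trades some generation precision for memory; for the highest-fidelity TikZ output, prefer the full BF16 weights shown in the Inference section.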

Inference

```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor
model_name = "SJY-1995/GeoTikzBridge-Base-38B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Load input geometric image
image = Image.open("your_complex_geometric_figure.png").convert("RGB")

# Build inference prompt
prompt = ""
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate TikZ code (greedy decoding; temperature/top_p only apply when do_sample=True)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=8192,
        do_sample=False
    )

# Decode and output the generated result
tikz_code = processor.decode(output[0], skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
```
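The decoded output may contain surrounding text or markdown fences in addition to the TikZ code itself. A small post-processing helper (hypothetical, not part of the released code) can isolate the `tikzpicture` environment and wrap it in a standalone LaTeX document ready for compilation:

```python
import re

def extract_tikz(generated: str) -> str:
    """Return the first tikzpicture environment found in the model output."""
    match = re.search(
        r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", generated, re.DOTALL
    )
    if match is None:
        raise ValueError("no tikzpicture environment found in output")
    return match.group(0)

def wrap_standalone(tikz: str) -> str:
    """Embed the TikZ snippet in a minimal standalone LaTeX document."""
    return (
        "\\documentclass[tikz]{standalone}\n"
        "\\begin{document}\n"
        f"{tikz}\n"
        "\\end{document}\n"
    )

# Example: clean a fenced model response and produce a compilable .tex source
raw = "```latex\n\\begin{tikzpicture}\n\\draw (0,0) -- (1,1);\n\\end{tikzpicture}\n```"
doc = wrap_standalone(extract_tikz(raw))
```

The resulting string can be written to a `.tex` file and compiled with `pdflatex` to verify that the generated figure matches the input image.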

Training Details

Training Dataset

The model is fully fine-tuned on the GeoTikz-Base dataset, which contains approximately 2.5 million high-quality geometric image-TikZ code paired samples. The dataset covers a wide range of scenarios, from basic planar geometric figures to complex composite illustrations, engineering schematics, and academic paper figures, building a comprehensive and high-precision vision-code aligned training corpus.

Dataset Link: SJY-1995/GeoTikz-Base

Key Training Hyperparameters

| Hyperparameter | Configuration |
| --- | --- |
| Global Batch Size | 64 |
| Peak Learning Rate | 2e-7 |
| Training Epochs | 3 |
| Max Sequence Length | 12800 |
| Training Precision | BF16 |
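For rough planning, the hyperparameters above imply the following optimizer step count. This is an estimate assuming every one of the ~2.5M samples is seen once per epoch, not a figure reported by the authors:

```python
import math

samples = 2_500_000      # approximate GeoTikz-Base dataset size
global_batch_size = 64   # from the hyperparameter table
epochs = 3               # from the hyperparameter table

# Steps per epoch, rounding up for the final partial batch
steps_per_epoch = math.ceil(samples / global_batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 39063 117189
```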

Training Framework & Scripts

Model training is implemented with the official InternVL training framework. The core fine-tuning scripts and the complete training pipeline can be found in the official project repository (GeoTikzBridge GitHub).

Model Family

The GeoTikzBridge series includes multiple models with different specifications and capabilities to adapt to various scenario requirements:

| Model Name | Parameter Size | Core Capability | Model Link |
| --- | --- | --- | --- |
| GeoTikzBridge-Base-8B | 8B | Lightweight, efficient basic geometric image-to-TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Base-38B | 38B | Flagship high-precision complex geometric figure TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Instruct-8B | 8B | Instruction following, auxiliary line generation, interactive geometric reasoning | 🤗 Hugging Face |

Citation

If you use this model, the related datasets, or the code in your research or projects, please cite the following paper:

```bibtex
@inproceedings{geotikzbridge,
  title={GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author={Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle={2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}
```