GeoTikzBridge-Base-38B

Model Overview

GeoTikzBridge-Base-38B is the high-capacity flagship model of the GeoTikzBridge series, presented in the CVPR 2026 paper GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning. Built on the InternVL3.5-38B-Instruct multimodal architecture and fully fine-tuned on the 2.5M-sample GeoTikz-Base dataset, it targets complex geometric figure perception and high-precision TikZ code generation. With 38 billion parameters, it is designed for fine-grained geometric structure restoration, parsing of complex multi-component figures, and stable generation of long, standardized code sequences, and is intended as a strong baseline for geometric multimodal code generation tasks.

Model Details

Core Architecture

  • Backbone Foundation: Built on the InternVL3.5-38B-Instruct large multimodal model, leveraging its advanced vision-language alignment capability, long-context understanding, and robust code generation foundation.
  • Parameter Scale: 38 billion parameters, optimized for high-precision geometric perception and complex code generation, with significantly enhanced representation learning ability for geometric spatial relationships and structural logic compared to smaller variants.
  • Targeted Optimization: Specialized pre-training and fine-tuning for ultra-complex planar geometric figures, including multi-layer nested structures, dense annotation systems, multi-formula integration, and composite engineering schematics, with optimized long-sequence code generation stability.

Core Capabilities

  1. Ultra-High Precision Geometric Image-to-TikZ Conversion: Delivers pixel-level accurate restoration of complex geometric figures, with generated TikZ code achieving near-perfect consistency with the input image after LaTeX rendering.
  2. Complex Composite Figure Parsing: Supports end-to-end parsing and code generation for multi-component nested geometric figures, dense engineering schematics, and academic paper illustrations with complex layouts and annotations.
  3. Long-Sequence Stable Code Generation: Maintains excellent syntactic correctness and structural standardization for long TikZ code sequences, with a near-zero syntax error rate for complex figure generation.
  4. Fine-Grained Geometric Detail Restoration: Accurately captures and reproduces tiny geometric details, including precise angle labels, dimensional tolerances, dashed/dotted line styles, and nested mathematical formula annotations.
  5. Strong Cross-Scene Generalization: Maintains outstanding generation performance across diverse scenarios, from basic geometric teaching materials to high-standard academic journal illustrations and professional planar engineering schematics.

Intended Use & Limitations

Intended Use Cases

  • Core Scenarios: High-quality geometric illustration generation for top-tier academic journals and conference papers, professional textbook and monograph geometric material production, high-precision planar engineering schematic vectorization, and large-scale high-quality geometric dataset construction.
  • Research Purposes: Serves as the state-of-the-art baseline model for geometric perception, multimodal code generation, and spatial reasoning research, supporting cutting-edge academic exploration in related fields.
  • Industrial Applications: Can be integrated into professional publishing systems, CAD auxiliary tools, and intelligent education platforms to provide enterprise-level geometric figure processing capabilities.

Out-of-Scope Use Cases

  • Code generation for non-geometric images (e.g., natural images, portraits, complex 3D modeling, unstructured hand-drawn doodles).
  • Generation of 3D mechanical drawings, BIM models, and other complex engineering content beyond the scope of TikZ planar drawing.
  • High-risk industrial production drawings without professional compilation and manual verification.

Model Limitations

  • The model requires a GPU with substantial memory for efficient inference; quantization is supported for deployment on consumer-grade GPUs.
  • Although compilation accuracy is high, generated TikZ code for extremely complex engineering schematics may still require professional manual verification and adjustment.
  • The model currently generates only English text and mathematical-symbol annotations; annotations in other languages are not yet supported.
  • Generation quality may degrade for severely blurred, distorted, or heavily occluded input images.

Quick Start

Environment Setup

Install basic dependencies:

```shell
pip install transformers torch pillow accelerate bitsandbytes
```

For full training/inference dependencies, please refer to the official project repository: GeoTikzBridge GitHub
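Because bitsandbytes is among the listed dependencies, the 38B model can also be loaded with 4-bit quantization on GPUs with less memory. A minimal sketch — the quantization settings below are illustrative assumptions, not an official configuration from the project:

```python
import torch
from transformers import AutoProcessor, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization; these settings are an illustrative assumption,
# not a recipe published by the GeoTikzBridge authors
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "SJY-1995/GeoTikzBridge-Base-38B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
```

Quantization trades some generation precision for memory; for the highest-fidelity TikZ output, prefer the full BF16 weights shown in the Inference section.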

Inference

```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor
model_name = "SJY-1995/GeoTikzBridge-Base-38B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Load input geometric image
image = Image.open("your_complex_geometric_figure.png").convert("RGB")

# Build inference prompt
prompt = ""
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate TikZ code (greedy decoding; temperature/top_p only apply when do_sample=True)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=8192,
        do_sample=False
    )

# Decode and output the generated result
tikz_code = processor.decode(output[0], skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
```
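The decoded output may contain surrounding text or markdown fences in addition to the TikZ code itself. A small post-processing helper (hypothetical, not part of the released code) can isolate the `tikzpicture` environment and wrap it in a standalone LaTeX document ready for compilation:

```python
import re

def extract_tikz(generated: str) -> str:
    """Return the first tikzpicture environment found in the model output."""
    match = re.search(
        r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", generated, re.DOTALL
    )
    if match is None:
        raise ValueError("no tikzpicture environment found in output")
    return match.group(0)

def wrap_standalone(tikz: str) -> str:
    """Embed the TikZ snippet in a minimal standalone LaTeX document."""
    return (
        "\\documentclass[tikz]{standalone}\n"
        "\\begin{document}\n"
        f"{tikz}\n"
        "\\end{document}\n"
    )

# Example: clean a fenced model response and produce a compilable .tex source
raw = "```latex\n\\begin{tikzpicture}\n\\draw (0,0) -- (1,1);\n\\end{tikzpicture}\n```"
doc = wrap_standalone(extract_tikz(raw))
```

The resulting string can be written to a `.tex` file and compiled with `pdflatex` to verify that the generated figure matches the input image.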

Training Details

Training Dataset

The model is fully fine-tuned on the GeoTikz-Base dataset, which contains approximately 2.5 million high-quality geometric image-TikZ code paired samples. The dataset covers a wide range of scenarios, from basic planar geometric figures to complex composite illustrations, engineering schematics, and academic paper figures, building a comprehensive and high-precision vision-code aligned training corpus.

Dataset Link: SJY-1995/GeoTikz-Base

Key Training Hyperparameters

| Hyperparameter | Configuration |
| --- | --- |
| Global Batch Size | 64 |
| Peak Learning Rate | 2e-7 |
| Training Epochs | 3 |
| Max Sequence Length | 12800 |
| Training Precision | BF16 |
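For rough planning, the hyperparameters above imply the following optimizer step count. This is an estimate assuming every one of the ~2.5M samples is seen once per epoch, not a figure reported by the authors:

```python
import math

samples = 2_500_000      # approximate GeoTikz-Base dataset size
global_batch_size = 64   # from the hyperparameter table
epochs = 3               # from the hyperparameter table

# Steps per epoch, rounding up for the final partial batch
steps_per_epoch = math.ceil(samples / global_batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 39063 117189
```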

Training Framework & Scripts

Model training is implemented with the official InternVL training framework. The core fine-tuning scripts and the complete training pipeline can be found in the official project repository (GeoTikzBridge GitHub).

Model Family

The GeoTikzBridge series includes multiple models with different specifications and capabilities to adapt to various scenario requirements:

| Model Name | Parameter Size | Core Capability | Model Link |
| --- | --- | --- | --- |
| GeoTikzBridge-Base-8B | 8B | Lightweight, efficient basic geometric image-to-TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Base-38B | 38B | Flagship high-precision complex geometric figure TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Instruct-8B | 8B | Instruction following, auxiliary line generation, interactive geometric reasoning | 🤗 Hugging Face |

Citation

If you use this model, the related datasets, or the code in your research or projects, please cite the following paper:

```bibtex
@inproceedings{geotikzbridge,
  title={GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author={Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle={2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}
```