GeoTikzBridge-Base-38B
Model Overview
GeoTikzBridge-Base-38B is the high-capacity flagship model of the GeoTikzBridge series, presented in the paper GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning (accepted at CVPR 2026). Built on the InternVL3.5-38B-Instruct multimodal architecture and fully fine-tuned on the 2.5M-sample GeoTikz-Base dataset, it delivers strong performance in complex geometric figure perception and high-precision TikZ code generation. With 38 billion parameters, it achieves high accuracy in fine-grained geometric structure restoration, complex multi-component figure parsing, and long-sequence standardized code generation, setting a new baseline for geometric multimodal code generation tasks.
Model Details
Core Architecture
- Backbone Foundation: Built on the InternVL3.5-38B-Instruct large multimodal model, leveraging its advanced vision-language alignment capability, long-context understanding, and robust code generation foundation.
- Parameter Scale: 38 billion parameters, optimized for high-precision geometric perception and complex code generation, with significantly enhanced representation learning ability for geometric spatial relationships and structural logic compared to smaller variants.
- Targeted Optimization: Specialized pre-training and fine-tuning for ultra-complex planar geometric figures, including multi-layer nested structures, dense annotation systems, multi-formula integration, and composite engineering schematics, with optimized long-sequence code generation stability.
Core Capabilities
- Ultra-High Precision Geometric Image-to-TikZ Conversion: Delivers pixel-level accurate restoration of complex geometric figures, with generated TikZ code achieving near-perfect consistency with the input image after LaTeX rendering.
- Complex Composite Figure Parsing: Supports end-to-end parsing and code generation for multi-component nested geometric figures, dense engineering schematics, and academic paper illustrations with complex layouts and annotations.
- Long-Sequence Stable Code Generation: Maintains excellent syntactic correctness and structural standardization for long TikZ code sequences, with a near-zero syntax error rate for complex figure generation.
- Fine-Grained Geometric Detail Restoration: Accurately captures and reproduces tiny geometric details, including precise angle labels, dimensional tolerances, dashed/dotted line styles, and nested mathematical formula annotations.
- Strong Cross-Scene Generalization: Maintains outstanding generation performance across diverse scenarios, from basic geometric teaching materials to high-standard academic journal illustrations and professional planar engineering schematics.
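As a quick downstream sanity check (a minimal stdlib sketch, not part of the GeoTikzBridge toolchain), the syntactic correctness of generated code can be spot-checked by verifying brace balance and \begin/\end environment pairing before attempting a full LaTeX compile:

```python
import re

def check_tikz_balance(code: str) -> bool:
    """Rough syntactic sanity check for generated TikZ code:
    braces must balance and every \\begin{env} must close with \\end{env}."""
    # Brace balance (ignoring escaped braces \{ and \}).
    stripped = code.replace("\\{", "").replace("\\}", "")
    depth = 0
    for ch in stripped:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    if depth != 0:
        return False
    # Environment pairing via a stack.
    stack = []
    for kind, env in re.findall(r"\\(begin|end)\{([^}]*)\}", code):
        if kind == "begin":
            stack.append(env)
        elif not stack or stack.pop() != env:
            return False
    return not stack
```

A check like this only catches structural errors; it does not replace compiling the code with LaTeX.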
Intended Use & Limitations
Intended Use Cases
- Core Scenarios: High-quality geometric illustration generation for top-tier academic journals and conference papers, professional textbook and monograph geometric material production, high-precision planar engineering schematic vectorization, and large-scale high-quality geometric dataset construction.
- Research Purposes: Serves as the state-of-the-art baseline model for geometric perception, multimodal code generation, and spatial reasoning research, supporting cutting-edge academic exploration in related fields.
- Industrial Applications: Can be integrated into professional publishing systems, CAD auxiliary tools, and intelligent education platforms to provide enterprise-level geometric figure processing capabilities.
Out-of-Scope Use Cases
- Code generation for non-geometric images (e.g., natural images, portraits, complex 3D modeling, unstructured hand-drawn doodles).
- Generation of 3D mechanical drawings, BIM models, and other complex engineering content beyond the scope of TikZ planar drawing.
- High-risk industrial production drawings without professional compilation and manual verification.
Model Limitations
- Deployment is hardware-intensive: efficient inference requires a GPU with ample video memory (quantization enables deployment on consumer-grade GPUs).
- Although compilation accuracy is very high, generated TikZ code for extremely complex engineering schematics still requires professional review and manual adjustment.
- The model currently supports only English text and mathematical-symbol annotations; text annotations in other languages are not yet supported.
- Generation quality may degrade for severely blurred, distorted, or heavily occluded input images.
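As noted above, quantization makes consumer-grade deployment feasible. The following is a hypothetical 4-bit loading sketch using bitsandbytes (an illustrative configuration, not an officially documented setup; the exact memory footprint depends on your hardware, though 38B weights at 4-bit occupy roughly 20 GB):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization via bitsandbytes (installed in Environment Setup below).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "SJY-1995/GeoTikzBridge-Base-38B",
    quantization_config=quant_config,
    trust_remote_code=True,
    device_map="auto",
)
```

Expect some loss of generation fidelity relative to the BF16 weights when quantizing.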
Quick Start
Environment Setup
Install basic dependencies:

```shell
pip install transformers torch pillow accelerate bitsandbytes
```

For full training/inference dependencies, please refer to the official project repository: GeoTikzBridge GitHub
Inference
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor
model_name = "SJY-1995/GeoTikzBridge-Base-38B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Load input geometric image
image = Image.open("your_complex_geometric_figure.png").convert("RGB")

# Build inference prompt
prompt = ""
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate TikZ code (greedy decoding; temperature/top_p only take effect
# when do_sample=True, so they are omitted here)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=8192,
        do_sample=False,
    )

# Decode and output the generated result
tikz_code = processor.decode(output[0], skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
```
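To verify a generation end-to-end, the produced code can be wrapped in a minimal standalone LaTeX document and compiled. A sketch along these lines (assuming pdflatex with the TikZ package is on PATH; `wrap_tikz` and `compile_tikz` are illustrative helpers, not part of the release):

```python
import subprocess
import tempfile
from pathlib import Path

def wrap_tikz(tikz_code: str) -> str:
    """Embed a TikZ snippet in a minimal standalone LaTeX document."""
    return (
        "\\documentclass[tikz,border=2pt]{standalone}\n"
        "\\begin{document}\n"
        f"{tikz_code}\n"
        "\\end{document}\n"
    )

def compile_tikz(tikz_code: str, out_pdf: str = "figure.pdf") -> bool:
    """Compile the wrapped snippet with pdflatex; return True on success."""
    with tempfile.TemporaryDirectory() as tmp:
        tex_path = Path(tmp) / "figure.tex"
        tex_path.write_text(wrap_tikz(tikz_code))
        result = subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
             f"-output-directory={tmp}", str(tex_path)],
            capture_output=True,
        )
        if result.returncode != 0:
            return False
        # Keep the rendered PDF for visual comparison with the input image.
        Path(out_pdf).write_bytes((Path(tmp) / "figure.pdf").read_bytes())
        return True
```

Comparing the rendered PDF against the input image is the simplest way to confirm the restoration quality claimed above.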
Training Details
Training Dataset
The model is fully fine-tuned on the GeoTikz-Base dataset, which contains approximately 2.5 million high-quality geometric image-TikZ code paired samples. The dataset covers a wide range of scenarios, from basic planar geometric figures to complex composite illustrations, engineering schematics, and academic paper figures, building a comprehensive and high-precision vision-code aligned training corpus.
Dataset Link: SJY-1995/GeoTikz-Base
Key Training Hyperparameters
| Hyperparameter | Configuration |
|---|---|
| Global Batch Size | 64 |
| Peak Learning Rate | 2e-7 |
| Training Epochs | 3 |
| Max Sequence Length | 12800 |
| Training Precision | BF16 |
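For orientation, the table above implies roughly the following optimizer-step count (assuming the stated global batch size is the effective batch, with no additional gradient accumulation):

```python
import math

samples = 2_500_000   # approximate GeoTikz-Base size
global_batch = 64
epochs = 3

steps_per_epoch = math.ceil(samples / global_batch)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 39063 117189
```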
Training Framework & Scripts
Model training is built on the official InternVL training framework. The core fine-tuning scripts and complete training pipeline can be found in the project repository:
- Project Repository: GeoTikzBridge GitHub
Model Family
The GeoTikzBridge series includes multiple models with different specifications and capabilities to adapt to various scenario requirements:
| Model Name | Parameter Size | Core Capability | Model Link |
|---|---|---|---|
| GeoTikzBridge-Base-8B | 8B | Lightweight, efficient basic geometric image-to-TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Base-38B | 38B | Flagship high-precision complex geometric figure TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Instruct-8B | 8B | Instruction following, auxiliary line generation, interactive geometric reasoning | 🤗 Hugging Face |
Citation
If you use this model, its related datasets, or code in your research or projects, please cite the following paper:

```bibtex
@inproceedings{geotikzbridge,
  title     = {GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author    = {Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle = {2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}
```
Evaluation Results

| Metric | Dataset | Score (self-reported) |
|---|---|---|
| CLIP-S | GeoTikz-Base | 91.5 |
| LaTeX Compilation Success Rate | GeoTikz-Base | 95.2 |