GeoTikzBridge-Instruct-8B
Model Overview
GeoTikzBridge-Instruct-8B is the instruction-tuned variant of the GeoTikzBridge series, introduced in the paper GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning (accepted to CVPR 2026). Built on GeoTikzBridge-Base-8B, it is further fine-tuned on GeoTikz-Instruct, a dataset of approximately 419k samples, so that it follows instructions for geometric tasks. Beyond basic image-to-TikZ conversion, it supports instruction-guided auxiliary line generation, interactive modification of geometric figures, and step-by-step geometric reasoning. This makes it well suited to educational and research settings that require interactive geometric manipulation.
Model Details
Core Architecture
- Backbone Foundation: Initialized from GeoTikzBridge-Base-8B (built on InternVL2-8B with an InternLM2-7B language backbone), inheriting its geometric perception and code generation capabilities.
- Parameter Scale: Approximately 8 billion parameters, the same scale as the base model.
- Instruction Tuning: Fine-tuned on GeoTikz-Instruct, a diverse set of geometric instruction-response pairs, so that the model can interpret and execute natural-language instructions for manipulating geometric figures.
Core Capabilities
- Instruction-Guided TikZ Generation: Generates TikZ code from natural-language instructions (e.g., "Draw a right triangle with a height of 5 cm and label the right angle").
- Auxiliary Line Generation: Adds auxiliary lines (e.g., perpendicular bisectors, angle bisectors, medians) to existing geometric figures as instructed, supporting geometric problem-solving.
- Interactive Geometric Modification: Modifies existing geometric figures (e.g., resizing, rotating, adding/removing elements) according to user instructions.
- Basic Geometric Reasoning: Provides step-by-step geometric reasoning processes (in text form) alongside TikZ code generation for simple geometric problems.
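To make the auxiliary-line capability concrete, here is a hand-written Python sketch (not model output, and not the paper's method) of the kind of TikZ that a request such as "add the median from A to BC" targets: it computes the midpoint M of BC and draws the median AM as a dashed segment. The `Point` class and the `midpoint`, `fmt`, and `triangle_with_median` helpers are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A 2D point in TikZ's default units (cm)."""
    x: float
    y: float


def midpoint(p: Point, q: Point) -> Point:
    """Midpoint of segment pq."""
    return Point((p.x + q.x) / 2, (p.y + q.y) / 2)


def fmt(p: Point) -> str:
    """Format a point as a TikZ coordinate, e.g. (2,1.5)."""
    return f"({p.x:g},{p.y:g})"


def triangle_with_median(a: Point, b: Point, c: Point) -> str:
    """Return TikZ for triangle ABC plus the median from A to side BC."""
    m = midpoint(b, c)
    lines = [
        r"\begin{tikzpicture}",
        rf"  \draw {fmt(a)} -- {fmt(b)} -- {fmt(c)} -- cycle;",
        rf"  \node[above] at {fmt(a)} {{$A$}};",
        rf"  \node[below left] at {fmt(b)} {{$B$}};",
        rf"  \node[below right] at {fmt(c)} {{$C$}};",
        # The auxiliary line: median AM, drawn dashed by convention
        rf"  \draw[dashed] {fmt(a)} -- {fmt(m)};",
        rf"  \node[below] at {fmt(m)} {{$M$}};",
        r"\end{tikzpicture}",
    ]
    return "\n".join(lines)


print(triangle_with_median(Point(1, 3), Point(0, 0), Point(4, 0)))
```

Other auxiliary lines (angle bisectors, perpendicular bisectors) follow the same pattern: compute the extra point geometrically, then emit one additional dashed `\draw` command.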
Intended Use & Limitations
Intended Use Cases
- Core Scenarios: Interactive geometric teaching aids, step-by-step geometry problem-solving assistance, dynamic geometric illustration generation for educational materials, and research on geometric reasoning with multimodal models.
- Research Purposes: Serves as a baseline for instruction-tuned multimodal code generation and geometric reasoning research.
- Downstream Expansion: Can be integrated into educational platforms or geometric drawing tools to provide intelligent, interactive support.
Out-of-Scope Use Cases
- Non-geometric image manipulation or code generation.
- High-precision engineering drawing generation requiring professional CAD software.
- Advanced mathematical proofs, or geometric problems beyond the scope of plane geometry.
Model Limitations
- The model primarily understands and executes instructions in English; instructions in other languages may lead to suboptimal results.
- While it can generate reasoning text for simple problems, it is not a substitute for professional mathematical proof systems.
- Complex or ambiguous instructions may require multiple rounds of clarification to achieve the desired result.
Quick Start
Environment Setup
Install basic dependencies:
```bash
pip install transformers torch pillow accelerate
```
For full training/inference dependencies, please refer to the official project repository: GeoTikzBridge GitHub
Inference Example
Load the model and run instruction-guided TikZ generation:
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor (trust_remote_code loads the model's custom classes)
model_name = "SJY-1995/GeoTikzBridge-Instruct-8B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
model.eval()

# Optional input image: load one when the instruction modifies an existing figure
# (e.g., adding an auxiliary line). Leave it as None to draw a new figure from text.
image = None
# image = Image.open("existing_geometric_figure.png").convert("RGB")

# Instruction-guided prompt (example instruction; adapt it to your task)
prompt = "Draw a right triangle with a height of 5 cm and label the right angle."

# Build model inputs; pass an image only if one was loaded
# (exact input requirements depend on the model's remote code)
if image is not None:
    inputs = processor(text=prompt, images=image, return_tensors="pt")
else:
    inputs = processor(text=prompt, return_tensors="pt")
inputs = inputs.to(model.device)

# Generate TikZ code with greedy decoding
# (temperature/top_p are ignored when do_sample=False, so they are omitted)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
    )

# Decode and print the generated result (it may include the echoed prompt)
tikz_code = processor.decode(output[0], skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
```
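The decoded output may contain text around the TikZ code (for example, an echoed prompt or reasoning steps). Below is a minimal post-processing sketch, assuming the model emits either a full `\documentclass ... \end{document}` source or a bare `tikzpicture` environment: it extracts the code and, if needed, wraps it in a `standalone` document that can be compiled with `pdflatex`. The helper names (`extract_tikz`, `wrap_standalone`, `to_latex_document`) are hypothetical and not part of the GeoTikzBridge repository.

```python
import re
from typing import Optional

# A complete LaTeX source, from \documentclass to \end{document} (non-greedy)
DOC_RE = re.compile(r"\\documentclass.*?\\end\{document\}", re.DOTALL)
# A single tikzpicture environment (non-greedy, spanning lines)
TIKZ_RE = re.compile(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL)


def extract_tikz(text: str) -> Optional[str]:
    """Return the first tikzpicture environment in text, or None if absent."""
    match = TIKZ_RE.search(text)
    return match.group(0) if match else None


def wrap_standalone(tikz: str) -> str:
    """Wrap a tikzpicture in a minimal standalone document (tikz option loads TikZ)."""
    return "\n".join([
        r"\documentclass[tikz,border=2pt]{standalone}",
        r"\usepackage{amsmath}",
        r"\begin{document}",
        tikz,
        r"\end{document}",
    ])


def to_latex_document(generated: str) -> Optional[str]:
    """Return a compilable LaTeX source from model output, or None if no TikZ is found."""
    doc = DOC_RE.search(generated)
    if doc:
        return doc.group(0)  # the model already produced a full document
    tikz = extract_tikz(generated)
    return wrap_standalone(tikz) if tikz else None
```

For example, `to_latex_document(tikz_code)` can be written to `figure.tex` and compiled with `pdflatex figure.tex`. If the generated code uses TikZ libraries (e.g., `angles` or `calc`), add the matching `\usetikzlibrary{...}` line to the preamble before compiling.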
Training Details
Training Dataset
The model is initialized from GeoTikzBridge-Base-8B and further fine-tuned on the GeoTikz-Instruct dataset, which contains approximately 419k high-quality instruction-response pairs for geometric figures. The dataset covers diverse instruction types, including figure generation, auxiliary line addition, figure modification, and basic reasoning.
Dataset Links:
- GeoTikz-Base: SJY-1995/GeoTikz-Base
- GeoTikz-Instruct: SJY-1995/GeoTikz-Instruct
Key Training Hyperparameters
(The values below are illustrative; refer to the paper or the official repository for the exact Instruct-version hyperparameters.)
| Hyperparameter | Configuration |
|---|---|
| Global Batch Size | 64 |
| Peak Learning Rate | 2e-7 |
| Training Epochs | 2 |
| Max Sequence Length | 12800 |
| Training Precision | BF16 |
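Since the table lists only a global batch size, the sketch below shows how it translates into a per-step training schedule. The GPU count and per-device batch size are hypothetical (not taken from the paper), 419,000 stands in for the approximate 419k dataset size, and the hyperparameters themselves are illustrative, so treat the resulting numbers as rough estimates.

```python
import math


def training_schedule(num_examples: int, global_batch_size: int, epochs: int,
                      num_gpus: int, per_device_batch_size: int) -> dict:
    """Derive gradient accumulation and optimizer step counts from a global batch size."""
    per_step = num_gpus * per_device_batch_size
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size must be divisible by num_gpus * per_device_batch_size")
    # Micro-batches accumulated before each optimizer update
    grad_accum_steps = global_batch_size // per_step
    # The last partial batch still counts as one step (no drop_last)
    steps_per_epoch = math.ceil(num_examples / global_batch_size)
    return {
        "grad_accum_steps": grad_accum_steps,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }


# ~419k samples, global batch 64, 2 epochs; 8 GPUs x 2 samples each is a hypothetical layout
print(training_schedule(419_000, 64, 2, num_gpus=8, per_device_batch_size=2))
# → {'grad_accum_steps': 4, 'steps_per_epoch': 6547, 'total_steps': 13094}
```

With a different GPU count, only `grad_accum_steps` changes; the number of optimizer steps depends solely on the dataset size, global batch size, and epoch count.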
Training Framework & Scripts
Refer to the official project repository for training details: GeoTikzBridge GitHub
Model Family
| Model Name | Parameter Size | Core Capability | Model Link |
|---|---|---|---|
| GeoTikzBridge-Base-8B | 8B | Basic geometric image-to-TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Base-38B | 38B | High-precision complex geometric figure TikZ code generation | 🤗 Hugging Face |
| GeoTikzBridge-Instruct-8B | 8B | Instruction following, auxiliary line generation, interactive geometric reasoning | 🤗 Hugging Face |
Citation
If you use this model, the related datasets, or the code in your research or projects, please cite the following paper:
```bibtex
@inproceedings{geotikzbridge,
  title={GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author={Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle={2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}
```
Evaluation results
- CLIP-S on GeoTikz-Instruct (self-reported): 99.2