GeoTikzBridge-Instruct-8B

Model Overview

GeoTikzBridge-Instruct-8B is the instruction-tuned variant of the GeoTikzBridge series, introduced in the paper "GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning" (accepted at CVPR 2026). Built on GeoTikzBridge-Base-8B, it is further fine-tuned on the GeoTikz-Instruct dataset of roughly 419k samples so that it follows natural language instructions on geometric tasks. Beyond basic image-to-TikZ conversion, it supports instruction-guided auxiliary line generation, interactive geometric modification, and step-by-step geometric reasoning, which suits it to educational and research scenarios that require interactive geometric manipulation.

Model Details

Core Architecture

  • Backbone Foundation: Initialized from GeoTikzBridge-Base-8B (InternVL2 8B with InternLM2 7B backbone), inheriting its strong geometric perception and code generation capabilities.
  • Parameter Scale: 8 billion parameters, maintaining efficient inference while supporting complex instruction understanding.
  • Instruction Tuning: Fine-tuned on a diverse set of geometric instruction-response pairs, enabling the model to understand and execute natural language instructions for geometric figure manipulation.
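
As a rough guide to hardware requirements, the memory needed for the weights alone can be estimated from the parameter count. The sketch below assumes exactly 8 billion parameters and BF16 storage (2 bytes per value); it ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` is illustrative and not part of any library.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

# 8 billion parameters (from this card); BF16 stores 2 bytes per value.
bf16_gib = weight_memory_gib(8e9, 2)
# FP32 (4 bytes per value) is shown only for comparison.
fp32_gib = weight_memory_gib(8e9, 4)

print(f"BF16 weights: ~{bf16_gib:.1f} GiB")  # ~14.9 GiB
print(f"FP32 weights: ~{fp32_gib:.1f} GiB")  # ~29.8 GiB
```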

Core Capabilities

  1. Instruction-Guided TikZ Generation: Generates TikZ code based on natural language instructions (e.g., "Draw a right triangle with a height of 5cm and label the right angle").
  2. Auxiliary Line Generation: Adds auxiliary lines (e.g., perpendicular bisectors, angle bisectors, medians) to existing geometric figures as instructed, supporting geometric problem-solving.
  3. Interactive Geometric Modification: Modifies existing geometric figures (e.g., resizing, rotating, adding/removing elements) according to user instructions.
  4. Basic Geometric Reasoning: Provides step-by-step geometric reasoning processes (in text form) alongside TikZ code generation for simple geometric problems.
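
To make "auxiliary line generation" concrete, the sketch below builds by hand the kind of TikZ output such an instruction might produce: a triangle plus the dashed median from one vertex. This is plain Python geometry, not the model's method, and the function names `midpoint` and `triangle_with_median` are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]

def midpoint(p: Point, q: Point) -> Point:
    """Return the midpoint of segment PQ."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def fmt(p: Point) -> str:
    """Format a point as a TikZ coordinate, e.g. (2,0)."""
    return f"({p[0]:g},{p[1]:g})"

def triangle_with_median(a: Point, b: Point, c: Point) -> str:
    """Return TikZ code for triangle ABC plus the dashed median from A to BC."""
    m = midpoint(b, c)
    lines = [
        r"\begin{tikzpicture}",
        rf"  \draw {fmt(a)} -- {fmt(b)} -- {fmt(c)} -- cycle;",
        rf"  \draw[dashed] {fmt(a)} -- {fmt(m)};",
        r"\end{tikzpicture}",
    ]
    return "\n".join(lines)

# Right triangle with legs of length 5 and 4; the median runs from (0,5) to (2,0).
tikz = triangle_with_median((0, 5), (0, 0), (4, 0))
print(tikz)
```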

Intended Use & Limitations

Intended Use Cases

  • Core Scenarios: Interactive geometric teaching aids, step-by-step geometry problem-solving assistance, dynamic geometric illustration generation for educational materials, and research on geometric reasoning with multimodal models.
  • Research Purposes: Serves as a baseline for instruction-tuned multimodal code generation and geometric reasoning research.
  • Downstream Expansion: Can be integrated into educational platforms or geometric drawing tools to provide intelligent, interactive support.

Out-of-Scope Use Cases

  • Non-geometric image manipulation or code generation.
  • High-precision engineering drawing generation requiring professional CAD software.
  • Advanced mathematical proofs, or geometric problems beyond plane geometry.

Model Limitations

  • The model primarily understands and executes instructions in English; instructions in other languages may lead to suboptimal results.
  • While it can generate reasoning text for simple problems, it is not a substitute for professional mathematical proof systems.
  • Complex or ambiguous instructions may require multiple rounds of clarification to achieve the desired result.

Quick Start

Environment Setup

Install basic dependencies:

pip install transformers torch pillow accelerate

For full training/inference dependencies, please refer to the official project repository: GeoTikzBridge GitHub
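
After installation, a quick check that the packages are importable can save a failed model download. The snippet below is a minimal sketch using only the standard library; note that the `pillow` package is imported as `PIL`. The helper name `missing_packages` is illustrative.

```python
import importlib.util

# pip package name -> module name used in import statements.
REQUIRED = {
    "transformers": "transformers",
    "torch": "torch",
    "pillow": "PIL",
    "accelerate": "accelerate",
}

def missing_packages(required: dict) -> list:
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]

missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```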

Inference Example

Quickly load the model for instruction-guided geometric code generation:

from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor (trust_remote_code runs the custom code shipped with the model)
model_name = "SJY-1995/GeoTikzBridge-Instruct-8B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Load the geometric figure the instruction refers to (replace the path with your own file).
# If generating a new figure from scratch, you can use a placeholder or omit the image
# (depending on model input requirements).
image = Image.open("existing_geometric_figure.png").convert("RGB")

# Build an instruction-guided prompt (example instruction; replace with your own)
prompt = "Add the median from vertex A to the opposite side and output the updated TikZ code."
# The exact input format depends on the model's remote code; see the official repository.
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate TikZ code with greedy decoding.
# temperature and top_p are omitted because they have no effect when do_sample=False.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False
    )

# Decode and output the generated result
tikz_code = processor.decode(output[0], skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
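
Because the model can return reasoning text alongside code, it may help to isolate the TikZ environment before compiling. The sketch below assumes the output contains a `\begin{tikzpicture}` ... `\end{tikzpicture}` block, which this card does not guarantee; the helper names and the sample string are hypothetical.

```python
import re
from typing import Optional

# Non-greedy match of the first tikzpicture environment, across newlines.
TIKZ_ENV = re.compile(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL)

def extract_tikz(text: str) -> Optional[str]:
    """Return the first tikzpicture environment in text, or None if absent."""
    match = TIKZ_ENV.search(text)
    return match.group(0) if match else None

def wrap_standalone(tikz: str) -> str:
    """Wrap a tikzpicture in a minimal LaTeX document using the standalone class."""
    return "\n".join([
        r"\documentclass[tikz,border=2pt]{standalone}",
        r"\begin{document}",
        tikz,
        r"\end{document}",
    ])

# Hypothetical model output mixing reasoning text and code.
sample = (
    "Step 1: draw the triangle.\n"
    "\\begin{tikzpicture}\n"
    "\\draw (0,0) -- (4,0) -- (0,3) -- cycle;\n"
    "\\end{tikzpicture}\n"
    "Done."
)

tikz = extract_tikz(sample)
if tikz is not None:
    print(wrap_standalone(tikz))
```

The resulting document can then be saved to a `.tex` file and compiled with a LaTeX engine that has TikZ installed.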

Training Details

Training Dataset

The model is initialized from GeoTikzBridge-Base-8B and further fine-tuned on the GeoTikz-Instruct dataset, which contains approximately 419k high-quality instruction-response pairs for geometric tasks. The dataset covers diverse instruction types, including figure generation, auxiliary line addition, figure modification, and basic reasoning.

Dataset Links:

Key Training Hyperparameters

(Refer to the paper or official repository for detailed Instruct-version hyperparameters; the following are illustrative.)

Hyperparameter | Configuration
Global Batch Size | 64
Peak Learning Rate | 2e-7
Training Epochs | 2
Max Sequence Length | 12800
Training Precision | BF16
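
For a sense of scale, the illustrative values above imply a total number of optimizer steps that can be worked out directly. The sketch assumes exactly 419,000 samples (the card says "approximately 419k"), no gradient-accumulation subtleties, and that the final partial batch of each epoch is kept; the function name is hypothetical.

```python
import math

def optimizer_steps(num_samples: int, global_batch_size: int, epochs: int) -> int:
    """Total optimizer steps when the last partial batch of each epoch is kept."""
    return math.ceil(num_samples / global_batch_size) * epochs

# ~419k samples, global batch size 64, 2 epochs (illustrative values from this card).
steps_per_epoch = optimizer_steps(419_000, 64, 1)
total_steps = optimizer_steps(419_000, 64, 2)

print(steps_per_epoch)  # 6547
print(total_steps)      # 13094
```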

Training Framework & Scripts

Refer to the official project repository for training details: GeoTikzBridge GitHub

Model Family

Model Name | Parameter Size | Core Capability | Model Link
GeoTikzBridge-Base-8B | 8B | Basic geometric image-to-TikZ code generation | 🤗 Hugging Face
GeoTikzBridge-Base-38B | 38B | High-precision complex geometric figure TikZ code generation | 🤗 Hugging Face
GeoTikzBridge-Instruct-8B | 8B | Instruction following, auxiliary line generation, interactive geometric reasoning | 🤗 Hugging Face

Citation

If you use this model, related datasets or code in your research or projects, please cite the following paper:

@inproceedings{geotikzbridge,
  title={GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author={Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle={2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}