GeoTikzBridge-Instruct-8B

Model Overview

GeoTikzBridge-Instruct-8B is the instruction-tuned variant of the GeoTikzBridge series, introduced in the paper GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning (accepted at CVPR 2026). Built on GeoTikzBridge-Base-8B, it is further fine-tuned on the GeoTikz-Instruct dataset of roughly 419k samples so that it can follow natural language instructions on geometric tasks. Beyond basic image-to-TikZ conversion, it supports instruction-guided auxiliary line generation, interactive geometric modification, and step-by-step geometric reasoning, which suits it to educational and research scenarios that require interactive geometric manipulation.

Model Details

Core Architecture

Backbone Foundation: Initialized from GeoTikzBridge-Base-8B (InternVL2 8B with InternLM2 7B backbone), inheriting its strong geometric perception and code generation capabilities.
Parameter Scale: 8 billion parameters, maintaining efficient inference while supporting complex instruction understanding.
Instruction Tuning: Fine-tuned on a diverse set of geometric instruction-response pairs, enabling the model to understand and execute natural language instructions for geometric figure manipulation.
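
As a rough sizing aid, the weight memory implied by the parameter count can be estimated as follows. This is a back-of-the-envelope sketch, not a measured figure: it assumes a nominal 8 billion parameters stored in BF16 (2 bytes each) and ignores activations, the KV cache, and framework overhead. The function name is illustrative.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: a nominal 8e9 parameters; the exact count may differ.
approx_gib = weight_memory_gib(8e9)
print(f"~{approx_gib:.1f} GiB of weights in BF16")
```

Under these assumptions the weights alone come to roughly 15 GiB, so actual inference needs more headroom than that.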

Core Capabilities

Instruction-Guided TikZ Generation: Generates TikZ code based on natural language instructions (e.g., "Draw a right triangle with a height of 5cm and label the right angle").
Auxiliary Line Generation: Adds auxiliary lines (e.g., perpendicular bisectors, angle bisectors, medians) to existing geometric figures as instructed, supporting geometric problem-solving.
Interactive Geometric Modification: Modifies existing geometric figures (e.g., resizing, rotating, adding/removing elements) according to user instructions.
Basic Geometric Reasoning: Provides step-by-step geometric reasoning processes (in text form) alongside TikZ code generation for simple geometric problems.

Intended Use & Limitations

Intended Use Cases

Core Scenarios: Interactive geometric teaching aids, step-by-step geometry problem-solving assistance, dynamic geometric illustration generation for educational materials, and research on geometric reasoning with multimodal models.
Research Purposes: Serves as a baseline for instruction-tuned multimodal code generation and geometric reasoning research.
Downstream Expansion: Can be integrated into educational platforms or geometric drawing tools to provide intelligent, interactive support.

Out-of-Scope Use Cases

Non-geometric image manipulation or code generation.
High-precision engineering drawing generation requiring professional CAD software.
Proving advanced mathematical theorems or solving geometric problems beyond plane geometry.

Model Limitations

The model primarily understands and executes instructions in English; instructions in other languages may lead to suboptimal results.
While it can generate reasoning text for simple problems, it is not a substitute for professional mathematical proof systems.
Complex or ambiguous instructions may require multiple rounds of clarification to achieve the desired result.

Quick Start

Environment Setup

Install basic dependencies:

pip install transformers torch pillow accelerate
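
To confirm the installation before loading the model, a small check like the one below may help. It is a minimal sketch using only the standard library; the helper name and module list are illustrative, and it only tests that the modules can be found, not that their versions are compatible.

```python
import importlib.util

# Import names for the pip packages above; note that pillow is imported as PIL.
REQUIRED_MODULES = ["transformers", "torch", "PIL", "accelerate"]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", missing if missing else "none")
```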

For full training/inference dependencies, please refer to the official project repository: GeoTikzBridge GitHub

Inference Example

Quickly load the model for instruction-guided geometric code generation:

from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor
model_name = "SJY-1995/GeoTikzBridge-Instruct-8B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Load the input geometric image. An image is needed when the instruction
# modifies an existing figure; whether text-only prompts are accepted depends
# on the model's remote processor code, so check the official repository.
image = Image.open("existing_geometric_figure.png").convert("RGB")

# Build an instruction-guided prompt (example instruction; replace with your own)
prompt = "Add the perpendicular bisector of segment AB and output the TikZ code."
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate TikZ code with greedy decoding. temperature and top_p only take
# effect when do_sample=True, so they are omitted here.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False
    )

# Decode and output the generated result
tikz_code = processor.decode(output[0], skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
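
Because the model may return reasoning text alongside the TikZ code, it can be useful to pull out the tikzpicture environment and wrap it in a compilable document. The sketch below is one way to do that with the standard library. It assumes the output contains a literal \begin{tikzpicture} ... \end{tikzpicture} block, which the card does not guarantee; the helper names and the sample string are hypothetical.

```python
import re

# Non-greedy match of the first complete tikzpicture environment
TIKZ_ENV = re.compile(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL)


def extract_tikz(text: str):
    """Return the first tikzpicture environment in `text`, or None if absent."""
    match = TIKZ_ENV.search(text)
    return match.group(0) if match else None


def wrap_standalone(tikz: str) -> str:
    """Wrap a tikzpicture in a minimal standalone LaTeX document."""
    return (
        "\\documentclass[tikz]{standalone}\n"
        "\\begin{document}\n"
        f"{tikz}\n"
        "\\end{document}\n"
    )


# Hypothetical model output, for illustration only
sample = (
    "Here is the figure:\n"
    "\\begin{tikzpicture}\n"
    "\\draw (0,0) -- (4,0) -- (0,3) -- cycle;\n"
    "\\end{tikzpicture}\n"
    "Done."
)
tikz = extract_tikz(sample)
if tikz is not None:
    print(wrap_standalone(tikz))
```

The resulting string can be saved to a .tex file and compiled with a LaTeX engine. Generated code that relies on extra TikZ libraries would also need the matching \usetikzlibrary lines in the preamble.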

Training Details

Training Dataset

The model is initialized from GeoTikzBridge-Base-8B and further fine-tuned on the GeoTikz-Instruct dataset, which contains approximately 419k instruction-response pairs for geometric tasks. The dataset covers several instruction types: figure generation, auxiliary line addition, figure modification, and basic reasoning.

Dataset Links:

GeoTikz-Base: SJY-1995/GeoTikz-Base
GeoTikz-Instruct: SJY-1995/GeoTikz-Instruct

Key Training Hyperparameters

(Refer to the paper or official repository for detailed Instruct-version hyperparameters; the following are illustrative.)

Hyperparameter	Configuration
Global Batch Size	64
Peak Learning Rate	2e-7
Training Epochs	2
Max Sequence Length	12800
Training Precision	BF16
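
To get a feel for the scale these values imply, the number of optimizer steps can be worked out as below. This is only a worked example: it assumes exactly 419,000 samples (the card says "approximately 419k"), uses the illustrative batch size and epoch count from the table, and counts a final partial batch in each epoch. The function name is illustrative.

```python
import math


def optimizer_steps(num_samples: int, global_batch_size: int, epochs: int) -> int:
    """Total optimizer steps, counting a final partial batch in each epoch."""
    return math.ceil(num_samples / global_batch_size) * epochs


# Assumed inputs: 419,000 samples, global batch size 64, 2 epochs
print(optimizer_steps(419_000, 64, 2))
```

With these inputs the run comes to about 13,000 steps (6,547 per epoch).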

Training Framework & Scripts

Refer to the official project repository for training details: GeoTikzBridge GitHub

Model Family

Model Name	Parameter Size	Core Capability	Model Link
GeoTikzBridge-Base-8B	8B	Basic geometric image-to-TikZ code generation	🤗 Hugging Face
GeoTikzBridge-Base-38B	38B	High-precision complex geometric figure TikZ code generation	🤗 Hugging Face
GeoTikzBridge-Instruct-8B	8B	Instruction following, auxiliary line generation, interactive geometric reasoning	🤗 Hugging Face

Citation

If you use this model, related datasets or code in your research or projects, please cite the following paper:

@inproceedings{
  geotikzbridge,
  title={GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author={Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle={2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}

Evaluation results

CLIP-S on GeoTikz-Instruct (self-reported): 99.2