Autoresearch Manim Qwen2.5-Coder-3B MLX LoRA

Model Description

This repository contains an MLX LoRA adapter for Qwen/Qwen2.5-Coder-3B-Instruct, tuned for Manim Community Edition code generation.

The adapter is trained to produce runnable educational animation scenes, with a bias toward:

  • clean single-file Manim outputs
  • readable instructional composition
  • stronger renderability on curated scientific and technical explainer prompts

This is an adapter-only release, not a merged full checkpoint.

Release Details

  • Base model: Qwen/Qwen2.5-Coder-3B-Instruct
  • Fine-tuning method: MLX LoRA
  • Trainable layers: 12
  • LoRA rank: 8
  • Max sequence length: 3072
  • Seed: 42
  • Promoted track: render-first
  • Best restored checkpoint: step 60
  • Best validation loss: 0.529

Training Data

The adapter was trained on a curated public dataset.

The corpus focuses on high-quality Manim code samples that were either hand-authored, imported from documentation and repositories, or promoted through manual review after successful render checks.

Held-Out Evaluation

Held-out evaluation for this promoted release completed on 2026-04-06 using the repo's repair-aware generation-plus-render harness on the 22-case gold-silver test split.

  • Syntax success rate: 0.818
  • Render success rate: 0.722
  • Mean case score: 0.709
  • Test loss: 0.554
  • Test perplexity: 1.740
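Test perplexity is simply the exponential of the test loss, so the two figures above are mutually consistent; a quick check in Python:

```python
import math

test_loss = 0.554              # held-out test loss reported above
perplexity = math.exp(test_loss)

# exp(0.554) ≈ 1.740, matching the reported test perplexity
print(round(perplexity, 3))    # prints 1.74
```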

Interpretation:

  • This is the project's current best render-first release, materially stronger than the previous public checkpoint on actual render success.
  • The released adapter is still not near fully unattended reliability; remaining failures are concentrated in malformed scene logic, placeholder assets, and a small number of stubborn Manim API mismatches.
  • A higher-syntax alternative exists internally, but this release was promoted because render success is the primary deployment metric.
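Syntax failures of the kind counted above can be screened cheaply before any render is attempted. A minimal sketch of such a pre-render check, using only the standard library (`check_syntax` is an illustrative helper, not part of the actual evaluation harness):

```python
def check_syntax(source: str) -> bool:
    """Return True if the generated Manim source compiles as Python."""
    try:
        compile(source, "<generated_scene>", "exec")
        return True
    except SyntaxError:
        return False

good = (
    "from manim import *\n\n"
    "class Demo(Scene):\n"
    "    def construct(self):\n"
    "        self.play(Create(Circle()))\n"
)
bad = "class Demo(Scene)\n    def construct(self):\n"  # missing colon

print(check_syntax(good))  # a well-formed scene parses
print(check_syntax(bad))   # the malformed scene fails the check
```

A check like this only catches parse errors; render failures from malformed scene logic or API mismatches still require executing the scene.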

Usage

Clone or download this adapter repo locally, then run it with mlx-lm against the base model:

git clone https://huggingface.co/sebastianboehler/autoresearch-manim-qwen25coder-3b-mlx-lora

python -m mlx_lm.generate \
  --model Qwen/Qwen2.5-Coder-3B-Instruct \
  --adapter-path ./autoresearch-manim-qwen25coder-3b-mlx-lora \
  --system-prompt "You write runnable Manim Community Edition Python files. Return only Python code, use from manim import *, and define exactly one scene class." \
  --prompt "Create a 10-second Manim scene that explains gradient descent on a contour plot." \
  --max-tokens 1600 \
  --temp 0.0 \
  --top-p 0.9 \
  --seed 42
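The system prompt above constrains output to `from manim import *` and exactly one scene class. A hypothetical post-generation validator for those two constraints, using only the standard library (`validate_scene` is illustrative and not part of this release; it assumes the input already parses as Python):

```python
import ast

def validate_scene(source: str) -> bool:
    """Check that generated code star-imports manim and defines
    exactly one top-level class, per the system-prompt contract."""
    tree = ast.parse(source)
    has_star_import = any(
        isinstance(node, ast.ImportFrom)
        and node.module == "manim"
        and any(alias.name == "*" for alias in node.names)
        for node in tree.body
    )
    classes = [n for n in tree.body if isinstance(n, ast.ClassDef)]
    return has_star_import and len(classes) == 1

sample = (
    "from manim import *\n\n"
    "class GradientDescent(Scene):\n"
    "    def construct(self):\n"
    "        self.play(Create(Circle()))\n"
)
print(validate_scene(sample))  # True for a conforming single-scene file
```

Running such a check before invoking `manim` catches contract violations (multiple classes, missing star import) without paying the cost of a render.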

Files

  • adapters.safetensors: release adapter weights
  • adapter_config.json: MLX LoRA training configuration
  • best_checkpoint.json: restored best-step metadata
  • release_metadata.json: release-level dataset and training summary

Notes

  • This adapter inherits the upstream base-model licensing constraints from Qwen.
  • The training dataset contains row-level provenance and licensing metadata; see the dataset card for details.
  • The public card now includes the current held-out evaluation summary for the released checkpoint.

