Interlace-Qwen3-VL-8B-15pc

Part of the INTERLACE collection: Interleaved Layer Pruning in VLMs (CVPR 2026), a set of pruned Qwen3-VL models retaining up to 94% of baseline performance.
This model was produced by INTERLACE, a layer-pruning framework for Vision-Language Models. 15% of the transformer layers in Qwen/Qwen3-VL-8B-Instruct were removed using triplet-based similarity analysis, and the pruned model was then fine-tuned on 1% of FineVision for a single epoch.
92.1% average relative performance retained | 15% layers dropped (5 of 36) | 31 layers remaining
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-VL-8B-Instruct |
| Pruning Method | INTERLACE (triplet-based interleaved pruning) |
| Pruning Ratio | 15% (5 of 36 layers removed) |
| Remaining Layers | 31 |
| Hidden Size | 4096 |
| Fine-tuning Data | 1% of FineVision (~240K samples) |
| Fine-tuning Epochs | 1 |
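As a rough intuition for similarity-based layer pruning, the sketch below scores each layer by how little it changes its hidden states and drops the most redundant 15%. This is a simplified illustration with random toy data, not INTERLACE's actual triplet criterion, which is defined in the paper; all sizes and the scoring rule here are assumptions.

```python
# Illustrative sketch of similarity-based layer pruning (NOT the exact
# INTERLACE triplet criterion): a layer whose output is nearly identical
# to its input contributes little and is a candidate for removal.
import numpy as np

rng = np.random.default_rng(0)
num_layers, hidden = 36, 64  # toy hidden size; the real model uses 4096

# Fake hidden states: states[i] is the input to layer i, states[i+1] its output.
states = rng.normal(size=(num_layers + 1, hidden))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score each layer by input/output similarity; higher means more redundant.
scores = [cosine(states[i], states[i + 1]) for i in range(num_layers)]

# Drop the 5 most redundant layers (5 of 36 ≈ 15%, as in this checkpoint).
to_drop = sorted(int(i) for i in np.argsort(scores)[-5:])
print(f"candidate layers to drop: {to_drop}")
```

The real method additionally fine-tunes the remaining layers so the network can compensate for the removed ones.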
```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model = AutoModelForImageTextToText.from_pretrained(
    "pmadinei/Interlace-Qwen3-VL-8B-15pc",
    dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0], skip_special_tokens=True))
```
Relative performance compared to the unpruned baseline (% of baseline score, Chain-of-Thought enabled):
| Category | Benchmark | Relative Perf. |
|---|---|---|
| Text/Chart | AI2D | 92.5% |
| Text/Chart | ChartQA | 91.6% |
| Text/Chart | OCRBench | 89.6% |
| Text/Chart | TextVQA | 95.8% |
| General VQA | MMBench | 90.0% |
| General VQA | POPE | 99.6% |
| General VQA | RealWorldQA | 92.1% |
| Perception | HRBench4K | 92.1% |
| Perception | HRBench8K | 90.0% |
| Perception | V-Star | 89.7% |
| Inst & Sci | MIABench | 89.6% |
| Inst & Sci | ScienceQA | 92.9% |
| **Overall Average** | | **92.1%** |
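The overall figure is the unweighted mean of the twelve per-benchmark relative scores above:

```python
# Sanity check: average the 12 per-benchmark relative scores from the table.
scores = [92.5, 91.6, 89.6, 95.8, 90.0, 99.6,
          92.1, 92.1, 90.0, 89.7, 89.6, 92.9]
avg = sum(scores) / len(scores)
print(f"{avg:.1f}%")  # → 92.1%
```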
All checkpoints in the collection, by pruning ratio:

| Model | Drop % | Rel. Perf. |
|---|---|---|
| Interlace-Qwen3-VL-8B-10pc | 10% | 94.0% |
| Interlace-Qwen3-VL-8B-15pc | 15% | 92.1% |
| Interlace-Qwen3-VL-8B-20pc | 20% | 86.9% |
| Interlace-Qwen3-VL-8B-25pc | 25% | 86.1% |
| Interlace-Qwen3-VL-4B-10pc | 10% | 93.9% |
| Interlace-Qwen3-VL-4B-15pc | 15% | 91.9% |
| Interlace-Qwen3-VL-4B-20pc | 20% | 88.0% |
| Interlace-Qwen3-VL-4B-25pc | 25% | 81.7% |
```bibtex
@inproceedings{madinei2026interlace,
  title={INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models},
  author={Madinei, Parsa and Solgi, Ryan and Wen, Ziqi and Skaza, Jonathan and Eckstein, Miguel and Pedarsani, Ramtin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```