✂️ Interlace-Qwen3-VL-4B-25pc

arXiv Project Page GitHub Collection CVPR 2026

This model was produced by INTERLACE, a layer-pruning framework for Vision-Language Models. 25% of the transformer layers in Qwen/Qwen3-VL-4B-Instruct were removed using triplet-based similarity analysis, and the pruned model was then fine-tuned on 1% of FineVision for a single epoch to recover performance.

81.7% average relative performance retained  |  25% layers dropped (9 of 36)  |  27 layers remaining
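The exact triplet criterion is defined in the INTERLACE paper; as a rough illustration of similarity-based layer selection (all function names and the scoring rule below are illustrative, not the paper's implementation), one can score each layer by how little its local triplet of layers changes the representation:

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two pooled hidden-state vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def triplet_redundancy_scores(layer_states):
    """Crude proxy: score layer l by the similarity between the hidden
    state entering layer l (output of l-1) and the state leaving layer
    l+1; high similarity suggests the local window is redundant."""
    return {
        l: cosine(layer_states[l - 1], layer_states[l + 1])
        for l in range(1, len(layer_states) - 1)
    }

# Toy example: pooled hidden states for a 6-layer stack, 8-dim features.
random.seed(0)
layer_states = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]

scores = triplet_redundancy_scores(layer_states)
# Mark the 2 highest-scoring (most redundant) layers as pruning candidates.
to_prune = sorted(scores, key=scores.get, reverse=True)[:2]
```

In the real framework the states would come from forward passes over calibration data rather than random vectors, and pruning is interleaved with fine-tuning rather than done in one shot.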

📋 Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen3-VL-4B-Instruct |
| Pruning Method | INTERLACE (triplet-based interleaved pruning) |
| Pruning Ratio | 25% (9 of 36 layers removed) |
| Remaining Layers | 27 |
| Hidden Size | 2560 |
| Fine-tuning Data | 1% of FineVision (~240K samples) |
| Fine-tuning Epochs | 1 |

🚀 Usage

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model = AutoModelForImageTextToText.from_pretrained(
    "pmadinei/Interlace-Qwen3-VL-4B-25pc",
    dtype="auto",
    device_map="auto",
    # Requires the flash-attn package; drop this argument to use the
    # default attention implementation instead.
    attn_implementation="flash_attention_2",
)
# Pruning leaves the tokenizer/processor unchanged, so it is loaded
# from the base model.
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0], skip_special_tokens=True))
```
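A quick sanity check on the pruned architecture follows from the numbers above; the commented-out Hub check is a sketch that assumes the Qwen3-VL config nests its language-model settings under `text_config` (hence the `getattr` fallback):

```python
# Expected decoder depth after pruning: 36 base layers minus 9 removed.
BASE_LAYERS, LAYERS_REMOVED = 36, 9
expected_layers = BASE_LAYERS - LAYERS_REMOVED
print(expected_layers)  # 27

# Optional online check (downloads the config from the Hub). The
# `text_config` nesting is an assumption about the config layout.
# from transformers import AutoConfig
# cfg = AutoConfig.from_pretrained("pmadinei/Interlace-Qwen3-VL-4B-25pc")
# llm_cfg = getattr(cfg, "text_config", cfg)
# assert llm_cfg.num_hidden_layers == expected_layers
```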

📊 Performance

Relative performance compared to the unpruned baseline (% of baseline score, Chain-of-Thought enabled):

| Category | Benchmark | Relative Perf. |
|----------|-----------|----------------|
| Text/Chart | AI2D | 79.7% |
| Text/Chart | ChartQA | 84.9% |
| Text/Chart | OCRBench | 80.0% |
| Text/Chart | TextVQA | 89.1% |
| General VQA | MMBench | 77.0% |
| General VQA | POPE | 98.2% |
| General VQA | RealWorldQA | 82.2% |
| Perception | HRBench4K | 80.1% |
| Perception | HRBench8K | 81.8% |
| Perception | V-Star | 76.5% |
| Inst & Sci | MIABench | 77.9% |
| Inst & Sci | ScienceQA | 72.6% |
| **Overall** | **Average** | **81.7%** |
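The overall figure is the unweighted mean of the twelve per-benchmark scores above, which can be reproduced directly:

```python
# Per-benchmark relative scores (% of unpruned baseline) from the table.
scores = {
    "AI2D": 79.7, "ChartQA": 84.9, "OCRBench": 80.0, "TextVQA": 89.1,
    "MMBench": 77.0, "POPE": 98.2, "RealWorldQA": 82.2,
    "HRBench4K": 80.1, "HRBench8K": 81.8, "V-Star": 76.5,
    "MIABench": 77.9, "ScienceQA": 72.6,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}%")  # prints 81.7%
```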

🤗 All INTERLACE Models

📝 Citation

```bibtex
@inproceedings{madinei2026interlace,
  title={INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models},
  author={Madinei, Parsa and Solgi, Ryan and Wen, Ziqi and Skaza, Jonathan and Eckstein, Miguel and Pedarsani, Ramtin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```