# Qwen2.5-Coder-7B-Instruct-zse-int4
A pre-converted ZSE model for ultra-fast inference.
## Source Model
- Original: Qwen/Qwen2.5-Coder-7B-Instruct
- Quantization: INT4
- File Size: 5.18 GB
- Format: ZSE binary (.zse)
## Usage

```bash
pip install zllm-zse

# Download and serve
zse pull qwen2.5-coder-7b-instruct
zse serve qwen2.5-coder-7b-instruct

# Or serve a local .zse file directly
zse serve Qwen2.5-Coder-7B-Instruct-zse-int4.zse
```
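Once the server is running, you would query it like any local inference endpoint. As a sketch only: the endpoint path, port, and payload shape below assume an OpenAI-compatible chat API, which this card does not confirm; check the ZSE documentation for the actual interface.

```python
import json

# Assumed default address and OpenAI-style route (not confirmed by this card).
SERVER_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "qwen2.5-coder-7b-instruct") -> str:
    """Build a JSON chat-completion request body for the served model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,
    }
    return json.dumps(payload)

body = build_chat_request("Write a Python function that reverses a string.")
# POST `body` to SERVER_URL with any HTTP client, e.g.:
#   curl -X POST $SERVER_URL -H 'Content-Type: application/json' -d "$body"
print(body)
```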
## Benefits
- 5x faster cold start than loading the original Hugging Face checkpoint
- 10-14% less VRAM with ZSE's custom INT4 kernels
- Single file — tokenizer and config embedded
- No internet required after download
## Benchmarks

See the ZSE documentation for full benchmarks.
Converted with ZSE v1.4.0