Compressed Model: MilyaShams/Qwen3-1.7B-Pipe_AWQ_W4A16_Wanda

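This model was compressed with the llmcompressor framework, which applied AWQ quantization (scheme W4A16) and Wanda pruning (target sparsity 0.6) to Qwen/Qwen3-1.7B.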

Compression Details

Base Model: Qwen/Qwen3-1.7B
Experiment Name: Pipe_AWQ_W4A16_Wanda
Recipe / Modifiers Applied:

[
  AWQModifier(
    config_groups=None, targets=['Linear'], ignore=[], scheme='W4A16', kv_cache_scheme=None,
    weight_observer=None, input_observer=None, output_observer=None, observer=None,
    bypass_divisibility_checks=False,
    index=None, group=None, start=None, end=None, update=None,
    initialized_=True, finalized_=True, started_=True, ended_=True,
    sequential_targets=None,
    mappings=[
      AWQMapping(smooth_layer='re:.*input_layernorm$', balance_layers=['re:.*q_proj$', 're:.*k_proj$', 're:.*v_proj$'], activation_hook_target=None),
      AWQMapping(smooth_layer='re:.*v_proj$', balance_layers=['re:.*o_proj$'], activation_hook_target=None),
      AWQMapping(smooth_layer='re:.*post_attention_layernorm$', balance_layers=['re:.*gate_proj$', 're:.*up_proj$'], activation_hook_target=None),
      AWQMapping(smooth_layer='re:.*up_proj$', balance_layers=['re:.*down_proj$'], activation_hook_target=None)
    ],
    offload_device='cpu', duo_scaling=True, n_grid=20
  ),
  WandaPruningModifier(
    index=None, group=None, start=None, end=None, update=None,
    initialized_=True, finalized_=True, started_=True, ended_=True,
    sparsity=0.6, sparsity_profile=None, mask_structure='0:0',
    owl_m=None, owl_lmbda=None,
    sequential_update=False, sequential_targets=['Qwen3DecoderLayer'],
    targets=['Linear'], ignore=[]
  )
]
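As a reading aid for the `mappings` entries in the recipe, the following minimal sketch shows how each `re:`-prefixed pattern could resolve to concrete module names. It assumes that `re:` marks a regular expression matched against the full dotted module name; the module names in `MODULE_NAMES` are hypothetical, chosen to resemble a typical decoder layer layout, and were not read from the checkpoint. The helpers `matches` and `resolve` are illustrative and are not part of llmcompressor.

```python
import re

# AWQ mappings copied from the recipe: (smooth layer pattern, balance layer patterns).
MAPPINGS = [
    ("re:.*input_layernorm$", ["re:.*q_proj$", "re:.*k_proj$", "re:.*v_proj$"]),
    ("re:.*v_proj$", ["re:.*o_proj$"]),
    ("re:.*post_attention_layernorm$", ["re:.*gate_proj$", "re:.*up_proj$"]),
    ("re:.*up_proj$", ["re:.*down_proj$"]),
]

# Hypothetical module names for a single decoder layer (assumed naming).
MODULE_NAMES = [
    "model.layers.0.input_layernorm",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.post_attention_layernorm",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
]


def matches(pattern, name):
    """Assumption: a 're:' prefix marks a regex; anything else is an exact name."""
    if pattern.startswith("re:"):
        return re.match(pattern[3:], name) is not None
    return pattern == name


def resolve(mappings, names):
    """Return, per mapping, the matched smooth layers and balance layers."""
    resolved = []
    for smooth, balance in mappings:
        smooth_hits = [n for n in names if matches(smooth, n)]
        balance_hits = [n for n in names if any(matches(p, n) for p in balance)]
        resolved.append((smooth_hits, balance_hits))
    return resolved


if __name__ == "__main__":
    for smooth_hits, balance_hits in resolve(MAPPINGS, MODULE_NAMES):
        print(smooth_hits, "->", balance_hits)
```

Under these assumptions, the first mapping pairs the input layer norm with the three attention input projections, and the last pairs `up_proj` with `down_proj`.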

Note: This model card was automatically generated. All structural modifiers and parameters used during compression are logged above.
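To make the logged `sparsity=0.6` setting of `WandaPruningModifier` more concrete, here is a minimal pure-Python sketch of the Wanda pruning criterion as it is commonly described: each weight is scored by its magnitude times the L2 norm of the corresponding input feature over calibration samples, and the lowest-scoring fraction in each output row is set to zero. This is an illustration under stated assumptions, not llmcompressor's implementation; the function name `wanda_prune`, the toy weights, and the toy activations are all hypothetical, and `mask_structure='0:0'` is assumed to mean unstructured pruning.

```python
import math


def wanda_prune(weight, activations, sparsity):
    """Zero the lowest-scoring weights in each output row.

    weight: list of rows (output features), each a list over input features.
    activations: list of calibration input vectors (one value per input feature).
    sparsity: fraction of weights to remove per row (0.6 in the logged recipe).
    """
    n_in = len(weight[0])
    # L2 norm of each input feature across the calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)]
    # Number of weights removed per row, rounded to the nearest integer.
    n_prune = round(n_in * sparsity)
    pruned = []
    for row in weight:
        # Wanda score: |weight| * input-feature norm.
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


if __name__ == "__main__":
    # Toy data: input feature 1 has much larger activations than the others.
    toy_weight = [[0.5, -0.1, 0.3, 0.05, -0.8]]
    toy_acts = [[1.0, 10.0, 1.0, 1.0, 1.0], [1.0, 10.0, 1.0, 1.0, 1.0]]
    print(wanda_prune(toy_weight, toy_acts, sparsity=0.6))
```

In this toy case the small weight `-0.1` survives because its input feature carries large activations, while the larger weight `0.5` is removed; plain magnitude pruning would have made the opposite choice.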

Safetensors

Model size: 2B params
Tensor type: I64, F32, I32

Model tree for MilyaShams/Qwen3-1.7B-Pipe_AWQ_W4A16_Wanda

Base model: Qwen/Qwen3-1.7B-Base
Finetuned: Qwen/Qwen3-1.7B
Quantized (254): this model