wenhua cheng
wenhuach
AI & ML interests
Model Compression, CV
Recent Activity
new activity 1 day ago: Intel/gemma-4-31B-it-int4-AutoRound: Installation Video and Testing - Step by Step
updated a model 1 day ago: Intel/gemma-4-31B-it-int4-AutoRound
new activity 1 day ago: Intel/gemma-4-26B-A4B-it-int4-mixed-AutoRound: GGUF version
Organizations
Installation Video and Testing - Step by Step
2 reactions · 4 replies
#1 opened 2 days ago by fahdmirzac
GGUF version
1 reply
#1 opened 2 days ago by limcheekin
Performance indicators
2 reactions · 4 replies
#1 opened 19 days ago by dehnhaide
This model always predicts a few nonsense sequences
8 replies
#1 opened about 1 month ago by CharlesChen2023
Does the A100 work?
12 replies
#1 opened about 1 month ago by xz123321
Thanks! And MTP key question
11 replies
#1 opened about 1 month ago by seanthomaswilliams
Convert to gguf-q2ks-mixed-AutoRound?
2 reactions · 4 replies
#2 opened 3 months ago by limcheekin
Qwen/Qwen3-Next-80B-A3B-Thinking reports MMLU_PRO 82.7, but you get 0.7271
3 replies
#2 opened 7 months ago by hlxxxxxx
AutoRound request: GLM-4.5-Air
1 reply
#1 opened 3 months ago by babytifa
2507 Thinking model release
11 replies
#4 opened 6 months ago by anjeysapkovski
How to use this kernel?
#1 opened 3 months ago by wenhuach
Has the Thinking version been deleted?
1 reply
#2 opened 4 months ago by reswewr
Improve model card: Add pipeline tag, library name, and update paper/citation
1 reaction
#1 opened 4 months ago by nielsr
Could we get more w2a16, w3a16, and w4a16 AutoRound models?
1 reaction · 1 reply
#1 opened 5 months ago by twhitworth
Practical performance feedback
1 reply
#2 opened 5 months ago by maigonis
Works well with vLLM, just no tool calling
1 reply
#1 opened 8 months ago by Ununnilium
Inference with llama.cpp + Open WebUI gives repeating `?`
4 replies
#1 opened 6 months ago by whoisjeremylam
Adding `transformers` as the library tag
#3 opened 7 months ago by ariG23498
CPU only?
4 replies
#2 opened 7 months ago by jujutechnology