Fix chat template to avoid empty historical `<think>` blocks
#11 opened 14 days ago by latent-variable · 1 comment
Why is 35B-INT4 smaller than 27B-INT4?
#10 opened about 1 month ago by andynoodles · 3 comments
Qwen3.5-35B-A3B-Base model quants
#9 opened about 1 month ago by Maksim1000
Smaller model quants
#8 opened about 2 months ago by swtb
vLLM did not recognise the model
#7 opened about 2 months ago by anura2026
SGLang config request
#6 opened about 2 months ago by cse2011
vLLM (SM70) V100 support
#5 opened about 2 months ago by FayeQuant · 2 comments
What impact has quantization had on model performance/ability?
#4 opened about 2 months ago by spanspek · 1 comment
Working vLLM setup on RTX 5090: 194-197 tok/s with image/video
#3 opened about 2 months ago by 8055izham · 5 comments
Why are GPTQ scales stored as float16 while other weights are bfloat16?
#2 opened about 2 months ago by mylfm · 1 comment
Speculative Config - MTP crash related to quantized expert names
#1 opened about 2 months ago by seanthomaswilliams · 2 comments