add quantization_config.ignore=['lm_head'] (downstream audit fix) 91382f5 verified mattbucci committed 9 days ago
Devstral 24B AWQ: GPTQ-calibrated, BOS-fixed chat template, 37 tok/s on RDNA4 df87209 verified mattbucci committed 23 days ago